[PATCH AUTOSEL 6.12 072/486] btrfs: make btrfs_discard_workfn() block

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.12 072/486] btrfs: make btrfs_discard_workfn() block_group ref explicit
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 073/486] btrfs: avoid linker error in btrfs_find_create_tree_block() Sasha Levin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Boris Burkov, Filipe Manana, David Sterba, Sasha Levin, clm,
	josef, linux-btrfs

From: Boris Burkov <boris@bur.io>

[ Upstream commit 895c6721d310c036dcfebb5ab845822229fa35eb ]

Currently, the async discard machinery owns a ref to the block_group
when the block_group is queued on a discard list. However, to handle
races with discard cancellation and the discard workfn, we have a
specific logic to detect that the block_group is *currently* running in
the workfn, to protect the workfn's usage amidst cancellation.

As far as I can tell, this doesn't have any overt bugs (though
finish_discard_pass() and remove_from_discard_list() racing can have a
surprising outcome for the caller of remove_from_discard_list() in that
it is again added at the end).

But it is needlessly complicated to rely on locking and the nullity of
discard_ctl->block_group. Simplify this significantly by just taking a
refcount while we are in the workfn and unconditionally drop it in both
the remove and workfn paths, regardless of if they race.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/discard.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index e815d165cccc2..d6eef4bd9e9d4 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -167,13 +167,7 @@ static bool remove_from_discard_list(struct btrfs_discard_ctl *discard_ctl,
 	block_group->discard_eligible_time = 0;
 	queued = !list_empty(&block_group->discard_list);
 	list_del_init(&block_group->discard_list);
-	/*
-	 * If the block group is currently running in the discard workfn, we
-	 * don't want to deref it, since it's still being used by the workfn.
-	 * The workfn will notice this case and deref the block group when it is
-	 * finished.
-	 */
-	if (queued && !running)
+	if (queued)
 		btrfs_put_block_group(block_group);
 
 	spin_unlock(&discard_ctl->lock);
@@ -260,9 +254,10 @@ static struct btrfs_block_group *peek_discard_list(
 			block_group->discard_cursor = block_group->start;
 			block_group->discard_state = BTRFS_DISCARD_EXTENTS;
 		}
-		discard_ctl->block_group = block_group;
 	}
 	if (block_group) {
+		btrfs_get_block_group(block_group);
+		discard_ctl->block_group = block_group;
 		*discard_state = block_group->discard_state;
 		*discard_index = block_group->discard_index;
 	}
@@ -493,9 +488,20 @@ static void btrfs_discard_workfn(struct work_struct *work)
 
 	block_group = peek_discard_list(discard_ctl, &discard_state,
 					&discard_index, now);
-	if (!block_group || !btrfs_run_discard_work(discard_ctl))
+	if (!block_group)
 		return;
+	if (!btrfs_run_discard_work(discard_ctl)) {
+		spin_lock(&discard_ctl->lock);
+		btrfs_put_block_group(block_group);
+		discard_ctl->block_group = NULL;
+		spin_unlock(&discard_ctl->lock);
+		return;
+	}
 	if (now < block_group->discard_eligible_time) {
+		spin_lock(&discard_ctl->lock);
+		btrfs_put_block_group(block_group);
+		discard_ctl->block_group = NULL;
+		spin_unlock(&discard_ctl->lock);
 		btrfs_discard_schedule_work(discard_ctl, false);
 		return;
 	}
@@ -547,15 +553,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	spin_lock(&discard_ctl->lock);
 	discard_ctl->prev_discard = trimmed;
 	discard_ctl->prev_discard_time = now;
-	/*
-	 * If the block group was removed from the discard list while it was
-	 * running in this workfn, then we didn't deref it, since this function
-	 * still owned that reference. But we set the discard_ctl->block_group
-	 * back to NULL, so we can use that condition to know that now we need
-	 * to deref the block_group.
-	 */
-	if (discard_ctl->block_group == NULL)
-		btrfs_put_block_group(block_group);
+	btrfs_put_block_group(block_group);
 	discard_ctl->block_group = NULL;
 	__btrfs_discard_schedule_work(discard_ctl, now, false);
 	spin_unlock(&discard_ctl->lock);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 073/486] btrfs: avoid linker error in btrfs_find_create_tree_block()
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 072/486] btrfs: make btrfs_discard_workfn() block_group ref explicit Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 074/486] btrfs: run btrfs_error_commit_super() early Sasha Levin
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mark Harmstone, Qu Wenruo, David Sterba, Sasha Levin, clm, josef,
	linux-btrfs

From: Mark Harmstone <maharmstone@fb.com>

[ Upstream commit 7ef3cbf17d2734ca66c4ed8573be45f4e461e7ee ]

The inline function btrfs_is_testing() is hardcoded to return 0 if
CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set. Currently we're relying on
the compiler optimizing out the call to alloc_test_extent_buffer() in
btrfs_find_create_tree_block(), as it's not been defined (it's behind an
 #ifdef).

Add a stub version of alloc_test_extent_buffer() to avoid linker errors
on non-standard optimization levels. This problem was seen on GCC 14
with -O0 and is helps to see symbols that would be otherwise optimized
out.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/extent_io.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6551fb003eed2..b134cb324dab8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2826,10 +2826,10 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 	return eb;
 }
 
-#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 					u64 start)
 {
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 	struct extent_buffer *eb, *exists = NULL;
 	int ret;
 
@@ -2865,8 +2865,11 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 free_eb:
 	btrfs_release_extent_buffer(eb);
 	return exists;
-}
+#else
+	/* Stub to avoid linker error when compiled with optimizations turned off. */
+	return NULL;
 #endif
+}
 
 static struct extent_buffer *grab_extent_buffer(
 		struct btrfs_fs_info *fs_info, struct page *page)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 074/486] btrfs: run btrfs_error_commit_super() early
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 072/486] btrfs: make btrfs_discard_workfn() block_group ref explicit Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 073/486] btrfs: avoid linker error in btrfs_find_create_tree_block() Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 075/486] btrfs: fix non-empty delayed iputs list on unmount due to async workers Sasha Levin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Qu Wenruo, Filipe Manana, David Sterba, Sasha Levin, clm, josef,
	linux-btrfs

From: Qu Wenruo <wqu@suse.com>

[ Upstream commit df94a342efb451deb0e32b495d1d6cd4bb3a1648 ]

[BUG]
Even after all the error fixes related the
"ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can
still hit it reliably with my experimental 2K block size.

[CAUSE]
In my case, all the error is triggered after the fs is already in error
status.

I find the following call trace to be the cause of race:

           Main thread                       |     endio_write_workers
---------------------------------------------+---------------------------
close_ctree()                                |
|- btrfs_error_commit_super()                |
|  |- btrfs_cleanup_transaction()            |
|  |  |- btrfs_destroy_all_ordered_extents() |
|  |     |- btrfs_wait_ordered_roots()       |
|  |- btrfs_run_delayed_iputs()              |
|                                            | btrfs_finish_ordered_io()
|                                            | |- btrfs_put_ordered_extent()
|                                            |    |- btrfs_add_delayed_iput()
|- ASSERT(list_empty(delayed_iputs))         |
   !!! Triggered !!!

The root cause is that, btrfs_wait_ordered_roots() only wait for
ordered extents to finish their IOs, not to wait for them to finish and
removed.

[FIX]
Since btrfs_error_commit_super() will flush and wait for all ordered
extents, it should be executed early, before we start flushing the
workqueues.

And since btrfs_error_commit_super() now runs early, there is no need to
run btrfs_run_delayed_iputs() inside it, so just remove the
btrfs_run_delayed_iputs() call from btrfs_error_commit_super().

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/disk-io.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 19e5f8eaae772..e0e740e4d7c75 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4254,6 +4254,14 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	/* clear out the rbtree of defraggable inodes */
 	btrfs_cleanup_defrag_inodes(fs_info);
 
+	/*
+	 * Handle the error fs first, as it will flush and wait for all ordered
+	 * extents.  This will generate delayed iputs, thus we want to handle
+	 * it first.
+	 */
+	if (unlikely(BTRFS_FS_ERROR(fs_info)))
+		btrfs_error_commit_super(fs_info);
+
 	/*
 	 * Wait for any fixup workers to complete.
 	 * If we don't wait for them here and they are still running by the time
@@ -4343,9 +4351,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 			btrfs_err(fs_info, "commit super ret %d", ret);
 	}
 
-	if (BTRFS_FS_ERROR(fs_info))
-		btrfs_error_commit_super(fs_info);
-
 	kthread_stop(fs_info->transaction_kthread);
 	kthread_stop(fs_info->cleaner_kthread);
 
@@ -4468,10 +4473,6 @@ static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info)
 	/* cleanup FS via transaction */
 	btrfs_cleanup_transaction(fs_info);
 
-	mutex_lock(&fs_info->cleaner_mutex);
-	btrfs_run_delayed_iputs(fs_info);
-	mutex_unlock(&fs_info->cleaner_mutex);
-
 	down_write(&fs_info->cleanup_work_sem);
 	up_write(&fs_info->cleanup_work_sem);
 }
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 075/486] btrfs: fix non-empty delayed iputs list on unmount due to async workers
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 074/486] btrfs: run btrfs_error_commit_super() early Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 076/486] btrfs: properly limit inline data extent according to block size Sasha Levin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Filipe Manana, David Sterba, Qu Wenruo, Sasha Levin, clm, josef,
	linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit cda76788f8b0f7de3171100e3164ec1ce702292e ]

At close_ctree() after we have ran delayed iputs either explicitly through
calling btrfs_run_delayed_iputs() or later during the call to
btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
delayed iputs list is empty.

We have (another) race where this assertion might fail because we have
queued an async write into the fs_info->workers workqueue. Here's how it
happens:

1) We are submitting a data bio for an inode that is not the data
   relocation inode, so we call btrfs_wq_submit_bio();

2) btrfs_wq_submit_bio() submits a work for the fs_info->workers queue
   that will run run_one_async_done();

3) We enter close_ctree(), flush several work queues except
   fs_info->workers, explicitly run delayed iputs with a call to
   btrfs_run_delayed_iputs() and then again shortly after by calling
   btrfs_commit_super() or btrfs_error_commit_super(), which also run
   delayed iputs;

4) run_one_async_done() is executed in the work queue, and because there
   was an IO error (bio->bi_status is not 0) it calls btrfs_bio_end_io(),
   which drops the final reference on the associated ordered extent by
   calling btrfs_put_ordered_extent() - and that adds a delayed iput for
   the inode;

5) At close_ctree() we find that after stopping the cleaner and
   transaction kthreads the delayed iputs list is not empty, failing the
   following assertion:

      ASSERT(list_empty(&fs_info->delayed_iputs));

Fix this by flushing the fs_info->workers workqueue before running delayed
iputs at close_ctree().

David reported this when running generic/648, which exercises IO error
paths by using the DM error table.

Reported-by: David Sterba <dsterba@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/disk-io.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e0e740e4d7c75..147c50ef912ac 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4282,6 +4282,19 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	 */
 	btrfs_flush_workqueue(fs_info->delalloc_workers);
 
+	/*
+	 * We can have ordered extents getting their last reference dropped from
+	 * the fs_info->workers queue because for async writes for data bios we
+	 * queue a work for that queue, at btrfs_wq_submit_bio(), that runs
+	 * run_one_async_done() which calls btrfs_bio_end_io() in case the bio
+	 * has an error, and that later function can do the final
+	 * btrfs_put_ordered_extent() on the ordered extent attached to the bio,
+	 * which adds a delayed iput for the inode. So we must flush the queue
+	 * so that we don't have delayed iputs after committing the current
+	 * transaction below and stopping the cleaner and transaction kthreads.
+	 */
+	btrfs_flush_workqueue(fs_info->workers);
+
 	/*
 	 * When finishing a compressed write bio we schedule a work queue item
 	 * to finish an ordered extent - btrfs_finish_compressed_write_work()
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 076/486] btrfs: properly limit inline data extent according to block size
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 075/486] btrfs: fix non-empty delayed iputs list on unmount due to async workers Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 077/486] btrfs: prevent inline data extents read from touching blocks beyond its range Sasha Levin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Qu Wenruo, David Sterba, Sasha Levin, clm, josef, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

[ Upstream commit 23019d3e6617a8ec99a8d2f5947aa3dd8a74a1b8 ]

Btrfs utilizes inline data extent for the following cases:

- Regular small files
- Symlinks

And "btrfs check" detects any file extents that are too large as an
error.

It's not a problem for 4K block size, but for the incoming smaller
block sizes (2K), it can cause problems due to bad limits:

- Non-compressed inline data extents
  We do not allow a non-compressed inline data extent to be as large as
  block size.

- Symlinks
  Currently the only real limit on symlinks are 4K, which can be larger
  than 2K block size.

These will result btrfs-check to report too large file extents.

Fix it by adding proper size checks for the above cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/inode.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5b842276573e8..3dee0565ad21a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -623,6 +623,10 @@ static bool can_cow_file_range_inline(struct btrfs_inode *inode,
 	if (size > fs_info->sectorsize)
 		return false;
 
+	/* We do not allow a non-compressed extent to be as large as block size. */
+	if (data_len >= fs_info->sectorsize)
+		return false;
+
 	/* We cannot exceed the maximum inline data size. */
 	if (data_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info))
 		return false;
@@ -8683,7 +8687,12 @@ static int btrfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	struct extent_buffer *leaf;
 
 	name_len = strlen(symname);
-	if (name_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info))
+	/*
+	 * Symlinks utilize uncompressed inline extent data, which should not
+	 * reach block size.
+	 */
+	if (name_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info) ||
+	    name_len >= fs_info->sectorsize)
 		return -ENAMETOOLONG;
 
 	inode = new_inode(dir->i_sb);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 077/486] btrfs: prevent inline data extents read from touching blocks beyond its range
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 076/486] btrfs: properly limit inline data extent according to block size Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 078/486] btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work() Sasha Levin
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Qu Wenruo, Filipe Manana, David Sterba, Sasha Levin, clm, josef,
	linux-btrfs

From: Qu Wenruo <wqu@suse.com>

[ Upstream commit 1a5b5668d711d3d1ef447446beab920826decec3 ]

Currently reading an inline data extent will zero out the remaining
range in the page.

This is not yet causing problems even for block size < page size
(subpage) cases because:

1) An inline data extent always starts at file offset 0
   Meaning at page read, we always read the inline extent first, before
   any other blocks in the page. Then later blocks are properly read out
   and re-fill the zeroed out ranges.

2) Currently btrfs will read out the whole page if a buffered write is
   not page aligned
   So a page is either fully uptodate at buffered write time (covers the
   whole page), or we will read out the whole page first.
   Meaning there is nothing to lose for such an inline extent read.

But it's still not ideal:

- We're zeroing out the page twice
  Once done by read_inline_extent()/uncompress_inline(), once done by
  btrfs_do_readpage() for ranges beyond i_size.

- We're touching blocks that don't belong to the inline extent
  In the incoming patches, we can have a partial uptodate folio, of
  which some dirty blocks can exist while the page is not fully uptodate:

  The page size is 16K and block size is 4K:

  0         4K        8K        12K        16K
  |         |         |/////////|          |

  And range [8K, 12K) is dirtied by a buffered write, the remaining
  blocks are not uptodate.

  If range [0, 4K) contains an inline data extent, and we try to read
  the whole page, the current behavior will overwrite range [8K, 12K)
  with zero and cause data loss.

So to make the behavior more consistent and in preparation for future
changes, limit the inline data extents read to only zero out the range
inside the first block, not the whole page.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/inode.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3dee0565ad21a..19ed56d3638dc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6817,6 +6817,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 {
 	int ret;
 	struct extent_buffer *leaf = path->nodes[0];
+	const u32 blocksize = leaf->fs_info->sectorsize;
 	char *tmp;
 	size_t max_size;
 	unsigned long inline_size;
@@ -6833,7 +6834,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 
 	read_extent_buffer(leaf, tmp, ptr, inline_size);
 
-	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
+	max_size = min_t(unsigned long, blocksize, max_size);
 	ret = btrfs_decompress(compress_type, tmp, folio, 0, inline_size,
 			       max_size);
 
@@ -6845,8 +6846,8 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 	 * cover that region here.
 	 */
 
-	if (max_size < PAGE_SIZE)
-		folio_zero_range(folio, max_size, PAGE_SIZE - max_size);
+	if (max_size < blocksize)
+		folio_zero_range(folio, max_size, blocksize - max_size);
 	kfree(tmp);
 	return ret;
 }
@@ -6854,6 +6855,7 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 static int read_inline_extent(struct btrfs_inode *inode, struct btrfs_path *path,
 			      struct folio *folio)
 {
+	const u32 blocksize = path->nodes[0]->fs_info->sectorsize;
 	struct btrfs_file_extent_item *fi;
 	void *kaddr;
 	size_t copy_size;
@@ -6868,14 +6870,14 @@ static int read_inline_extent(struct btrfs_inode *inode, struct btrfs_path *path
 	if (btrfs_file_extent_compression(path->nodes[0], fi) != BTRFS_COMPRESS_NONE)
 		return uncompress_inline(path, folio, fi);
 
-	copy_size = min_t(u64, PAGE_SIZE,
+	copy_size = min_t(u64, blocksize,
 			  btrfs_file_extent_ram_bytes(path->nodes[0], fi));
 	kaddr = kmap_local_folio(folio, 0);
 	read_extent_buffer(path->nodes[0], kaddr,
 			   btrfs_file_extent_inline_start(fi), copy_size);
 	kunmap_local(kaddr);
-	if (copy_size < PAGE_SIZE)
-		folio_zero_range(folio, copy_size, PAGE_SIZE - copy_size);
+	if (copy_size < blocksize)
+		folio_zero_range(folio, copy_size, blocksize - copy_size);
 	return 0;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 078/486] btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work()
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
                   ` (5 preceding siblings ...)
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 077/486] btrfs: prevent inline data extents read from touching blocks beyond its range Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 079/486] btrfs: send: return -ENAMETOOLONG when attempting a path that is too long Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 080/486] btrfs: zoned: exit btrfs_can_activate_zone if BTRFS_FS_NEED_ZONE_FINISH is set Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Filipe Manana, Johannes Thumshirn, David Sterba, Sasha Levin, clm,
	josef, linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 1283b8c125a83bf7a7dbe90c33d3472b6d7bf612 ]

At btrfs_reclaim_bgs_work(), we are grabbing a block group's zone unusable
bytes while not under the protection of the block group's spinlock, so
this can trigger race reports from KCSAN (or similar tools) since that
field is typically updated while holding the lock, such as at
__btrfs_add_free_space_zoned() for example.

Fix this by grabbing the zone unusable bytes while we are still in the
critical section holding the block group's spinlock, which is right above
where we are currently grabbing it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/block-group.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 4423d8b716a58..aa8656c8b7e7e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1891,6 +1891,17 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
+
+		/*
+		 * Cache the zone_unusable value before turning the block group
+		 * to read only. As soon as the block group is read only it's
+		 * zone_unusable value gets moved to the block group's read-only
+		 * bytes and isn't available for calculations anymore. We also
+		 * cache it before unlocking the block group, to prevent races
+		 * (reports from KCSAN and such tools) with tasks updating it.
+		 */
+		zone_unusable = bg->zone_unusable;
+
 		spin_unlock(&bg->lock);
 		spin_unlock(&space_info->lock);
 
@@ -1907,13 +1918,6 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			goto next;
 		}
 
-		/*
-		 * Cache the zone_unusable value before turning the block group
-		 * to read only. As soon as the blog group is read only it's
-		 * zone_unusable value gets moved to the block group's read-only
-		 * bytes and isn't available for calculations anymore.
-		 */
-		zone_unusable = bg->zone_unusable;
 		ret = inc_block_group_ro(bg, 0);
 		up_write(&space_info->groups_sem);
 		if (ret < 0)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 079/486] btrfs: send: return -ENAMETOOLONG when attempting a path that is too long
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
                   ` (6 preceding siblings ...)
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 078/486] btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work() Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 080/486] btrfs: zoned: exit btrfs_can_activate_zone if BTRFS_FS_NEED_ZONE_FINISH is set Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Filipe Manana, David Sterba, Sasha Levin, clm, josef, linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit a77749b3e21813566cea050bbb3414ae74562eba ]

When attempting to build a too long path we are currently returning
-ENOMEM, which is very odd and misleading. So update fs_path_ensure_buf()
to return -ENAMETOOLONG instead. Also, while at it, move the WARN_ON()
into the if statement's expression, as it makes it clear what is being
tested and also has the effect of adding 'unlikely' to the statement,
which allows the compiler to generate better code as this condition is
never expected to happen.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/send.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index b1015f383f75e..c843b4aefb8ac 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -487,10 +487,8 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
 	if (p->buf_len >= len)
 		return 0;
 
-	if (len > PATH_MAX) {
-		WARN_ON(1);
-		return -ENOMEM;
-	}
+	if (WARN_ON(len > PATH_MAX))
+		return -ENAMETOOLONG;
 
 	path_len = p->end - p->start;
 	old_buf_len = p->buf_len;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH AUTOSEL 6.12 080/486] btrfs: zoned: exit btrfs_can_activate_zone if BTRFS_FS_NEED_ZONE_FINISH is set
       [not found] <20250505223922.2682012-1-sashal@kernel.org>
                   ` (7 preceding siblings ...)
  2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 079/486] btrfs: send: return -ENAMETOOLONG when attempting a path that is too long Sasha Levin
@ 2025-05-05 22:32 ` Sasha Levin
  8 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2025-05-05 22:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Johannes Thumshirn, Naohiro Aota, David Sterba, Sasha Levin, clm,
	josef, linux-btrfs

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

[ Upstream commit 26b38e28162ef4ceb1e0482299820fbbd7dbcd92 ]

If BTRFS_FS_NEED_ZONE_FINISH is already set for the whole filesystem, exit
early in btrfs_can_activate_zone(). There's no need to check if
BTRFS_FS_NEED_ZONE_FINISH needs to be set if it is already set.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/zoned.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 2603c9d60fd21..d1167aeb07354 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2326,6 +2326,9 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 	if (!btrfs_is_zoned(fs_info))
 		return true;
 
+	if (test_bit(BTRFS_FS_NEED_ZONE_FINISH, &fs_info->flags))
+		return false;
+
 	/* Check if there is a device with active zones left */
 	mutex_lock(&fs_info->chunk_mutex);
 	spin_lock(&fs_info->zone_active_bgs_lock);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-05-05 22:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250505223922.2682012-1-sashal@kernel.org>
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 072/486] btrfs: make btrfs_discard_workfn() block_group ref explicit Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 073/486] btrfs: avoid linker error in btrfs_find_create_tree_block() Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 074/486] btrfs: run btrfs_error_commit_super() early Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 075/486] btrfs: fix non-empty delayed iputs list on unmount due to async workers Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 076/486] btrfs: properly limit inline data extent according to block size Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 077/486] btrfs: prevent inline data extents read from touching blocks beyond its range Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 078/486] btrfs: get zone unusable bytes while holding lock at btrfs_reclaim_bgs_work() Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 079/486] btrfs: send: return -ENAMETOOLONG when attempting a path that is too long Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 080/486] btrfs: zoned: exit btrfs_can_activate_zone if BTRFS_FS_NEED_ZONE_FINISH is set Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox