* [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation
@ 2026-02-04 2:54 Qu Wenruo
2026-02-04 2:54 ` [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space Qu Wenruo
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Qu Wenruo @ 2026-02-04 2:54 UTC (permalink / raw)
To: linux-btrfs
[CHANGELOG]
v2:
- Various grammar fixes
- Fix a u64 division compiling error on ppc64
Which requires the dedicated div_u64() helper.
- Ignore unallocated space that's too small
If the unallocated space is not enough to even cover a single stripe
(64K), don't utilize it.
This makes the behavior more aligned to the chunk allocator, and
prevents over-estimation.
- Use U64_MAX to mark the per-profile estimation as unavailable
This reduces the memory usage by one unsigned long.
- Update the commit message of the 2nd patch
To include the overhead (runtime of btrfs_update_per_profile_avail())
in the commit message.
- Minor comment cleanup on the term "balloon"
The old term "balloon" is no longer utilized and there is a typo.
("ballon" -> "balloon").
- Update the estimation examples in the first patch
As we now allow 2-disk RAID5 and 3-disk RAID6.
v1:
- Revive from the v5.9 era fix
- Make btrfs_update_per_profile_avail() not return errors
Instead just mark all profiles as unavailable, and
btrfs_get_per_profile_avail() will return false.
The caller will need to fallback to the existing factor based
estimation.
This greatly simplifies the error handling, which was a pain point in
the original series.
- Remove a lot of refactors/cleanups
As they are already done upstream.
- Only make calc_available_free_space() use the new infrastructure
That's the main goal: fixing can_overcommit().
Further enhancements can be done later.
There is a long-known bug that if metadata is using RAID1 on two
unbalanced disks, btrfs has a very high chance to hit -ENOSPC during
critical paths and flip the fs read-only.
The bug dates back to v5.9 (where my last attempt ended) and the most
recent bug report is from Christoph.
The idea of the fix has always been the same: provide a
chunk-allocator-like available space estimation.
It doesn't need to be as heavy as the chunk allocator, but at least it
should not over-estimate.
The devil is always in the details: the previous v5.9 era series
required a lot of changes in error handling, because
btrfs_update_per_profile_avail() could fail at critical paths in chunk
allocation/removal and device grow/shrink/add/removal.
This time that function no longer fails, but just marks the
per-profile available estimation as unreliable, and lets the caller
fall back to the old factor based solution.
In the real world it should not be a big deal, as the only error is
-ENOMEM, but this greatly simplifies the error handling.
Qu Wenruo (3):
btrfs: introduce the device layout aware per-profile available space
btrfs: update per-profile available estimation
btrfs: use per-profile available space in calc_available_free_space()
fs/btrfs/space-info.c | 27 ++++---
fs/btrfs/volumes.c | 183 +++++++++++++++++++++++++++++++++++++++++-
fs/btrfs/volumes.h | 34 ++++++++
3 files changed, 231 insertions(+), 13 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space
2026-02-04 2:54 [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Qu Wenruo
@ 2026-02-04 2:54 ` Qu Wenruo
2026-02-04 15:41 ` Filipe Manana
2026-02-08 15:59 ` Chris Mason
2026-02-04 2:54 ` [PATCH v2 2/3] btrfs: update per-profile available estimation Qu Wenruo
` (2 subsequent siblings)
3 siblings, 2 replies; 10+ messages in thread
From: Qu Wenruo @ 2026-02-04 2:54 UTC (permalink / raw)
To: linux-btrfs
[BUG]
There is a long-known bug that if metadata is using RAID1 on two disks
with unbalanced sizes, there is a very high chance to hit an ENOSPC
related transaction abort.
[CAUSE]
The root cause is in the available space estimation code:
- Factor based calculation
Just sum all unallocated space and divide by the profile factor.
One obvious user is can_overcommit().
This cannot handle the following example:
devid 1 unallocated: 1GiB
devid 2 unallocated: 50GiB
metadata type: RAID1
If using factor based estimation, we can use (1GiB + 50GiB) / 2 = 25.5GiB
of free space for metadata.
Thus we can continue allocating metadata (over-committing) way beyond
the 1GiB limit.
But this estimation is completely wrong: in reality we can only allocate
a single 1GiB RAID1 block group, thus if we continue to over-commit, at
some point we will hit ENOSPC on a critical path and flip the fs
read-only.
[SOLUTION]
This patch introduces per-profile available space estimation, which
provides chunk-allocator-like behavior to give a (mostly) accurate
result, with some corner cases that under-estimate.
There are some differences between the estimation and the real chunk
allocator:
- No consideration of hole size
It's fine for most cases, as all data/metadata stripes are 1GiB in
size, thus there should not be any hole wasting much space.
And the chunk allocator is able to use smaller stripes when there is
really no other choice.
Although in theory this means it can lead to some over-estimation, it
should not cause too much hassle in the real world.
The other benefit of this behavior is that we avoid dev-extent tree
searches completely, thus the overhead is very small.
- No true balancing for certain cases
If we have 3-disk RAID1, and each device has 2GiB unallocated space,
we can balance the chunk allocation so that we can allocate 3GiB of
RAID1 chunks, and that's what the chunk allocator will do.
But the current estimation code uses the largest available space to
do a single allocation, meaning the estimation will be 2GiB, thus an
under-estimate.
Such under-estimation is fine, and after the first chunk allocation,
the estimation will be updated and still give a correct 2GiB
estimation.
So this only means the estimation will be a little conservative, which
is safer for call sites like the metadata over-commit check.
With this facility, for the above 1GiB + 50GiB case, it will give a
RAID1 estimation of 1GiB, instead of the incorrect 25.5GiB.
Or for a more complex example:
devid 1 unallocated: 1T
devid 2 unallocated: 1T
devid 3 unallocated: 10T
We will get an array of:
RAID10: 2T
RAID1: 2T
RAID1C3: 1T
RAID1C4: 0 (not enough devices)
DUP: 6T
RAID0: 3T
SINGLE: 12T
RAID5: 2T
RAID6: 1T
[IMPLEMENTATION]
For each profile, we do a chunk-allocator-level calculation.
The pseudo code looks like:
clear_virtual_used_space_of_all_rw_devices();
do {
/*
* The same as the chunk allocator, but besides the used space,
* we also take virtual used space into consideration.
*/
sort_device_with_virtual_free_space();
/*
* Unlike the chunk allocator, we don't need to bother with hole/stripe
* size, so we use the smallest device to make sure we can
* allocate as many stripes as the regular chunk allocator.
*/
stripe_size = device_with_smallest_free->avail_space;
stripe_size = min(stripe_size, to_alloc / ndevs);
/*
* Allocate a virtual chunk, allocated virtual chunk will
* increase virtual used space, allow next iteration to
* properly emulate chunk allocator behavior.
*/
ret = alloc_virtual_chunk(stripe_size, &allocated_size);
if (ret == 0)
avail += allocated_size;
} while (ret == 0)
This minimal available space based calculation is not perfect, but the
important part is that the estimation never exceeds the real available
space.
This patch just introduces the infrastructure, no hooks are executed
yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 163 +++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 34 ++++++++++
2 files changed, 197 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f281d113519b..a28e7400e8dc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5372,6 +5372,169 @@ static int btrfs_cmp_device_info(const void *a, const void *b)
return 0;
}
+/*
+ * Return 0 if we allocated any virtual(*) chunk, and store the size in
+ * @allocated.
+ * Return -ENOSPC if we have no more space to allocate a virtual chunk.
+ *
+ * *: A virtual chunk is a chunk that only exists during per-profile available
+ * estimation.
+ * Those numbers won't really take on-disk space, but only emulate
+ * chunk allocator behavior to get an accurate estimation of available space.
+ *
+ * Another difference is, a virtual chunk has no size limit and doesn't care
+ * about holes in the device tree, allowing us to exhaust device space
+ * much faster.
+ */
+static int alloc_virtual_chunk(struct btrfs_fs_info *fs_info,
+ struct btrfs_device_info *devices_info,
+ enum btrfs_raid_types type,
+ u64 *allocated)
+{
+ const struct btrfs_raid_attr *raid_attr = &btrfs_raid_array[type];
+ struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+ struct btrfs_device *device;
+ u64 stripe_size;
+ int ndevs = 0;
+
+ lockdep_assert_held(&fs_info->chunk_mutex);
+
+ /* Go through devices to collect their unallocated space. */
+ list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
+ u64 avail;
+
+ if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
+ &device->dev_state) ||
+ test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
+ continue;
+
+ if (device->total_bytes > device->bytes_used +
+ device->per_profile_allocated)
+ avail = device->total_bytes - device->bytes_used -
+ device->per_profile_allocated;
+ else
+ avail = 0;
+
+ avail = round_down(avail, fs_info->sectorsize);
+
+ /* And exclude the [0, 1M) reserved space. */
+ if (avail > BTRFS_DEVICE_RANGE_RESERVED)
+ avail -= BTRFS_DEVICE_RANGE_RESERVED;
+ else
+ avail = 0;
+
+ /*
+ * Not enough to support a single stripe, this device
+ * can not be utilized for chunk allocation.
+ */
+ if (avail < BTRFS_STRIPE_LEN)
+ continue;
+
+ /*
+ * Unlike chunk allocator, we don't care about stripe or hole
+ * size, so here we use @avail directly.
+ */
+ devices_info[ndevs].dev_offset = 0;
+ devices_info[ndevs].total_avail = avail;
+ devices_info[ndevs].max_avail = avail;
+ devices_info[ndevs].dev = device;
+ ++ndevs;
+ }
+ sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
+ btrfs_cmp_device_info, NULL);
+ ndevs = rounddown(ndevs, raid_attr->devs_increment);
+ if (ndevs < raid_attr->devs_min)
+ return -ENOSPC;
+ if (raid_attr->devs_max)
+ ndevs = min(ndevs, (int)raid_attr->devs_max);
+ else
+ ndevs = min(ndevs, (int)BTRFS_MAX_DEVS(fs_info));
+
+ /*
+ * Stripe size will be determined by the device with the least
+ * unallocated space.
+ */
+ stripe_size = devices_info[ndevs - 1].total_avail;
+
+ for (int i = 0; i < ndevs; i++)
+ devices_info[i].dev->per_profile_allocated += stripe_size;
+ *allocated = div_u64(stripe_size * (ndevs - raid_attr->nparity),
+ raid_attr->ncopies);
+ return 0;
+}
+
+static int calc_one_profile_avail(struct btrfs_fs_info *fs_info,
+ enum btrfs_raid_types type,
+ u64 *result_ret)
+{
+ struct btrfs_device_info *devices_info = NULL;
+ struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+ struct btrfs_device *device;
+ u64 allocated;
+ u64 result = 0;
+ int ret = 0;
+
+ lockdep_assert_held(&fs_info->chunk_mutex);
+ ASSERT(type >= 0 && type < BTRFS_NR_RAID_TYPES);
+
+ /* Not enough devices, quick exit, just update the result. */
+ if (fs_devices->rw_devices < btrfs_raid_array[type].devs_min) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
+ devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
+ GFP_NOFS);
+ if (!devices_info) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ /* Clear virtual chunk used space for each device. */
+ list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
+ device->per_profile_allocated = 0;
+
+ while (!alloc_virtual_chunk(fs_info, devices_info, type, &allocated))
+ result += allocated;
+
+out:
+ kfree(devices_info);
+ if (ret < 0 && ret != -ENOSPC)
+ return ret;
+ *result_ret = result;
+ return 0;
+}
+
+/* Update the per-profile available space array. */
+void btrfs_update_per_profile_avail(struct btrfs_fs_info *fs_info)
+{
+ u64 results[BTRFS_NR_RAID_TYPES];
+ int ret;
+
+ /*
+ * Zoned is more complex as we can not simply get the amount of
+ * available space for each device.
+ */
+ if (btrfs_is_zoned(fs_info))
+ goto error;
+
+ for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+ ret = calc_one_profile_avail(fs_info, i, &results[i]);
+ if (ret < 0)
+ goto error;
+ }
+
+ spin_lock(&fs_info->fs_devices->per_profile_lock);
+ for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++)
+ fs_info->fs_devices->per_profile_avail[i] = results[i];
+ spin_unlock(&fs_info->fs_devices->per_profile_lock);
+ return;
+error:
+ spin_lock(&fs_info->fs_devices->per_profile_lock);
+ for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++)
+ fs_info->fs_devices->per_profile_avail[i] = U64_MAX;
+ spin_unlock(&fs_info->fs_devices->per_profile_lock);
+}
+
static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
{
if (!(type & BTRFS_BLOCK_GROUP_RAID56_MASK))
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ebc85bf53ee7..3dde32143058 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -22,6 +22,7 @@
#include <uapi/linux/btrfs_tree.h>
#include "messages.h"
#include "extent-io-tree.h"
+#include "fs.h"
struct block_device;
struct bdev_handle;
@@ -213,6 +214,12 @@ struct btrfs_device {
/* Bandwidth limit for scrub, in bytes */
u64 scrub_speed_max;
+
+ /*
+ * A temporary number of allocated space during per-profile
+ * available space calculation.
+ */
+ u64 per_profile_allocated;
};
/*
@@ -458,6 +465,15 @@ struct btrfs_fs_devices {
/* Device to be used for reading in case of RAID1. */
u64 read_devid;
#endif
+
+ /*
+ * Each value indicates the available space for that profile.
+ * U64_MAX means the estimation is unavailable.
+ *
+ * Protected by per_profile_lock;
+ */
+ u64 per_profile_avail[BTRFS_NR_RAID_TYPES];
+ spinlock_t per_profile_lock;
};
#define BTRFS_MAX_DEVS(info) ((BTRFS_MAX_ITEM_SIZE(info) \
@@ -886,6 +902,24 @@ int btrfs_bg_type_to_factor(u64 flags);
const char *btrfs_bg_type_to_raid_name(u64 flags);
int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
bool btrfs_verify_dev_items(const struct btrfs_fs_info *fs_info);
+void btrfs_update_per_profile_avail(struct btrfs_fs_info *fs_info);
+
+static inline bool btrfs_get_per_profile_avail(struct btrfs_fs_info *fs_info,
+ u64 profile, u64 *avail_ret)
+{
+ enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(profile);
+ struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+ bool uptodate = false;
+
+ spin_lock(&fs_devices->per_profile_lock);
+ if (fs_devices->per_profile_avail[index] != U64_MAX) {
+ uptodate = true;
+ *avail_ret = fs_devices->per_profile_avail[index];
+ }
+ spin_unlock(&fs_info->fs_devices->per_profile_lock);
+ return uptodate;
+}
+
bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
--
2.52.0
* [PATCH v2 2/3] btrfs: update per-profile available estimation
2026-02-04 2:54 [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Qu Wenruo
2026-02-04 2:54 ` [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space Qu Wenruo
@ 2026-02-04 2:54 ` Qu Wenruo
2026-02-13 1:15 ` kernel test robot
2026-02-04 2:54 ` [PATCH v2 3/3] btrfs: use per-profile available space in calc_available_free_space() Qu Wenruo
2026-02-04 15:42 ` [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Filipe Manana
3 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2026-02-04 2:54 UTC (permalink / raw)
To: linux-btrfs
The per-profile available space is updated at the following timings:
- Chunk allocation
- Chunk removal
- After mount
- New device
- Device removal
- Device shrink
- Device enlarge
Since the function btrfs_update_per_profile_avail() does not return
an error, this won't introduce any new error handling paths.
When btrfs_update_per_profile_avail() fails (only -ENOMEM is
possible), it will mark the per-profile available estimation as
unreliable, so later btrfs_get_per_profile_avail() will return false and
require the caller to have a fallback solution.
The function btrfs_update_per_profile_avail() will be executed with
chunk_mutex held, thus it will slightly slow down the involved
functions, but not by much.
As the core workload is just various u64 calculations inside a loop,
without any tree searches, the overhead should be acceptable even for
all 9 supported profiles.
For 4 disks (which exercises all 9 profiles), the execution time of the
function is still less than 10us.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a28e7400e8dc..af21af777110 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2339,6 +2339,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
mutex_lock(&fs_info->chunk_mutex);
list_del_init(&device->dev_alloc_list);
device->fs_devices->rw_devices--;
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
}
@@ -2450,6 +2451,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
list_add(&device->dev_alloc_list,
&fs_devices->alloc_list);
device->fs_devices->rw_devices++;
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
}
return ret;
@@ -2937,6 +2939,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
*/
btrfs_clear_space_info_full(fs_info);
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
/* Add sysfs device entry */
@@ -2947,6 +2950,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
if (seeding_dev) {
mutex_lock(&fs_info->chunk_mutex);
ret = init_first_rw_device(trans);
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
if (unlikely(ret)) {
btrfs_abort_transaction(trans, ret);
@@ -3029,6 +3033,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
orig_super_total_bytes);
btrfs_set_super_num_devices(fs_info->super_copy,
orig_super_num_devices);
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
error_trans:
@@ -3121,6 +3126,7 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
if (list_empty(&device->post_commit_list))
list_add_tail(&device->post_commit_list,
&trans->transaction->dev_update_list);
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
btrfs_reserve_chunk_metadata(trans, false);
@@ -3497,6 +3503,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
}
}
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
trans->removing_chunk = false;
@@ -5185,6 +5192,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
atomic64_sub(free_diff, &fs_info->free_chunk_space);
}
+ btrfs_update_per_profile_avail(fs_info);
/*
* Once the device's size has been set to the new size, ensure all
* in-memory chunks are synced to disk so that the loop below sees them
@@ -5300,6 +5308,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
WARN_ON(diff > old_total);
btrfs_set_super_total_bytes(super_copy,
round_down(old_total - diff, fs_info->sectorsize));
+ btrfs_update_per_profile_avail(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
btrfs_reserve_chunk_metadata(trans, false);
@@ -6012,6 +6021,8 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
check_raid56_incompat_flag(info, type);
check_raid1c34_incompat_flag(info, type);
+ btrfs_update_per_profile_avail(info);
+
return block_group;
}
@@ -8584,7 +8595,14 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
}
/* Ensure all chunks have corresponding dev extents */
- return verify_chunk_dev_extent_mapping(fs_info);
+ ret = verify_chunk_dev_extent_mapping(fs_info);
+ if (ret < 0)
+ return ret;
+
+ mutex_lock(&fs_info->chunk_mutex);
+ btrfs_update_per_profile_avail(fs_info);
+ mutex_unlock(&fs_info->chunk_mutex);
+ return 0;
}
/*
--
2.52.0
* [PATCH v2 3/3] btrfs: use per-profile available space in calc_available_free_space()
2026-02-04 2:54 [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Qu Wenruo
2026-02-04 2:54 ` [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space Qu Wenruo
2026-02-04 2:54 ` [PATCH v2 2/3] btrfs: update per-profile available estimation Qu Wenruo
@ 2026-02-04 2:54 ` Qu Wenruo
2026-02-04 15:42 ` [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Filipe Manana
3 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2026-02-04 2:54 UTC (permalink / raw)
To: linux-btrfs
For the following disk layout, can_overcommit() can give false
confidence in the available space:
devid 1 unallocated: 1GiB
devid 2 unallocated: 50GiB
metadata type: RAID1
As can_overcommit() simply uses the unallocated space adjusted by the
profile factor to calculate the allocatable metadata chunk size, it
results in 25.5GiB of available space.
But in reality we can only allocate one 1GiB RAID1 chunk; the remaining
49GiB on devid 2 will never be utilized to fulfill a RAID1 chunk.
This leads to various ENOSPC related transaction aborts that flip the
fs read-only.
Now use the per-profile available space in calc_available_free_space(),
and only when that fails do we fall back to the old factor based
estimation.
For zoned devices, or in the very unlikely case of a temporary memory
allocation failure, we will still fall back to the factor based
estimation.
But I hope in reality it's very rare.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/space-info.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index bb5aac7ee9d2..78b771d656b9 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -444,6 +444,7 @@ static u64 calc_available_free_space(const struct btrfs_space_info *space_info,
enum btrfs_reserve_flush_enum flush)
{
struct btrfs_fs_info *fs_info = space_info->fs_info;
+ bool has_per_profile;
u64 profile;
u64 avail;
u64 data_chunk_size;
@@ -454,19 +455,21 @@ static u64 calc_available_free_space(const struct btrfs_space_info *space_info,
else
profile = btrfs_metadata_alloc_profile(fs_info);
- avail = atomic64_read(&fs_info->free_chunk_space);
-
- /*
- * If we have dup, raid1 or raid10 then only half of the free
- * space is actually usable. For raid56, the space info used
- * doesn't include the parity drive, so we don't have to
- * change the math
- */
- factor = btrfs_bg_type_to_factor(profile);
- avail = div_u64(avail, factor);
- if (avail == 0)
- return 0;
+ has_per_profile = btrfs_get_per_profile_avail(fs_info, profile, &avail);
+ if (!has_per_profile) {
+ avail = atomic64_read(&fs_info->free_chunk_space);
+ /*
+ * If we have dup, raid1 or raid10 then only half of the free
+ * space is actually usable. For raid56, the space info used
+ * doesn't include the parity drive, so we don't have to
+ * change the math
+ */
+ factor = btrfs_bg_type_to_factor(profile);
+ avail = div_u64(avail, factor);
+ if (avail == 0)
+ return 0;
+ }
data_chunk_size = calc_effective_data_chunk_size(fs_info);
/*
--
2.52.0
* Re: [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space
2026-02-04 2:54 ` [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space Qu Wenruo
@ 2026-02-04 15:41 ` Filipe Manana
2026-02-08 15:59 ` Chris Mason
1 sibling, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2026-02-04 15:41 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Feb 4, 2026 at 2:55 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> There is a long known bug that if metadata is using RAID1 on two disks
> with unbalanced sizes, there is a very high chance to hit ENOSPC related
> transaction abort.
>
> [CAUSE]
> The root cause is in the available space estimation code:
>
> - Factor based calculation
> Just use all unallocated space, divide by the profile factor
> One obvious user is can_overcommit().
>
> This can not handle the following example:
>
> devid 1 unallocated: 1GiB
> devid 2 unallocated: 50GiB
> metadata type: RAID1
>
> If using factor based estimation, we can use (1GiB + 50GiB) / 2 = 25.5GiB
> free space for metadata.
> Thus we can continue allocating metadata (over-commit) way beyond the
> 1GiB limit.
>
> But this estimation is completely wrong, in reality we can only allocate
> one single 1GiB RAID1 block group, thus if we continue over-commit, at
> one time we will hit ENOSPC at some critical path and flips the fs
> read-only.
>
> [SOLUTION]
> This patch will introduce per-profile available space estimation,
> which can provide chunk-allocator like behavior to give a (mostly)
> accurate result, with under-estimate corner cases.
>
> There are some differences between the estimation and real chunk
> allocator:
>
> - No consideration on hole size
> It's fine for most cases, as all data/metadata strips are in 1GiB size
> thus there should not be any hole wasting much space.
>
> And chunk allocator is able to use smaller stripes when there is
> really no other choice.
>
> Although in theory this means it can lead to some over-estimation, it
> should not cause too much hassle in the real world.
>
> The other benefit of such behavior is, we avoid dev-extent tree search
> completely, thus the overhead is very small.
>
> - No true balance for certain cases
> If we have 3 disks RAID1, and each device has 2GiB unallocated space,
> we can load balance the chunk allocation so that we can allocate 3GiB
> RAID1 chunks, and that's what chunk allocator will do.
>
> But this current estimation code is using the largest available space
> to do a single allocation. Meaning the estimation will be 2GiB, thus
> under estimate.
>
> Such under estimation is fine and after the first chunk allocation, the
> estimation will be updated and still give a correct 2GiB
> estimation.
> So this only means the estimation will be a little conservative, which
> is safer for call sites like metadata over-commit check.
>
> With that facility, for above 1GiB + 50GiB case, it will give a RAID1
> estimation of 1GiB, instead of the incorrect 25.5GiB.
>
> Or for a more complex example:
> devid 1 unallocated: 1T
> devid 2 unallocated: 1T
> devid 3 unallocated: 10T
>
> We will get an array of:
> RAID10: 2T
> RAID1: 2T
> RAID1C3: 1T
> RAID1C4: 0 (not enough devices)
> DUP: 6T
> RAID0: 3T
> SINGLE: 12T
> RAID5: 2T
> RAID6: 1T
>
> [IMPLEMENTATION]
> And for the each profile , we go chunk allocator level calculation:
> The pseudo code looks like:
>
> clear_virtual_used_space_of_all_rw_devices();
> do {
> /*
> * The same as chunk allocator, despite used space,
> * we also take virtual used space into consideration.
> */
> sort_device_with_virtual_free_space();
>
> /*
> * Unlike chunk allocator, we don't need to bother hole/stripe
> * size, so we use the smallest device to make sure we can
> * allocated as many stripes as regular chunk allocator
> */
> stripe_size = device_with_smallest_free->avail_space;
> stripe_size = min(stripe_size, to_alloc / ndevs);
>
> /*
> * Allocate a virtual chunk, allocated virtual chunk will
> * increase virtual used space, allow next iteration to
> * properly emulate chunk allocator behavior.
> */
> ret = alloc_virtual_chunk(stripe_size, &allocated_size);
> if (ret == 0)
> avail += allocated_size;
> } while (ret == 0)
>
> This minimal available space based calculation is not perfect, but the
> important part is, the estimation is never exceeding the real available
> space.
>
> This patch just introduces the infrastructure, no hooks are executed
> yet.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/volumes.c | 163 +++++++++++++++++++++++++++++++++++++++++++++
> fs/btrfs/volumes.h | 34 ++++++++++
> 2 files changed, 197 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f281d113519b..a28e7400e8dc 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5372,6 +5372,169 @@ static int btrfs_cmp_device_info(const void *a, const void *b)
> return 0;
> }
>
> +/*
> + * Return 0 if we allocated any virtual(*) chunk, and restore the size to
> + * @allocated.
> + * Return -ENOSPC if we have no more space to allocate virtual chunk
> + *
> + * *: A virtual chunk is a chunk that only exists during per-profile available
> + * estimation.
> + * Those numbers won't really take on-disk space, but only to emulate
> + * chunk allocator behavior to get accurate estimation on available space.
> + *
> + * Another different is, a virtual chunk has no size limit and doesn't care
different -> difference
> + * about the hole size in device tree, allowing us to exhause device space
just say "doesn't care about holes in the device tree".
exhause -> exhaust
Otherwise it looks fine, thanks.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
> + * much faster.
> + */
> +static int alloc_virtual_chunk(struct btrfs_fs_info *fs_info,
> + struct btrfs_device_info *devices_info,
> + enum btrfs_raid_types type,
> + u64 *allocated)
> +{
> + const struct btrfs_raid_attr *raid_attr = &btrfs_raid_array[type];
> + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> + struct btrfs_device *device;
> + u64 stripe_size;
> + int ndevs = 0;
> +
> + lockdep_assert_held(&fs_info->chunk_mutex);
> +
> + /* Go through devices to collect their unallocated space. */
> + list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
> + u64 avail;
> +
> + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> + &device->dev_state) ||
> + test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
> + continue;
> +
> + if (device->total_bytes > device->bytes_used +
> + device->per_profile_allocated)
> + avail = device->total_bytes - device->bytes_used -
> + device->per_profile_allocated;
> + else
> + avail = 0;
> +
> + avail = round_down(avail, fs_info->sectorsize);
> +
> + /* And exclude the [0, 1M) reserved space. */
> + if (avail > BTRFS_DEVICE_RANGE_RESERVED)
> + avail -= BTRFS_DEVICE_RANGE_RESERVED;
> + else
> + avail = 0;
> +
> + /*
> + * Not enough to support a single stripe, this device
> + * can not be utilized for chunk allocation.
> + */
> + if (avail < BTRFS_STRIPE_LEN)
> + continue;
> +
> + /*
> + * Unlike chunk allocator, we don't care about stripe or hole
> + * size, so here we use @avail directly.
> + */
> + devices_info[ndevs].dev_offset = 0;
> + devices_info[ndevs].total_avail = avail;
> + devices_info[ndevs].max_avail = avail;
> + devices_info[ndevs].dev = device;
> + ++ndevs;
> + }
> + sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> + btrfs_cmp_device_info, NULL);
> + ndevs = rounddown(ndevs, raid_attr->devs_increment);
> + if (ndevs < raid_attr->devs_min)
> + return -ENOSPC;
> + if (raid_attr->devs_max)
> + ndevs = min(ndevs, (int)raid_attr->devs_max);
> + else
> + ndevs = min(ndevs, (int)BTRFS_MAX_DEVS(fs_info));
> +
> + /*
> + * Stripe size will be determined by the device with the least
> + * unallocated space.
> + */
> + stripe_size = devices_info[ndevs - 1].total_avail;
> +
> + for (int i = 0; i < ndevs; i++)
> + devices_info[i].dev->per_profile_allocated += stripe_size;
> + *allocated = div_u64(stripe_size * (ndevs - raid_attr->nparity),
> + raid_attr->ncopies);
> + return 0;
> +}
> +
> +static int calc_one_profile_avail(struct btrfs_fs_info *fs_info,
> + enum btrfs_raid_types type,
> + u64 *result_ret)
> +{
> + struct btrfs_device_info *devices_info = NULL;
> + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> + struct btrfs_device *device;
> + u64 allocated;
> + u64 result = 0;
> + int ret = 0;
> +
> + lockdep_assert_held(&fs_info->chunk_mutex);
> + ASSERT(type >= 0 && type < BTRFS_NR_RAID_TYPES);
> +
> + /* Not enough devices, quick exit, just update the result. */
> + if (fs_devices->rw_devices < btrfs_raid_array[type].devs_min) {
> + ret = -ENOSPC;
> + goto out;
> + }
> +
> + devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
> + GFP_NOFS);
> + if (!devices_info) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + /* Clear virtual chunk used space for each device. */
> + list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
> + device->per_profile_allocated = 0;
> +
> + while (!alloc_virtual_chunk(fs_info, devices_info, type, &allocated))
> + result += allocated;
> +
> +out:
> + kfree(devices_info);
> + if (ret < 0 && ret != -ENOSPC)
> + return ret;
> + *result_ret = result;
> + return 0;
> +}
> +
> +/* Update the per-profile available space array. */
> +void btrfs_update_per_profile_avail(struct btrfs_fs_info *fs_info)
> +{
> + u64 results[BTRFS_NR_RAID_TYPES];
> + int ret;
> +
> + /*
> + * Zoned is more complex as we can not simply get the amount of
> + * available space for each device.
> + */
> + if (btrfs_is_zoned(fs_info))
> + goto error;
> +
> + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
> + ret = calc_one_profile_avail(fs_info, i, &results[i]);
> + if (ret < 0)
> + goto error;
> + }
> +
> + spin_lock(&fs_info->fs_devices->per_profile_lock);
> + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++)
> + fs_info->fs_devices->per_profile_avail[i] = results[i];
> + spin_unlock(&fs_info->fs_devices->per_profile_lock);
> + return;
> +error:
> + spin_lock(&fs_info->fs_devices->per_profile_lock);
> + for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++)
> + fs_info->fs_devices->per_profile_avail[i] = U64_MAX;
> + spin_unlock(&fs_info->fs_devices->per_profile_lock);
> +}
> +
> static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
> {
> if (!(type & BTRFS_BLOCK_GROUP_RAID56_MASK))
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index ebc85bf53ee7..3dde32143058 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -22,6 +22,7 @@
> #include <uapi/linux/btrfs_tree.h>
> #include "messages.h"
> #include "extent-io-tree.h"
> +#include "fs.h"
>
> struct block_device;
> struct bdev_handle;
> @@ -213,6 +214,12 @@ struct btrfs_device {
>
> /* Bandwidth limit for scrub, in bytes */
> u64 scrub_speed_max;
> +
> + /*
> + * A temporary number of allocated space during per-profile
> + * available space calculation.
> + */
> + u64 per_profile_allocated;
> };
>
> /*
> @@ -458,6 +465,15 @@ struct btrfs_fs_devices {
> /* Device to be used for reading in case of RAID1. */
> u64 read_devid;
> #endif
> +
> + /*
> + * Each value indicates the available space for that profile.
> + * U64_MAX means the estimation is unavailable.
> + *
> + * Protected by per_profile_lock;
> + */
> + u64 per_profile_avail[BTRFS_NR_RAID_TYPES];
> + spinlock_t per_profile_lock;
> };
>
> #define BTRFS_MAX_DEVS(info) ((BTRFS_MAX_ITEM_SIZE(info) \
> @@ -886,6 +902,24 @@ int btrfs_bg_type_to_factor(u64 flags);
> const char *btrfs_bg_type_to_raid_name(u64 flags);
> int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
> bool btrfs_verify_dev_items(const struct btrfs_fs_info *fs_info);
> +void btrfs_update_per_profile_avail(struct btrfs_fs_info *fs_info);
> +
> +static inline bool btrfs_get_per_profile_avail(struct btrfs_fs_info *fs_info,
> + u64 profile, u64 *avail_ret)
> +{
> + enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(profile);
> + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> + bool uptodate = false;
> +
> + spin_lock(&fs_devices->per_profile_lock);
> + if (fs_devices->per_profile_avail[index] != U64_MAX) {
> + uptodate = true;
> + *avail_ret = fs_devices->per_profile_avail[index];
> + }
> + spin_unlock(&fs_info->fs_devices->per_profile_lock);
> + return uptodate;
> +}
> +
> bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
>
> bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
> --
> 2.52.0
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation
2026-02-04 2:54 [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Qu Wenruo
` (2 preceding siblings ...)
2026-02-04 2:54 ` [PATCH v2 3/3] btrfs: use per-profile available space in calc_available_free_space() Qu Wenruo
@ 2026-02-04 15:42 ` Filipe Manana
3 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2026-02-04 15:42 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Feb 4, 2026 at 2:54 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [CHANGELOG]
> v2:
> - Various grammar fixes
>
> - Fix a u64 division compiling error on ppc64
> Which requires the dedicated div_u64() helper.
>
> - Ignore unallocated space that's too small
> If the unallocated space is not enough to even cover a single stripe
> (64K), don't utilize it.
> This makes the behavior more aligned with the chunk allocator, and
> prevents over-estimation.
>
> - Use U64_MAX to mark the per-profile estimation as unavailable
> This reduces the memory usage by one unsigned long.
>
> - Update the commit message of the 2nd patch
> To include the overhead (runtime of btrfs_update_per_profile_avail())
> in the commit message.
>
> - Minor comment cleanup on the term "balloon"
> The old term "balloon" is no longer utilized and there is a typo.
> ("ballon" -> "balloon").
>
> - Update the estimation examples in the first patch
> As we now allow 2-disk RAID5 and 3-disk RAID6.
>
> v1:
> - Revive from the v5.9 era fix
>
> - Make btrfs_update_per_profile_avail() not return errors
> Instead just mark all profiles as unavailable, and
> btrfs_get_per_profile_avail() will return false.
>
> The caller will need to fall back to the existing factor-based
> estimation.
>
> This greatly simplifies the error handling, which was a pain point in
> the original series.
>
> - Remove a lot of refactor/cleanup
> As that's already done in upstream.
>
> - Only make calc_available_free_space() to use the new infrastructure
> That's the main goal, fix can_over_commit().
> Further enhancement can be done later.
>
> There is a long-known bug that if metadata is using RAID1 on two
> unbalanced disks, btrfs has a very high chance of hitting -ENOSPC on
> critical paths and flipping RO.
>
> The bug dates back to v5.9 (where my last updates ended) and the most
> recent bug report is from Christoph.
>
> The idea for fixing it has always been the same: provide a
> chunk-allocator-like available space estimation.
> It doesn't need to be as heavy as the chunk allocator, but at least it
> should not over-estimate.
>
> The devil is in the details: the previous v5.9-era series
> required a lot of changes in error handling, because
> btrfs_update_per_profile_avail() could fail at critical paths in chunk
> allocation/removal and device grow/shrink/add/removal.
>
> But this time the function no longer fails; it just marks the
> per-profile available estimation as unreliable and lets the caller
> fall back to the old factor-based solution.
>
> In the real world it should not be a big deal, as the only error is
> -ENOMEM, but this greatly simplifies the error handling.
>
> Qu Wenruo (3):
> btrfs: introduce the device layout aware per-profile available space
> btrfs: update per-profile available estimation
> btrfs: use per-profile available space in calc_available_free_space()
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Just a few minor typos in patch 1, can be fixed at commit time.
Thanks.
>
> fs/btrfs/space-info.c | 27 ++++---
> fs/btrfs/volumes.c | 183 +++++++++++++++++++++++++++++++++++++++++-
> fs/btrfs/volumes.h | 34 ++++++++
> 3 files changed, 231 insertions(+), 13 deletions(-)
>
> --
> 2.52.0
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space
2026-02-04 2:54 ` [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space Qu Wenruo
2026-02-04 15:41 ` Filipe Manana
@ 2026-02-08 15:59 ` Chris Mason
2026-02-08 20:57 ` Qu Wenruo
1 sibling, 1 reply; 10+ messages in thread
From: Chris Mason @ 2026-02-08 15:59 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
Qu Wenruo <wqu@suse.com> wrote:
> [BUG]
> There is a long-known bug that if metadata is using RAID1 on two disks
> with unbalanced sizes, there is a very high chance of hitting an
> ENOSPC-related transaction abort.
>
> [CAUSE]
> The root cause is in the available space estimation code:
>
> - Factor-based calculation
> Just use all unallocated space divided by the profile factor.
> One obvious user is can_overcommit().
>
> This cannot handle the following example:
>
> devid 1 unallocated: 1GiB
> devid 2 unallocated: 50GiB
> metadata type: RAID1
Hi everyone,
I'm running the btrfs for-next branch, along with the encryption patch set
through my AI review prompts:
https://github.com/masoncl/review-prompts
I'm working on some btrfs specific instructions, but hopefully these reviews
are already useful. I've kept the headers to make it easier to double check
that I'm actually replying to the right email, but please let me know if any
parts of this are incorrect:
commit 6f42a3863d937025e9670539658409ed668a7d48
Author: Qu Wenruo <wqu@suse.com>
btrfs: introduce the device layout aware per-profile available space
This commit introduces per-profile available space estimation to fix a
long-standing ENOSPC bug with RAID1 on unbalanced device sizes. It adds
infrastructure for chunk-allocator-like calculations to provide accurate
(conservative) estimations.
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 8288d79372a56..0082c166af91f 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
[ ... ]
> @@ -458,6 +465,15 @@ struct btrfs_fs_devices {
> /* Device to be used for reading in case of RAID1. */
> u64 read_devid;
> #endif
> +
> + /*
> + * Each value indicates the available space for that profile.
> + * U64_MAX means the estimation is unavailable.
> + *
> + * Protected by per_profile_lock;
> + */
> + u64 per_profile_avail[BTRFS_NR_RAID_TYPES];
> + spinlock_t per_profile_lock;
> };
Where is this spinlock initialized? Looking at alloc_fs_devices() in
fs/btrfs/volumes.c:381, it uses kzalloc() which zeroes the memory, but
spinlocks require explicit initialization.
The follow-up commit 185737deaf9dc adds calls to
btrfs_update_per_profile_avail() which uses spin_lock() at line 5532, and
btrfs_get_per_profile_avail() also uses this spinlock. Using an uninitialized
spinlock is undefined behavior and will trigger warnings/BUGs with
CONFIG_DEBUG_SPINLOCK.
Other locks in the same struct (device_list_mutex) are properly initialized
with mutex_init() at line 389. Is there a missing spin_lock_init() call for
per_profile_lock?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space
2026-02-08 15:59 ` Chris Mason
@ 2026-02-08 20:57 ` Qu Wenruo
0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2026-02-08 20:57 UTC (permalink / raw)
To: Chris Mason, Qu Wenruo; +Cc: linux-btrfs
在 2026/2/9 02:29, Chris Mason 写道:
> Qu Wenruo <wqu@suse.com> wrote:
>> [BUG]
>> There is a long-known bug that if metadata is using RAID1 on two disks
>> with unbalanced sizes, there is a very high chance of hitting an
>> ENOSPC-related transaction abort.
>>
>> [CAUSE]
>> The root cause is in the available space estimation code:
>>
>> - Factor-based calculation
>> Just use all unallocated space divided by the profile factor.
>> One obvious user is can_overcommit().
>>
>> This cannot handle the following example:
>>
>> devid 1 unallocated: 1GiB
>> devid 2 unallocated: 50GiB
>> metadata type: RAID1
>
> Hi everyone,
>
> I'm running the btrfs for-next branch, along with the encryption patch set
> through my AI review prompts:
>
> https://github.com/masoncl/review-prompts
>
> I'm working on some btrfs specific instructions, but hopefully these reviews
> are already useful. I've kept the headers to make it easier to double check
> that I'm actually replying to the right email, but please let me know if any
> parts of this are incorrect:
>
> commit 6f42a3863d937025e9670539658409ed668a7d48
> Author: Qu Wenruo <wqu@suse.com>
>
> btrfs: introduce the device layout aware per-profile available space
>
> This commit introduces per-profile available space estimation to fix a
> long-standing ENOSPC bug with RAID1 on unbalanced device sizes. It adds
> infrastructure for chunk-allocator-like calculations to provide accurate
> (conservative) estimations.
>
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 8288d79372a56..0082c166af91f 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>
> [ ... ]
>
>> @@ -458,6 +465,15 @@ struct btrfs_fs_devices {
>> /* Device to be used for reading in case of RAID1. */
>> u64 read_devid;
>> #endif
>> +
>> + /*
>> + * Each value indicates the available space for that profile.
>> + * U64_MAX means the estimation is unavailable.
>> + *
>> + * Protected by per_profile_lock;
>> + */
>> + u64 per_profile_avail[BTRFS_NR_RAID_TYPES];
>> + spinlock_t per_profile_lock;
>> };
>
> Where is this spinlock initialized? Looking at alloc_fs_devices() in
> fs/btrfs/volumes.c:381, it uses kzalloc() which zeroes the memory, but
> spinlocks require explicit initialization.
>
> The follow-up commit 185737deaf9dc adds calls to
> btrfs_update_per_profile_avail() which uses spin_lock() at line 5532, and
> btrfs_get_per_profile_avail() also uses this spinlock. Using an uninitialized
> spinlock is undefined behavior and will trigger warnings/BUGs with
> CONFIG_DEBUG_SPINLOCK.
>
> Other locks in the same struct (device_list_mutex) are properly initialized
> with mutex_init() at line 389. Is there a missing spin_lock_init() call for
> per_profile_lock?
>
You're right, thanks for catching this missing initialization.
Thanks,
Qu
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] btrfs: update per-profile available estimation
2026-02-04 2:54 ` [PATCH v2 2/3] btrfs: update per-profile available estimation Qu Wenruo
@ 2026-02-13 1:15 ` kernel test robot
2026-02-13 4:24 ` Qu Wenruo
0 siblings, 1 reply; 10+ messages in thread
From: kernel test robot @ 2026-02-13 1:15 UTC (permalink / raw)
To: Qu Wenruo; +Cc: oe-lkp, lkp, linux-btrfs, oliver.sang
Hello,
kernel test robot noticed "INFO:trying_to_register_non-static_key" on:
commit: 50b35a50fe83cb7870710b173f8b5ee78dd20107 ("[PATCH v2 2/3] btrfs: update per-profile available estimation")
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-introduce-the-device-layout-aware-per-profile-available-space/20260204-105811
base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/all/b4d6fcecccd3c2c3b5359131e0493f190d1f5959.1770173615.git.wqu@suse.com/
patch subject: [PATCH v2 2/3] btrfs: update per-profile available estimation
in testcase: perf-sanity-tests
version:
with following parameters:
perf_compiler: gcc
group: group-02
config: x86_64-rhel-9.4-bpf
compiler: gcc-14
test machine: 22 threads 1 sockets Intel(R) Core(TM) Ultra 9 185H @ 4.5GHz (Meteor Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202602130252.89b82f3f-lkp@intel.com
kern :err : [ 91.987109] [ T4552] INFO: trying to register non-static key.
kern :err : [ 91.988642] [ T4552] The code is fine but needs lockdep annotation, or maybe
kern :err : [ 91.990349] [ T4552] you didn't initialize this object before use?
kern :err : [ 91.991930] [ T4552] turning off the locking correctness validator.
kern :warn : [ 91.993525] [ T4552] CPU: 1 UID: 0 PID: 4552 Comm: mount Tainted: G S W 6.19.0-rc8-00146-g50b35a50fe83 #1 PREEMPT(full)
kern :warn : [ 91.993530] [ T4552] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
kern :warn : [ 91.993531] [ T4552] Hardware name: ASUSTeK COMPUTER INC. NUC14RVS-B/NUC14RVSU9, BIOS RVMTL357.0047.2025.0108.1408 01/08/2025
kern :warn : [ 91.993532] [ T4552] Call Trace:
kern :warn : [ 91.993533] [ T4552] <TASK>
kern :warn : [ 91.993535] [ T4552] dump_stack_lvl (lib/dump_stack.c:122)
kern :warn : [ 91.993541] [ T4552] register_lock_class (kernel/locking/lockdep.c:985 kernel/locking/lockdep.c:1299)
kern :warn : [ 91.993545] [ T4552] __lock_acquire (kernel/locking/lockdep.c:5113)
kern :warn : [ 91.993549] [ T4552] lock_acquire (include/linux/preempt.h:469 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) kernel/locking/lockdep.c:5831 (discriminator 2))
kern :warn : [ 91.993551] [ T4552] ? btrfs_update_per_profile_avail (fs/btrfs/volumes.c:5537) btrfs
kern :warn : [ 91.993701] [ T4552] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
kern :warn : [ 91.993704] [ T4552] ? lock_acquire (include/trace/events/lock.h:24 (discriminator 2) kernel/locking/lockdep.c:5831 (discriminator 2))
kern :warn : [ 91.993706] [ T4552] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
kern :warn : [ 91.993710] [ T4552] ? btrfs_update_per_profile_avail (fs/btrfs/volumes.c:5537) btrfs
kern :warn : [ 91.993849] [ T4552] btrfs_update_per_profile_avail (fs/btrfs/volumes.c:5537) btrfs
kern :warn : [ 91.993988] [ T4552] ? __pfx_btrfs_update_per_profile_avail (fs/btrfs/volumes.c:5518) btrfs
kern :warn : [ 91.994127] [ T4552] ? btrfs_verify_dev_extents (fs/btrfs/volumes.c:8602) btrfs
kern :warn : [ 91.994268] [ T4552] ? __lock_release+0x5d/0x1b0
kern :warn : [ 91.994270] [ T4552] ? rcu_is_watching (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
kern :warn : [ 91.994274] [ T4552] btrfs_verify_dev_extents (fs/btrfs/volumes.c:8604) btrfs
kern :warn : [ 91.994415] [ T4552] ? __pfx_btrfs_verify_dev_extents (fs/btrfs/volumes.c:8512) btrfs
kern :warn : [ 91.994562] [ T4552] ? btrfs_verify_dev_items (fs/btrfs/volumes.c:8641) btrfs
kern :warn : [ 91.994704] [ T4552] open_ctree (fs/btrfs/disk-io.c:3533) btrfs
kern :warn : [ 91.994842] [ T4552] btrfs_fill_super.cold (fs/btrfs/super.c:981) btrfs
kern :warn : [ 91.994976] [ T4552] btrfs_get_tree_super (fs/btrfs/super.c:1945) btrfs
kern :warn : [ 91.995108] [ T4552] btrfs_get_tree_subvol (fs/btrfs/super.c:2087) btrfs
kern :warn : [ 91.995241] [ T4552] vfs_get_tree (fs/super.c:1751)
kern :warn : [ 91.995245] [ T4552] vfs_cmd_create (fs/fsopen.c:231)
kern :warn : [ 91.995249] [ T4552] __do_sys_fsconfig (fs/fsopen.c:474)
kern :warn : [ 91.995251] [ T4552] ? __pfx___do_sys_fsconfig (fs/fsopen.c:356)
kern :warn : [ 91.995255] [ T4552] ? lock_release (kernel/locking/lockdep.c:470 (discriminator 4) kernel/locking/lockdep.c:5891 (discriminator 4) kernel/locking/lockdep.c:5875 (discriminator 4))
kern :warn : [ 91.995257] [ T4552] ? do_syscall_64 (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 include/linux/entry-common.h:108 arch/x86/entry/syscall_64.c:90)
kern :warn : [ 91.995261] [ T4552] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
kern :warn : [ 91.995263] [ T4552] ? __pfx_ksys_read (fs/read_write.c:705)
kern :warn : [ 91.995265] [ T4552] ? kfree (mm/slub.c:6674 (discriminator 3) mm/slub.c:6882 (discriminator 3))
kern :warn : [ 91.995268] [ T4552] ? do_syscall_64 (include/linux/irq-entry-common.h:298 include/linux/entry-common.h:196 arch/x86/entry/syscall_64.c:100)
kern :warn : [ 91.995270] [ T4552] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
kern :warn : [ 91.995272] [ T4552] ? __do_sys_fsconfig (fs/fsopen.c:499)
kern :warn : [ 91.995274] [ T4552] ? __do_sys_fsconfig (fs/fsopen.c:499)
kern :warn : [ 91.995277] [ T4552] ? __pfx___do_sys_fsconfig (fs/fsopen.c:356)
kern :warn : [ 91.995279] [ T4552] ? do_syscall_64 (include/linux/irq-entry-common.h:298 include/linux/entry-common.h:196 arch/x86/entry/syscall_64.c:100)
kern :warn : [ 91.995282] [ T4552] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
kern :warn : [ 91.995284] [ T4552] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
kern :warn : [ 91.995286] [ T4552] ? do_syscall_64 (include/linux/irq-entry-common.h:298 include/linux/entry-common.h:196 arch/x86/entry/syscall_64.c:100)
kern :warn : [ 91.995288] [ T4552] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
kern :warn : [ 91.995290] [ T4552] ? do_syscall_64 (arch/x86/entry/syscall_64.c:113)
kern :warn : [ 91.995292] [ T4552] ? irqentry_exit (include/linux/irq-entry-common.h:298 include/linux/irq-entry-common.h:341 kernel/entry/common.c:196)
kern :warn : [ 91.995294] [ T4552] ? trace_hardirqs_on_prepare (kernel/trace/trace_preemptirq.c:64 (discriminator 4) kernel/trace/trace_preemptirq.c:59 (discriminator 4))
kern :warn : [ 91.995296] [ T4552] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4629 (discriminator 4))
kern :warn : [ 91.995299] [ T4552] ? irqentry_exit (arch/x86/include/asm/jump_label.h:37 include/linux/context_tracking_state.h:138 include/linux/context_tracking.h:41 include/linux/irq-entry-common.h:301 include/linux/irq-entry-common.h:341 kernel/entry/common.c:196)
kern :warn : [ 91.995301] [ T4552] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
kern :warn : [ 91.995304] [ T4552] RIP: 0033:0x7fb38ba0e4aa
kern :warn : [ 91.995331] [ T4552] Code: 73 01 c3 48 8b 0d 4e 59 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 af 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e 59 0d 00 f7 d8 64 89 01 48
All code
========
0: 73 01 jae 0x3
2: c3 ret
3: 48 8b 0d 4e 59 0d 00 mov 0xd594e(%rip),%rcx # 0xd5958
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 ret
14: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
1b: 00 00 00
1e: 66 90 xchg %ax,%ax
20: 49 89 ca mov %rcx,%r10
23: b8 af 01 00 00 mov $0x1af,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8b 0d 1e 59 0d 00 mov 0xd591e(%rip),%rcx # 0xd5958
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8b 0d 1e 59 0d 00 mov 0xd591e(%rip),%rcx # 0xd592e
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
kern :warn : [ 91.995334] [ T4552] RSP: 002b:00007ffd1dd07898 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
kern :warn : [ 91.995337] [ T4552] RAX: ffffffffffffffda RBX: 000055a8acde41d0 RCX: 00007fb38ba0e4aa
kern :warn : [ 91.995339] [ T4552] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
kern :warn : [ 91.995340] [ T4552] RBP: 000055a8acde5d20 R08: 0000000000000000 R09: 0000000000000000
kern :warn : [ 91.995342] [ T4552] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
kern :warn : [ 91.995343] [ T4552] R13: 00007fb38bba0580 R14: 00007fb38bba226c R15: 00007fb38bb87a23
kern :warn : [ 91.995347] [ T4552] </TASK>
kern :info : [ 92.094700] [ T4552] BTRFS info (device nvme0n1p5): enabling ssd optimizations
kern :info : [ 92.096302] [ T4552] BTRFS info (device nvme0n1p5): turning on async discard
kern :info : [ 92.097968] [ T4552] BTRFS info (device nvme0n1p5): enabling free space tree
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260213/202602130252.89b82f3f-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] btrfs: update per-profile available estimation
2026-02-13 1:15 ` kernel test robot
@ 2026-02-13 4:24 ` Qu Wenruo
0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2026-02-13 4:24 UTC (permalink / raw)
To: kernel test robot; +Cc: oe-lkp, lkp, linux-btrfs
在 2026/2/13 11:45, kernel test robot 写道:
>
> Hello,
>
> kernel test robot noticed "INFO:trying_to_register_non-static_key" on:
>
> commit: 50b35a50fe83cb7870710b173f8b5ee78dd20107 ("[PATCH v2 2/3] btrfs: update per-profile available estimation")
> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-introduce-the-device-layout-aware-per-profile-available-space/20260204-105811
> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
> patch link: https://lore.kernel.org/all/b4d6fcecccd3c2c3b5359131e0493f190d1f5959.1770173615.git.wqu@suse.com/
> patch subject: [PATCH v2 2/3] btrfs: update per-profile available estimation
>
> in testcase: perf-sanity-tests
> version:
> with following parameters:
>
> perf_compiler: gcc
> group: group-02
>
>
>
> config: x86_64-rhel-9.4-bpf
> compiler: gcc-14
> test machine: 22 threads 1 sockets Intel(R) Core(TM) Ultra 9 185H @ 4.5GHz (Meteor Lake) with 32G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
Thanks for the report, it's already fixed in btrfs/for-next branch.
Thanks,
Qu
>
>
>
Thread overview: 10+ messages
2026-02-04 2:54 [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Qu Wenruo
2026-02-04 2:54 ` [PATCH v2 1/3] btrfs: introduce the device layout aware per-profile available space Qu Wenruo
2026-02-04 15:41 ` Filipe Manana
2026-02-08 15:59 ` Chris Mason
2026-02-08 20:57 ` Qu Wenruo
2026-02-04 2:54 ` [PATCH v2 2/3] btrfs: update per-profile available estimation Qu Wenruo
2026-02-13 1:15 ` kernel test robot
2026-02-13 4:24 ` Qu Wenruo
2026-02-04 2:54 ` [PATCH v2 3/3] btrfs: use per-profile available space in calc_available_free_space() Qu Wenruo
2026-02-04 15:42 ` [PATCH v2 0/3] btrfs: unbalanced disks aware per-profile available space estimation Filipe Manana