public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] btrfs: more RST delete fixes
@ 2024-12-12  7:55 Johannes Thumshirn
  2024-12-12  7:55 ` [PATCH 01/14] btrfs: don't try to delete RAID stripe-extents if we don't need to Johannes Thumshirn
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn, Johannes Thumshirn

Here's another set of fixes for the delete path on RAID stripe-tree backed
filesystems.

Josef's CI system started tripping over a bad key order due to the usage
of btrfs_set_item_key_safe() in btrfs_partially_delete_raid_extent() and
while investigating what is happening there I found more bugs and not
handled corner cases, which resulted in more fixes and test-cases.

Unfortunately I couldn't fix the bad key order problem and had to resort
to re-creating the item in btrfs_partially_delete_raid_extent() and insert
the new one after deleting the old.

Fstests btrfs/06* are extremely good in exhibiting these failures and
btrfs/060 has been extensively run while developing this series.

A full CI run is undergoing at the moment:
https://github.com/btrfs/linux/actions/runs/12291668397

Johannes Thumshirn (14):
  btrfs: don't try to delete RAID stripe-extents if we don't need to
  btrfs: assert RAID stripe-extent length is always greater than 0
  btrfs: fix search when deleting a RAID stripe-extent
  btrfs: fix front delete range calculation for RAID stripe extents
  btrfs: fix tail delete of RAID stripe-extents
  btrfs: fix deletion of a range spanning parts two RAID stripe extents
  btrfs: implement hole punching for RAID stripe extents
  btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
  btrfs: selftests: check for correct return value of failed lookup
  btrfs: selftests: don't split RAID extents in half
  btrfs: selftests: test RAID stripe-tree deletion spanning two items
  btrfs: selftests: add selftest for punching holes into the RAID stripe
    extents
  btrfs: selftests: add test for punching a hole into 3 RAID
    stripe-extents
  btrfs: selftests: add a selftest for deleting two out of three extents

 fs/btrfs/ctree.c                        |   1 +
 fs/btrfs/raid-stripe-tree.c             | 154 +++++-
 fs/btrfs/tests/raid-stripe-tree-tests.c | 653 +++++++++++++++++++++++-
 3 files changed, 776 insertions(+), 32 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/14] btrfs: don't try to delete RAID stripe-extents if we don't need to
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 14:40   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 02/14] btrfs: assert RAID stripe-extent length is always greater than 0 Johannes Thumshirn
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Don't try to delete RAID stripe-extents if we don't need to.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c             | 13 ++++++++++++-
 fs/btrfs/tests/raid-stripe-tree-tests.c |  3 ++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 9ffc79f250fb..2c4ee5a9449a 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -59,9 +59,20 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	int slot;
 	int ret;
 
-	if (!stripe_root)
+	if (!btrfs_fs_incompat(fs_info, RAID_STRIPE_TREE) || !stripe_root)
 		return 0;
 
+	if (!btrfs_is_testing(fs_info)) {
+		struct btrfs_chunk_map *map;
+		bool use_rst;
+
+		map = btrfs_find_chunk_map(fs_info, start, length);
+		use_rst = btrfs_need_stripe_tree_update(fs_info, map->type);
+		btrfs_free_chunk_map(map);
+		if (!use_rst)
+			return 0;
+	}
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index 30f17eb7b6a8..f060c04c7f76 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -478,8 +478,9 @@ static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
 		ret = PTR_ERR(root);
 		goto out;
 	}
-	btrfs_set_super_compat_ro_flags(root->fs_info->super_copy,
+	btrfs_set_super_incompat_flags(root->fs_info->super_copy,
 					BTRFS_FEATURE_INCOMPAT_RAID_STRIPE_TREE);
+	btrfs_set_fs_incompat(root->fs_info, RAID_STRIPE_TREE);
 	root->root_key.objectid = BTRFS_RAID_STRIPE_TREE_OBJECTID;
 	root->root_key.type = BTRFS_ROOT_ITEM_KEY;
 	root->root_key.offset = 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/14] btrfs: assert RAID stripe-extent length is always greater than 0
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
  2024-12-12  7:55 ` [PATCH 01/14] btrfs: don't try to delete RAID stripe-extents if we don't need to Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 14:46   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 03/14] btrfs: fix search when deleting a RAID stripe-extent Johannes Thumshirn
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

When modifying a RAID stripe-extent, ASSERT() that the length of the new
RAID stripe-extent is always greater than 0.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 2c4ee5a9449a..d341fc2a8a4f 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -28,6 +28,7 @@ static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
 		.offset = newlen,
 	};
 
+	ASSERT(newlen > 0);
 	ASSERT(oldkey->type == BTRFS_RAID_STRIPE_KEY);
 
 	leaf = path->nodes[0];
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/14] btrfs: fix search when deleting a RAID stripe-extent
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
  2024-12-12  7:55 ` [PATCH 01/14] btrfs: don't try to delete RAID stripe-extents if we don't need to Johannes Thumshirn
  2024-12-12  7:55 ` [PATCH 02/14] btrfs: assert RAID stripe-extent length is always greater than 0 Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 14:56   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 04/14] btrfs: fix front delete range calculation for RAID stripe extents Johannes Thumshirn
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

When searching for a RAID stripe-extent for deletion, use an offset of -1
to always get the "left" slot in the btree and correctly handle the slot
selection.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index d341fc2a8a4f..d6f7d7d60e76 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -81,14 +81,18 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	while (1) {
 		key.objectid = start;
 		key.type = BTRFS_RAID_STRIPE_KEY;
-		key.offset = 0;
+		key.offset = (u64)-1;
 
 		ret = btrfs_search_slot(trans, stripe_root, &key, path, -1, 1);
 		if (ret < 0)
 			break;
 
-		if (path->slots[0] == btrfs_header_nritems(path->nodes[0]))
-			path->slots[0]--;
+		if (ret == 1) {
+			ret = 0;
+			if (path->slots[0] ==
+			    btrfs_header_nritems(path->nodes[0]))
+				path->slots[0]--;
+		}
 
 		leaf = path->nodes[0];
 		slot = path->slots[0];
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/14] btrfs: fix front delete range calculation for RAID stripe extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 03/14] btrfs: fix search when deleting a RAID stripe-extent Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 15:02   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 05/14] btrfs: fix tail delete of RAID stripe-extents Johannes Thumshirn
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

When deleting the front of a RAID stripe-extent the delete code
miscalculates the size on how much to pad the remaining extent part in the
front.

Fix the calculation so we're always having the sizes we expect.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index d6f7d7d60e76..092e24e1de32 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -138,10 +138,13 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		 * length to the new size and then re-insert the item.
 		 */
 		if (found_end > end) {
-			u64 diff = found_end - end;
+			u64 diff_end = found_end - end;
 
 			btrfs_partially_delete_raid_extent(trans, path, &key,
-							   diff, diff);
+							   key.offset - length,
+							   length);
+			ASSERT(key.offset - diff_end == length);
+			length = 0;
 			break;
 		}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/14] btrfs: fix tail delete of RAID stripe-extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 04/14] btrfs: fix front delete range calculation for RAID stripe extents Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 15:45   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 06/14] btrfs: fix deletion of a range spanning parts two RAID stripe extents Johannes Thumshirn
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Fix tail delete of RAID stripe-extents, if there is a range to be deleted
as well after the tail delete of the extent.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 092e24e1de32..6246ab4c1a21 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -121,11 +121,18 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		 * length to the new size and then re-insert the item.
 		 */
 		if (found_start < start) {
-			u64 diff = start - found_start;
+			u64 diff_start = start - found_start;
 
 			btrfs_partially_delete_raid_extent(trans, path, &key,
-							   diff, 0);
-			break;
+							   diff_start, 0);
+
+			start += (key.offset - diff_start);
+			length -= (key.offset - diff_start);
+			if (length == 0)
+				break;
+
+			btrfs_release_path(path);
+			continue;
 		}
 
 		/*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/14] btrfs: fix deletion of a range spanning parts two RAID stripe extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (4 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 05/14] btrfs: fix tail delete of RAID stripe-extents Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 15:58   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 07/14] btrfs: implement hole punching for " Johannes Thumshirn
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

When a user requests the deletion of a range that spans multiple stripe
extents and btrfs_search_slot() returns us the second RAID stripe extent,
we need to pick the previous item and truncate it, if there's still a
range to delete left, move on to the next item.

The following diagram illustrates the operation:

 |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
        |--- keep  ---|--- drop ---|

While at it, comment the trivial case of a whole item delete as well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 6246ab4c1a21..ccf455b30314 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -101,6 +101,33 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		found_end = found_start + key.offset;
 		ret = 0;
 
+		/*
+		 * The stripe extent starts before the range we want to delete,
+		 * but the range spans more than one stripe extent:
+		 *
+		 * |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
+		 *        |--- keep  ---|--- drop ---|
+		 *
+		 * This means we have to get the previous item, truncate its
+		 * length and then restart the search.
+		 */
+		if (found_start > start) {
+			if (slot == 0)
+				break;
+
+			ret = btrfs_previous_item(stripe_root, path, start,
+						  BTRFS_RAID_STRIPE_KEY);
+			if (ret < 0)
+				break;
+			ret = 0;
+
+			leaf = path->nodes[0];
+			slot = path->slots[0];
+			btrfs_item_key_to_cpu(leaf, &key, slot);
+			found_start = key.objectid;
+			found_end = found_start + key.offset;
+		}
+
 		if (key.type != BTRFS_RAID_STRIPE_KEY)
 			break;
 
@@ -155,6 +182,9 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 			break;
 		}
 
+		/*
+		 * Finally we can delete the whole item, no more special cases.
+		 */
 		ret = btrfs_del_item(trans, stripe_root, path);
 		if (ret)
 			break;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/14] btrfs: implement hole punching for RAID stripe extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (5 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 06/14] btrfs: fix deletion of a range spanning parts two RAID stripe extents Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:05   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents Johannes Thumshirn
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

If the stripe extent we want to delete starts before the range we want to
delete and ends after the range we want to delete we're punching a
hole in the stripe extent:

  |--- RAID Stripe Extent ---|
  | keep |--- drop ---| keep |

This means we need to a) truncate the existing item and b)
create a second item for the remaining range.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/ctree.c            |  1 +
 fs/btrfs/raid-stripe-tree.c | 54 +++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 693dc27ffb89..5682692b5ba5 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3903,6 +3903,7 @@ static noinline int setup_leaf_for_split(struct btrfs_trans_handle *trans,
 	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
 
 	BUG_ON(key.type != BTRFS_EXTENT_DATA_KEY &&
+	       key.type != BTRFS_RAID_STRIPE_KEY &&
 	       key.type != BTRFS_EXTENT_CSUM_KEY);
 
 	if (btrfs_leaf_free_space(leaf) >= ins_len)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index ccf455b30314..6ec72732c4ad 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -138,6 +138,60 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		trace_btrfs_raid_extent_delete(fs_info, start, end,
 					       found_start, found_end);
 
+		/*
+		 * The stripe extent starts before the range we want to delete
+		 * and ends after the range we want to delete, i.e. we're
+		 * punching a hole in the stripe extent:
+		 *
+		 *  |--- RAID Stripe Extent ---|
+		 *  | keep |--- drop ---| keep |
+		 *
+		 * This means we need to a) truncate the existing item and b)
+		 * create a second item for the remaining range.
+		 */
+		if (found_start < start && found_end > end) {
+			size_t item_size;
+			u64 diff_start = start - found_start;
+			u64 diff_end = found_end - end;
+			struct btrfs_stripe_extent *extent;
+			struct btrfs_key newkey = {
+				.objectid = end,
+				.type = BTRFS_RAID_STRIPE_KEY,
+				.offset = diff_end,
+			};
+
+			/* "right" item */
+			ret = btrfs_duplicate_item(trans, stripe_root, path,
+						   &newkey);
+			if (ret)
+				break;
+
+			item_size = btrfs_item_size(leaf, path->slots[0]);
+			extent = btrfs_item_ptr(leaf, path->slots[0],
+						struct btrfs_stripe_extent);
+
+			for (int i = 0; i < btrfs_num_raid_stripes(item_size);
+			     i++) {
+				struct btrfs_raid_stride *stride =
+					&extent->strides[i];
+				u64 phys;
+
+				phys = btrfs_raid_stride_physical(leaf, stride);
+				phys += diff_start + length;
+				btrfs_set_raid_stride_physical(leaf, stride,
+							       phys);
+			}
+			btrfs_mark_buffer_dirty(trans, leaf);
+
+			/* "left" item */
+			path->slots[0]--;
+			btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+			btrfs_partially_delete_raid_extent(trans, path, &key,
+							   diff_start, 0);
+			length = 0;
+			break;
+		}
+
 		/*
 		 * The stripe extent starts before the range we want to delete:
 		 *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (6 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 07/14] btrfs: implement hole punching for " Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:23   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 09/14] btrfs: selftests: check for correct return value of failed lookup Johannes Thumshirn
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Don't use btrfs_set_item_key_safe() to modify the keys in the RAID
stripe-tree as this can lead to corruption of the tree, which is caught by
the checks in btrfs_set_item_key_safe():

 BTRFS critical (device nvme1n1): slot 201 key (5679448064 230 32768) new key (5680439296 230 1028096)
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/ctree.c:2672!
 Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 UID: 0 PID: 1055 Comm: fsstress Not tainted 6.13.0-rc1+ #1464
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
 RIP: 0010:btrfs_set_item_key_safe+0xf7/0x270
 Code: <snip>
 RSP: 0018:ffffc90001337ab0 EFLAGS: 00010287
 RAX: 0000000000000000 RBX: ffff8881115fd000 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff
 RBP: ffff888110ed6f50 R08: 00000000ffffefff R09: ffffffff8244c500
 R10: 00000000ffffefff R11: 00000000ffffffff R12: ffff888100586000
 R13: 00000000000000c9 R14: ffffc90001337b1f R15: ffff888110f23b58
 FS:  00007f7d75c72740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa811652c60 CR3: 0000000111398001 CR4: 0000000000370eb0
 Call Trace:
  <TASK>
  ? __die_body.cold+0x14/0x1a
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? btrfs_set_item_key_safe+0xf7/0x270
  ? exc_invalid_op+0x50/0x70
  ? btrfs_set_item_key_safe+0xf7/0x270
  ? asm_exc_invalid_op+0x1a/0x20
  ? btrfs_set_item_key_safe+0xf7/0x270
  btrfs_partially_delete_raid_extent+0xc4/0xe0
  btrfs_delete_raid_extent+0x227/0x240
  __btrfs_free_extent.isra.0+0x57f/0x9c0
  ? exc_coproc_segment_overrun+0x40/0x40
  __btrfs_run_delayed_refs+0x2fa/0xe80
  btrfs_run_delayed_refs+0x81/0xe0
  btrfs_commit_transaction+0x2dd/0xbe0
  ? preempt_count_add+0x52/0xb0
  btrfs_sync_file+0x375/0x4c0
  do_fsync+0x39/0x70
  __x64_sys_fsync+0x13/0x20
  do_syscall_64+0x54/0x110
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f7d7550ef90
 Code: <snip>
 RSP: 002b:00007ffd70237248 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f7d7550ef90
 RDX: 000000000000013a RSI: 000000000040eb28 RDI: 0000000000000004
 RBP: 000000000000001b R08: 0000000000000078 R09: 00007ffd7023725c
 R10: 00007f7d75400390 R11: 0000000000000202 R12: 028f5c28f5c28f5c
 R13: 8f5c28f5c28f5c29 R14: 000000000040b520 R15: 00007f7d75c726c8
  </TASK>

Instead copy the item, adjust the key and per-device physical addresses
and re-insert it into the tree.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 6ec72732c4ad..f713d8417681 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -13,12 +13,13 @@
 #include "volumes.h"
 #include "print-tree.h"
 
-static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
+static int btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
 					       struct btrfs_path *path,
 					       const struct btrfs_key *oldkey,
 					       u64 newlen, u64 frontpad)
 {
-	struct btrfs_stripe_extent *extent;
+	struct btrfs_root *stripe_root = trans->fs_info->stripe_root;
+	struct btrfs_stripe_extent *extent, *new;
 	struct extent_buffer *leaf;
 	int slot;
 	size_t item_size;
@@ -27,6 +28,7 @@ static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
 		.type = BTRFS_RAID_STRIPE_KEY,
 		.offset = newlen,
 	};
+	int ret;
 
 	ASSERT(newlen > 0);
 	ASSERT(oldkey->type == BTRFS_RAID_STRIPE_KEY);
@@ -34,17 +36,31 @@ static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
 	leaf = path->nodes[0];
 	slot = path->slots[0];
 	item_size = btrfs_item_size(leaf, slot);
+
+	new = kzalloc(item_size, GFP_NOFS);
+	if (!new)
+		return -ENOMEM;
+
 	extent = btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
 
 	for (int i = 0; i < btrfs_num_raid_stripes(item_size); i++) {
 		struct btrfs_raid_stride *stride = &extent->strides[i];
 		u64 phys;
 
-		phys = btrfs_raid_stride_physical(leaf, stride);
-		btrfs_set_raid_stride_physical(leaf, stride, phys + frontpad);
+		phys = btrfs_raid_stride_physical(leaf, stride) + frontpad;
+		btrfs_set_stack_raid_stride_physical(&new->strides[i], phys);
 	}
 
-	btrfs_set_item_key_safe(trans, path, &newkey);
+	ret = btrfs_del_item(trans, stripe_root, path);
+	if (ret)
+		goto out;
+
+	btrfs_release_path(path);
+	ret = btrfs_insert_item(trans, stripe_root, &newkey, new, item_size);
+
+out:
+	kfree(new);
+	return ret;
 }
 
 int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 length)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/14] btrfs: selftests: check for correct return value of failed lookup
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (7 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:24   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 10/14] btrfs: selftests: don't split RAID extents in half Johannes Thumshirn
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Commit 5e72aabc1fff ("btrfs: return ENODATA in case RST lookup fails")
changed btrfs_get_raid_extent_offset()'s return value to ENODATA in case
the RAID stripe-tree lookup failed.

Adjust the test cases which check for absence of a given range to check
for ENODATA as return value in this case.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/tests/raid-stripe-tree-tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index f060c04c7f76..19f6147a38a5 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -125,7 +125,7 @@ static int test_front_delete(struct btrfs_trans_handle *trans)
 	}
 
 	ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0, &io_stripe);
-	if (!ret) {
+	if (ret != -ENODATA) {
 		ret = -EINVAL;
 		test_err("lookup of RAID extent [%llu, %llu] succeeded, should fail",
 			 logical, logical + SZ_32K);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/14] btrfs: selftests: don't split RAID extents in half
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (8 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 09/14] btrfs: selftests: check for correct return value of failed lookup Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:31   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 11/14] btrfs: selftests: test RAID stripe-tree deletion spanning two items Johannes Thumshirn
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

The selftests for partially deleting the start or tail of RAID
stripe-extents split these extents in half.

This can hide errors in the calculation, so don't split the RAID
stripe-extents in half but delete the first or last 16K of the 64K
extents.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/tests/raid-stripe-tree-tests.c | 44 ++++++++++++++++---------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index 19f6147a38a5..12f3dbb23a64 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -14,6 +14,8 @@
 #define RST_TEST_NUM_DEVICES	(2)
 #define RST_TEST_RAID1_TYPE	(BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_RAID1)
 
+#define SZ_48K (SZ_32K + SZ_16K)
+
 typedef int (*test_func_t)(struct btrfs_trans_handle *trans);
 
 static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_devices,
@@ -94,32 +96,32 @@ static int test_front_delete(struct btrfs_trans_handle *trans)
 		goto out;
 	}
 
-	ret = btrfs_delete_raid_extent(trans, logical, SZ_32K);
+	ret = btrfs_delete_raid_extent(trans, logical, SZ_16K);
 	if (ret) {
 		test_err("deleting RAID extent [%llu, %llu] failed", logical,
-			 logical + SZ_32K);
+			 logical + SZ_16K);
 		goto out;
 	}
 
-	len = SZ_32K;
-	ret = btrfs_get_raid_extent_offset(fs_info, logical + SZ_32K, &len,
+	len -= SZ_16K;
+	ret = btrfs_get_raid_extent_offset(fs_info, logical + SZ_16K, &len,
 					   map_type, 0, &io_stripe);
 	if (ret) {
 		test_err("lookup of RAID extent [%llu, %llu] failed",
-			 logical + SZ_32K, logical + SZ_32K + len);
+			 logical + SZ_16K, logical + SZ_64K);
 		goto out;
 	}
 
-	if (io_stripe.physical != logical + SZ_32K) {
+	if (io_stripe.physical != logical + SZ_16K) {
 		test_err("invalid physical address, expected %llu, got %llu",
-			 logical + SZ_32K, io_stripe.physical);
+			 logical + SZ_16K, io_stripe.physical);
 		ret = -EINVAL;
 		goto out;
 	}
 
-	if (len != SZ_32K) {
+	if (len != SZ_48K) {
 		test_err("invalid stripe length, expected %llu, got %llu",
-			 (u64)SZ_32K, len);
+			 (u64)SZ_48K, len);
 		ret = -EINVAL;
 		goto out;
 	}
@@ -128,11 +130,11 @@ static int test_front_delete(struct btrfs_trans_handle *trans)
 	if (ret != -ENODATA) {
 		ret = -EINVAL;
 		test_err("lookup of RAID extent [%llu, %llu] succeeded, should fail",
-			 logical, logical + SZ_32K);
+			 logical, logical + SZ_16K);
 		goto out;
 	}
 
-	ret = btrfs_delete_raid_extent(trans, logical + SZ_32K, SZ_32K);
+	ret = btrfs_delete_raid_extent(trans, logical + SZ_16K, SZ_48K);
 out:
 	btrfs_put_bioc(bioc);
 	return ret;
@@ -209,14 +211,14 @@ static int test_tail_delete(struct btrfs_trans_handle *trans)
 		goto out;
 	}
 
-	ret = btrfs_delete_raid_extent(trans, logical + SZ_32K, SZ_32K);
+	ret = btrfs_delete_raid_extent(trans, logical + SZ_48K, SZ_16K);
 	if (ret) {
 		test_err("deleting RAID extent [%llu, %llu] failed",
-			 logical + SZ_32K, logical + SZ_64K);
+			 logical + SZ_48K, logical + SZ_64K);
 		goto out;
 	}
 
-	len = SZ_32K;
+	len = SZ_48K;
 	ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0, &io_stripe);
 	if (ret) {
 		test_err("lookup of RAID extent [%llu, %llu] failed", logical,
@@ -231,9 +233,19 @@ static int test_tail_delete(struct btrfs_trans_handle *trans)
 		goto out;
 	}
 
-	if (len != SZ_32K) {
+	if (len != SZ_48K) {
 		test_err("invalid stripe length, expected %llu, got %llu",
-			 (u64)SZ_32K, len);
+			 (u64)SZ_48K, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	len = SZ_16K;
+	ret = btrfs_get_raid_extent_offset(fs_info, logical + SZ_48K, &len,
+					   map_type, 0, &io_stripe);
+	if (ret != -ENODATA) {
+		test_err("lookup of RAID extent [%llu, %llu] succeeded should fail",
+			 logical + SZ_48K, logical + SZ_64K);
 		ret = -EINVAL;
 		goto out;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/14] btrfs: selftests: test RAID stripe-tree deletion spanning two items
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (9 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 10/14] btrfs: selftests: don't split RAID extents in half Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:43   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 12/14] btrfs: selftests: add selftest for punching holes into the RAID stripe extents Johannes Thumshirn
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Add a selftest for RAID stripe-tree deletion with a delete range spanning
two items, so that we're punching a hole into two adjacent RAID stripe
extents truncating the first and "moving" the second to the right.

The following diagram illustrates the operation:

 |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
 |-----  keep  -----|--- drop ---|-----  keep  ----|

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/tests/raid-stripe-tree-tests.c | 144 ++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index 12f3dbb23a64..a815fc5c4dd3 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -31,6 +31,149 @@ static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_de
 	return NULL;
 }
 
+/*
+ * Test a 1M RST write that spans two adjecent RST items on disk and then
+ * delete a portion starting in the first item and spanning into the second
+ * item. This is similar to test_front_delete(), but spanning multiple items.
+ */
+static int test_front_delete_prev_item(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_io_context *bioc;
+	struct btrfs_io_stripe io_stripe = { 0 };
+	u64 map_type = RST_TEST_RAID1_TYPE;
+	u64 logical1 = SZ_1M;
+	u64 logical2 = SZ_2M;
+	u64 len = SZ_1M;
+	int ret;
+
+	bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
+	if (!bioc) {
+		test_std_err(TEST_ALLOC_IO_CONTEXT);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
+	bioc->map_type = map_type;
+	bioc->size = len;
+
+	/* insert RAID extent 1. */
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical1 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	bioc->logical = logical2;
+	/* Insert RAID extent 2, directly adjacent to it. */
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical2 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical1 + SZ_512K, SZ_1M);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed",
+			 logical1 + SZ_512K, (u64)SZ_1M);
+		goto out;
+	}
+
+	/* Verify item 1 is truncated to 512K. */
+	ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len, map_type, 0,
+					   &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed", logical1,
+			 logical1 + len);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical1) {
+		test_err("invalid physical address, expected %llu got %llu",
+			 logical1, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len != SZ_512K) {
+		test_err("invalid stripe length, expected %llu got %llu",
+			 (u64)SZ_512K, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Verify item 2's start is moved by 512K. */
+	ret = btrfs_get_raid_extent_offset(fs_info, logical2 + SZ_512K, &len,
+					   map_type, 0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed",
+			 logical2 + SZ_512K, logical2 + len);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical2 + SZ_512K) {
+		test_err("invalid physical address, expected %llu got %llu",
+			 logical2 + SZ_512K, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len != SZ_512K) {
+		test_err("invalid stripe length, expected %llu got %llu",
+			 (u64)SZ_512K, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Verify there's a hole at [1M+512K, 2M+512K] */
+	len = SZ_1M;
+	ret = btrfs_get_raid_extent_offset(fs_info, logical1 + SZ_512K, &len,
+					   map_type, 0, &io_stripe);
+	if (ret != -ENODATA) {
+		test_err("lookup of RAID [%llu, %llu] succeeded, should fail",
+			 logical1 + SZ_512K, logical1 + SZ_512K + len);
+		goto out;
+	}
+
+	/* Clean up after us. */
+	ret = btrfs_delete_raid_extent(trans, logical1, SZ_512K);
+	if (ret)
+		goto out;
+
+	ret = btrfs_delete_raid_extent(trans, logical2 + SZ_512K, SZ_512K);
+
+out:
+	btrfs_put_bioc(bioc);
+	return ret;
+}
+
 /*
  * Test a 64K RST write on a 2 disk RAID1 at a logical address of 1M and then
  * delete the 1st 32K, making the new start address 1M+32K.
@@ -468,6 +611,7 @@ static const test_func_t tests[] = {
 	test_create_update_delete,
 	test_tail_delete,
 	test_front_delete,
+	test_front_delete_prev_item,
 };
 
 static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/14] btrfs: selftests: add selftest for punching holes into the RAID stripe extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (10 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 11/14] btrfs: selftests: test RAID stripe-tree deletion spanning two items Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:50   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 13/14] btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents Johannes Thumshirn
  2024-12-12  7:55 ` [PATCH 14/14] btrfs: selftests: add a selftest for deleting two out of three extents Johannes Thumshirn
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Add a selftest for punching a hole into a RAID stripe extent. The test
create an 1M extent and punches a 64k bytes long hole at offset of 32k from
the start of the extent.

Afterwards it verifies the start and length of both resulting new extents
"left" and "right" as well as the absence of the hole.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/tests/raid-stripe-tree-tests.c | 133 ++++++++++++++++++++++++
 1 file changed, 133 insertions(+)

diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index a815fc5c4dd3..37b87ccb5858 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -31,6 +31,138 @@ static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_de
 	return NULL;
 }
 
+/* Test punching a hole into a single RAID stripe-extent. */
+static int test_punch_hole(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_io_context *bioc;
+	struct btrfs_io_stripe io_stripe = { 0 };
+	u64 map_type = RST_TEST_RAID1_TYPE;
+	u64 logical1 = SZ_1M;
+	u64 hole_start = logical1 + SZ_32K;
+	u64 hole_len = SZ_64K;
+	u64 logical2 = hole_start + hole_len;
+	u64 len = SZ_1M;
+	u64 len1 = SZ_32K;
+	u64 len2 = len - len1 - hole_len;
+	int ret;
+
+	bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
+	if (!bioc) {
+		test_std_err(TEST_ALLOC_IO_CONTEXT);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
+	bioc->map_type = map_type;
+	bioc->size = len;
+
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical1 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len, map_type, 0,
+					   &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed", logical1,
+			 logical1 + len);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical1) {
+		test_err("invalid physical address, expected %llu got %llu",
+			 logical1, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len != SZ_1M) {
+		test_err("invalid stripe length, expected %llu got %llu",
+			 (u64)SZ_1M, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, hole_start, hole_len);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed",
+			 hole_start, hole_start + hole_len);
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len1, map_type,
+					   0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed",
+			 logical1, logical1 + len1);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical1) {
+		test_err("invalid physical address, expected %llu, got %llu",
+			 logical1, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len1 != SZ_32K) {
+		test_err("invalid stripe length, expected %llu, got %llu",
+			 (u64)SZ_32K, len1);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical2, &len2, map_type,
+					   0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed", logical2,
+			 logical2 + len2);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical2) {
+		test_err("invalid physical address, expected %llu, got %llu",
+			 logical2, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Check for the absence of the hole. */
+	ret = btrfs_get_raid_extent_offset(fs_info, hole_start, &hole_len,
+					   map_type, 0, &io_stripe);
+	if (ret != -ENODATA) {
+		ret = -EINVAL;
+		test_err("lookup of RAID extent [%llu, %llu] succeeded, should fail",
+			 hole_start, hole_start + SZ_64K);
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical1, len1);
+	if (ret)
+		goto out;
+
+	ret = btrfs_delete_raid_extent(trans, logical2, len2);
+out:
+	btrfs_put_bioc(bioc);
+	return ret;
+}
+
 /*
  * Test a 1M RST write that spans two adjecent RST items on disk and then
  * delete a portion starting in the first item and spanning into the second
@@ -612,6 +744,7 @@ static const test_func_t tests[] = {
 	test_tail_delete,
 	test_front_delete,
 	test_front_delete_prev_item,
+	test_punch_hole,
 };
 
 static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/14] btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (11 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 12/14] btrfs: selftests: add selftest for punching holes into the RAID stripe extents Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:55   ` Filipe Manana
  2024-12-12  7:55 ` [PATCH 14/14] btrfs: selftests: add a selftest for deleting two out of three extents Johannes Thumshirn
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/tests/raid-stripe-tree-tests.c | 183 ++++++++++++++++++++++++
 1 file changed, 183 insertions(+)

diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index 37b87ccb5858..8d74e95a8a75 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -31,6 +31,188 @@ static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_de
 	return NULL;
 }
 
+/*
+ * Test creating a range of three extents and then punch a hole in the middle,
+ * deleting all of the middle extents and partially deleting the "book ends"
+ */
+static int test_punch_hole_3extents(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_io_context *bioc;
+	struct btrfs_io_stripe io_stripe = { 0 };
+	u64 map_type = RST_TEST_RAID1_TYPE;
+	u64 logical1 = SZ_1M;
+	u64 len1 = SZ_1M;
+	u64 logical2 = logical1 + len1;
+	u64 len2 = SZ_1M;
+	u64 logical3 = logical2 + len2;
+	u64 len3 = SZ_1M;
+	u64 hole_start = logical1 + SZ_256K;
+	u64 hole_len = SZ_2M;
+	int ret;
+
+	bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
+	if (!bioc) {
+		test_std_err(TEST_ALLOC_IO_CONTEXT);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
+
+	/* prepare for the test, 1st create 3 x 1M extents */
+	bioc->map_type = map_type;
+	bioc->size = len1;
+
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical1 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	bioc->logical = logical2;
+	bioc->size = len2;
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical2 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	bioc->logical = logical3;
+	bioc->size = len3;
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical3 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	/*
+	 * Delete a range starting at logical1 + 256K and 2M in length. Extent
+	 * 1 is truncated to 256k length, extent 2 is completely dropped and
+	 * extent 3 is moved 256K to the right.
+	 */
+	ret = btrfs_delete_raid_extent(trans, hole_start, hole_len);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed",
+			 hole_start, hole_start + hole_len);
+		goto out;
+	}
+
+	/* get the first extent and check its size */
+	ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len1, map_type,
+					   0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed",
+			 logical1, logical1 + len1);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical1) {
+		test_err("invalid physical address, expected %llu, got %llu",
+			 logical1, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len1 != SZ_256K) {
+		test_err("invalid stripe length, expected %llu, got %llu",
+			 (u64)SZ_256K, len1);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* get the second extent and check it's absent */
+	ret = btrfs_get_raid_extent_offset(fs_info, logical2, &len2, map_type,
+					   0, &io_stripe);
+	if (ret != -ENODATA) {
+		test_err("lookup of RAID extent [%llu, %llu] succeeded should fail",
+			 logical2, logical2 + len2);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* get the third extent and check its size */
+	logical3 += SZ_256K;
+	ret = btrfs_get_raid_extent_offset(fs_info, logical3, &len3, map_type,
+					   0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed",
+			 logical3, logical3 + len3);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical3) {
+		test_err("invalid physical address, expected %llu, got %llu",
+			 logical3 + SZ_256K, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len3 != SZ_1M - SZ_256K) {
+		test_err("invalid stripe length, expected %llu, got %llu",
+			 (u64)SZ_1M - SZ_256K, len3);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical1, len1);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed",
+			 logical1, logical1 + len1);
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical3, len3);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed",
+			 logical1, logical1 + len1);
+		goto out;
+	}
+
+out:
+	btrfs_put_bioc(bioc);
+	return ret;
+}
+
 /* Test punching a hole into a single RAID stripe-extent. */
 static int test_punch_hole(struct btrfs_trans_handle *trans)
 {
@@ -745,6 +927,7 @@ static const test_func_t tests[] = {
 	test_front_delete,
 	test_front_delete_prev_item,
 	test_punch_hole,
+	test_punch_hole_3extents,
 };
 
 static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 14/14] btrfs: selftests: add a selftest for deleting two out of three extents
  2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
                   ` (12 preceding siblings ...)
  2024-12-12  7:55 ` [PATCH 13/14] btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents Johannes Thumshirn
@ 2024-12-12  7:55 ` Johannes Thumshirn
  2024-12-17 16:59   ` Filipe Manana
  13 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-12  7:55 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Josef Bacik, Damien Le Moal, David Sterba, Naohiro Aota,
	Qu Wenruo, Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Add a selftest creating three extents and then deleting two out of the
three extents.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/tests/raid-stripe-tree-tests.c | 144 ++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index 8d74e95a8a75..29c45f75941f 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -213,6 +213,149 @@ static int test_punch_hole_3extents(struct btrfs_trans_handle *trans)
 	return ret;
 }
 
+static int test_delete_two_extents(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_io_context *bioc;
+	struct btrfs_io_stripe io_stripe = { 0 };
+	u64 map_type = RST_TEST_RAID1_TYPE;
+	u64 logical1 = SZ_1M;
+	u64 len1 = SZ_1M;
+	u64 logical2 = logical1 + len1;
+	u64 len2 = SZ_1M;
+	u64 logical3 = logical2 + len2;
+	u64 len3 = SZ_1M;
+	int ret;
+
+	bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
+	if (!bioc) {
+		test_std_err(TEST_ALLOC_IO_CONTEXT);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
+
+	/* prepare for the test, 1st create 3 x 1M extents */
+	bioc->map_type = map_type;
+	bioc->size = len1;
+
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical1 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	bioc->logical = logical2;
+	bioc->size = len2;
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical2 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	bioc->logical = logical3;
+	bioc->size = len3;
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical3 + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	/*
+	 * Delete a range starting at logical1 and 2M in length. Extents 1 and 2
+	 * are is dropped and extent 3 is kept as is.
+	 */
+	ret = btrfs_delete_raid_extent(trans, logical1, len1 + len2);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed",
+			 logical1, logical1 + len1 + len2);
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len1, map_type,
+					   0, &io_stripe);
+	if (ret != -ENODATA) {
+		test_err("lookup of RAID extent [%llu, %llu] suceeded, should fail\n",
+			 logical1, len1);
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical2, &len2, map_type,
+					   0, &io_stripe);
+	if (ret != -ENODATA) {
+		test_err("lookup of RAID extent [%llu, %llu] suceeded, should fail\n",
+			 logical2, len2);
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical3, &len3, map_type,
+					   0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed\n",
+			 logical3, len3);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical3) {
+		test_err("invalid physical address, expected %llu, got %llu",
+			 logical3, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len3 != SZ_1M) {
+		test_err("invalid stripe length, expected %llu, got %llu",
+			 (u64)SZ_1M, len3);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical3, len3);
+out:
+	btrfs_put_bioc(bioc);
+	return ret;
+}
+
 /* Test punching a hole into a single RAID stripe-extent. */
 static int test_punch_hole(struct btrfs_trans_handle *trans)
 {
@@ -928,6 +1071,7 @@ static const test_func_t tests[] = {
 	test_front_delete_prev_item,
 	test_punch_hole,
 	test_punch_hole_3extents,
+	test_delete_two_extents,
 };
 
 static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/14] btrfs: don't try to delete RAID stripe-extents if we don't need to
  2024-12-12  7:55 ` [PATCH 01/14] btrfs: don't try to delete RAID stripe-extents if we don't need to Johannes Thumshirn
@ 2024-12-17 14:40   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 14:40 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:57 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Don't try to delete RAID stripe-extents if we don't need to.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c             | 13 ++++++++++++-
>  fs/btrfs/tests/raid-stripe-tree-tests.c |  3 ++-
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index 9ffc79f250fb..2c4ee5a9449a 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -59,9 +59,20 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>         int slot;
>         int ret;
>
> -       if (!stripe_root)
> +       if (!btrfs_fs_incompat(fs_info, RAID_STRIPE_TREE) || !stripe_root)
>                 return 0;
>
> +       if (!btrfs_is_testing(fs_info)) {
> +               struct btrfs_chunk_map *map;
> +               bool use_rst;
> +
> +               map = btrfs_find_chunk_map(fs_info, start, length);

We should handle the case where we can't find an extent map (NULL).

Otherwise it looks good, thanks.

> +               use_rst = btrfs_need_stripe_tree_update(fs_info, map->type);
> +               btrfs_free_chunk_map(map);
> +               if (!use_rst)
> +                       return 0;
> +       }
> +
>         path = btrfs_alloc_path();
>         if (!path)
>                 return -ENOMEM;
> diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
> index 30f17eb7b6a8..f060c04c7f76 100644
> --- a/fs/btrfs/tests/raid-stripe-tree-tests.c
> +++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
> @@ -478,8 +478,9 @@ static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
>                 ret = PTR_ERR(root);
>                 goto out;
>         }
> -       btrfs_set_super_compat_ro_flags(root->fs_info->super_copy,
> +       btrfs_set_super_incompat_flags(root->fs_info->super_copy,
>                                         BTRFS_FEATURE_INCOMPAT_RAID_STRIPE_TREE);
> +       btrfs_set_fs_incompat(root->fs_info, RAID_STRIPE_TREE);
>         root->root_key.objectid = BTRFS_RAID_STRIPE_TREE_OBJECTID;
>         root->root_key.type = BTRFS_ROOT_ITEM_KEY;
>         root->root_key.offset = 0;
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/14] btrfs: assert RAID stripe-extent length is always greater than 0
  2024-12-12  7:55 ` [PATCH 02/14] btrfs: assert RAID stripe-extent length is always greater than 0 Johannes Thumshirn
@ 2024-12-17 14:46   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 14:46 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> When modifying a RAID stripe-extent, ASSERT() that the length of the new
> RAID stripe-extent is always greater than 0.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/raid-stripe-tree.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index 2c4ee5a9449a..d341fc2a8a4f 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -28,6 +28,7 @@ static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
>                 .offset = newlen,
>         };
>
> +       ASSERT(newlen > 0);
>         ASSERT(oldkey->type == BTRFS_RAID_STRIPE_KEY);
>
>         leaf = path->nodes[0];
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/14] btrfs: fix search when deleting a RAID stripe-extent
  2024-12-12  7:55 ` [PATCH 03/14] btrfs: fix search when deleting a RAID stripe-extent Johannes Thumshirn
@ 2024-12-17 14:56   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 14:56 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> When searching for a RAID stripe-extent for deletion, use an offset of -1
> to always get the "left" slot in the btree and correctly handle the slot
> selection.

Can you elaborate?

It's not clear from this very brief change log what the problem is, or
if there's actually a problem.

Is this a bug, and what are the consequences? Are we missing stripe
extent items and causing some corruption or just unnecessary items?

Or is it just an optimization?

>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index d341fc2a8a4f..d6f7d7d60e76 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -81,14 +81,18 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>         while (1) {
>                 key.objectid = start;
>                 key.type = BTRFS_RAID_STRIPE_KEY;
> -               key.offset = 0;
> +               key.offset = (u64)-1;

On a first glance, having the 0 seems to be what leaves us at what you
mention in the change log as "the most left slot" and not -1.

For example if there's an item with key (X BTRFS_RAID_STRIPE_KEY 1M)
and the input arguments are start == X and length == 1M,
doing the search with offset 0 leaves at that exact slot where the
item is, but with an offset of (u64)-1 it leaves us on the next slot.

So please provide an example leaf layout and an example range passed
to btrfs_delete_raid_extent() that results in a bug or non-optimal
behaviour, or whatever is the problem you found.

Thanks.

>
>                 ret = btrfs_search_slot(trans, stripe_root, &key, path, -1, 1);
>                 if (ret < 0)
>                         break;
>
> -               if (path->slots[0] == btrfs_header_nritems(path->nodes[0]))
> -                       path->slots[0]--;
> +               if (ret == 1) {
> +                       ret = 0;
> +                       if (path->slots[0] ==
> +                           btrfs_header_nritems(path->nodes[0]))
> +                               path->slots[0]--;
> +               }
>
>                 leaf = path->nodes[0];
>                 slot = path->slots[0];
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/14] btrfs: fix front delete range calculation for RAID stripe extents
  2024-12-12  7:55 ` [PATCH 04/14] btrfs: fix front delete range calculation for RAID stripe extents Johannes Thumshirn
@ 2024-12-17 15:02   ` Filipe Manana
  2024-12-18  9:55     ` Johannes Thumshirn
  0 siblings, 1 reply; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 15:02 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 8:06 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> When deleting the front of a RAID stripe-extent the delete code
> miscalculates the size on how much to pad the remaining extent part in the
> front.
>
> Fix the calculation so we're always having the sizes we expect.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index d6f7d7d60e76..092e24e1de32 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -138,10 +138,13 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>                  * length to the new size and then re-insert the item.
>                  */
>                 if (found_end > end) {
> -                       u64 diff = found_end - end;
> +                       u64 diff_end = found_end - end;
>
>                         btrfs_partially_delete_raid_extent(trans, path, &key,
> -                                                          diff, diff);
> +                                                          key.offset - length,
> +                                                          length);
> +                       ASSERT(key.offset - diff_end == length);
> +                       length = 0;
>                         break;

What's the length = 0 for? We break out of the loop right after the
assignment and don't use length anymore.

Otherwise it looks good:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>                 }
>
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/14] btrfs: fix tail delete of RAID stripe-extents
  2024-12-12  7:55 ` [PATCH 05/14] btrfs: fix tail delete of RAID stripe-extents Johannes Thumshirn
@ 2024-12-17 15:45   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 15:45 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Fix tail delete of RAID stripe-extents, if there is a range to be deleted
> as well after the tail delete of the extent.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index 092e24e1de32..6246ab4c1a21 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -121,11 +121,18 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>                  * length to the new size and then re-insert the item.
>                  */
>                 if (found_start < start) {
> -                       u64 diff = start - found_start;
> +                       u64 diff_start = start - found_start;
>
>                         btrfs_partially_delete_raid_extent(trans, path, &key,
> -                                                          diff, 0);
> -                       break;
> +                                                          diff_start, 0);
> +
> +                       start += (key.offset - diff_start);
> +                       length -= (key.offset - diff_start);

So this can underflow in case the item we found covers a range that
spans 'end', that is in case found_end > end.

For example if the length argument was originally 1M, the item has a
found_start < start, diff_start is 512K and item's offset is 2M, in
which case we get:

length -= 2M - 512K
length -= 1.5M
1M -= 1.5M
-512K

So it should be:

start += min(found_end, end) - start;
length += min(found_end, end) - start;

But in this case we would also need to split the item into two, as
we're "punching a hole".
Or is this an impossible case, to have a single stripe extent with
found_start < start and found_end > end?

Thanks.


> +                       if (length == 0)
> +                               break;
> +
> +                       btrfs_release_path(path);
> +                       continue;
>                 }
>
>                 /*
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/14] btrfs: fix deletion of a range spanning parts two RAID stripe extents
  2024-12-12  7:55 ` [PATCH 06/14] btrfs: fix deletion of a range spanning parts two RAID stripe extents Johannes Thumshirn
@ 2024-12-17 15:58   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 15:58 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> When a user requests the deletion of a range that spans multiple stripe
> extents and btrfs_search_slot() returns us the second RAID stripe extent,
> we need to pick the previous item and truncate it, if there's still a
> range to delete left, move on to the next item.
>
> The following diagram illustrates the operation:
>
>  |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
>         |--- keep  ---|--- drop ---|
>
> While at it, comment the trivial case of a whole item delete as well.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index 6246ab4c1a21..ccf455b30314 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -101,6 +101,33 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>                 found_end = found_start + key.offset;
>                 ret = 0;
>
> +               /*
> +                * The stripe extent starts before the range we want to delete,
> +                * but the range spans more than one stripe extent:
> +                *
> +                * |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
> +                *        |--- keep  ---|--- drop ---|
> +                *
> +                * This means we have to get the previous item, truncate its
> +                * length and then restart the search.
> +                */
> +               if (found_start > start) {
> +                       if (slot == 0)
> +                               break;

Why?
The last item on the previous leaf may be the first stripe extent in
the diagram, it may partially cover the range we want to delete.
Or there may be no previous item at all with a range that partially
overlaps but the current item partially overlaps, so we shouldn't
break and stop - we should process the current item.

Remember that btrfs_previous_items() goes to the previous leaf in case
the current slot is 0.

Thanks.

> +
> +                       ret = btrfs_previous_item(stripe_root, path, start,
> +                                                 BTRFS_RAID_STRIPE_KEY);
> +                       if (ret < 0)
> +                               break;
> +                       ret = 0;
> +
> +                       leaf = path->nodes[0];
> +                       slot = path->slots[0];
> +                       btrfs_item_key_to_cpu(leaf, &key, slot);
> +                       found_start = key.objectid;
> +                       found_end = found_start + key.offset;
> +               }
> +
>                 if (key.type != BTRFS_RAID_STRIPE_KEY)
>                         break;
>
> @@ -155,6 +182,9 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>                         break;
>                 }
>
> +               /*
> +                * Finally we can delete the whole item, no more special cases.
> +                */
>                 ret = btrfs_del_item(trans, stripe_root, path);
>                 if (ret)
>                         break;
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/14] btrfs: implement hole punching for RAID stripe extents
  2024-12-12  7:55 ` [PATCH 07/14] btrfs: implement hole punching for " Johannes Thumshirn
@ 2024-12-17 16:05   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:05 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> If the stripe extent we want to delete starts before the range we want to
> delete and ends after the range we want to delete we're punching a
> hole in the stripe extent:
>
>   |--- RAID Stripe Extent ---|
>   | keep |--- drop ---| keep |
>
> This means we need to a) truncate the existing item and b)
> create a second item for the remaining range.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/ctree.c            |  1 +
>  fs/btrfs/raid-stripe-tree.c | 54 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 55 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 693dc27ffb89..5682692b5ba5 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -3903,6 +3903,7 @@ static noinline int setup_leaf_for_split(struct btrfs_trans_handle *trans,
>         btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
>
>         BUG_ON(key.type != BTRFS_EXTENT_DATA_KEY &&
> +              key.type != BTRFS_RAID_STRIPE_KEY &&
>                key.type != BTRFS_EXTENT_CSUM_KEY);
>
>         if (btrfs_leaf_free_space(leaf) >= ins_len)
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index ccf455b30314..6ec72732c4ad 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -138,6 +138,60 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>                 trace_btrfs_raid_extent_delete(fs_info, start, end,
>                                                found_start, found_end);
>
> +               /*
> +                * The stripe extent starts before the range we want to delete
> +                * and ends after the range we want to delete, i.e. we're
> +                * punching a hole in the stripe extent:
> +                *
> +                *  |--- RAID Stripe Extent ---|
> +                *  | keep |--- drop ---| keep |
> +                *
> +                * This means we need to a) truncate the existing item and b)
> +                * create a second item for the remaining range.
> +                */
> +               if (found_start < start && found_end > end) {
> +                       size_t item_size;
> +                       u64 diff_start = start - found_start;
> +                       u64 diff_end = found_end - end;
> +                       struct btrfs_stripe_extent *extent;
> +                       struct btrfs_key newkey = {
> +                               .objectid = end,
> +                               .type = BTRFS_RAID_STRIPE_KEY,
> +                               .offset = diff_end,
> +                       };
> +
> +                       /* "right" item */
> +                       ret = btrfs_duplicate_item(trans, stripe_root, path,
> +                                                  &newkey);
> +                       if (ret)
> +                               break;
> +
> +                       item_size = btrfs_item_size(leaf, path->slots[0]);
> +                       extent = btrfs_item_ptr(leaf, path->slots[0],
> +                                               struct btrfs_stripe_extent);
> +
> +                       for (int i = 0; i < btrfs_num_raid_stripes(item_size);
> +                            i++) {

We can place this in a single line, nowadays we tolerate more than 80
characters, anything up to 90 is fine.
It makes things more readable.

> +                               struct btrfs_raid_stride *stride =
> +                                       &extent->strides[i];

Same here.

> +                               u64 phys;
> +
> +                               phys = btrfs_raid_stride_physical(leaf, stride);
> +                               phys += diff_start + length;
> +                               btrfs_set_raid_stride_physical(leaf, stride,
> +                                                              phys);

Same here.
> +                       }
> +                       btrfs_mark_buffer_dirty(trans, leaf);

Not needed, already done down the call chain of btrfs_duplicate_item().

> +
> +                       /* "left" item */
> +                       path->slots[0]--;
> +                       btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
> +                       btrfs_partially_delete_raid_extent(trans, path, &key,
> +                                                          diff_start, 0);
> +                       length = 0;
> +                       break;

There's the 'break', no need to assign 0 to the length as it's not
used anymore after breaking out of the loop.

Otherwise it looks good, thanks.

> +               }
> +
>                 /*
>                  * The stripe extent starts before the range we want to delete:
>                  *
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
  2024-12-12  7:55 ` [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents Johannes Thumshirn
@ 2024-12-17 16:23   ` Filipe Manana
  2024-12-18 10:00     ` Johannes Thumshirn
  2024-12-19 10:03     ` Johannes Thumshirn
  0 siblings, 2 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:23 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Don't use btrfs_set_item_key_safe() to modify the keys in the RAID
> stripe-tree as this can lead to corruption of the tree, which is caught by
> the checks in btrfs_set_item_key_safe():
>
>  BTRFS critical (device nvme1n1): slot 201 key (5679448064 230 32768) new key (5680439296 230 1028096)

Ok, so you cut one of the most interesting parts, the leaf dump which
would allow us to see the wrong key order.
It's interesting because it can tell us if we're in the wrong slot by
a shift of 1, 2, 3, etc.

>  ------------[ cut here ]------------
>  kernel BUG at fs/btrfs/ctree.c:2672!
>  Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
>  CPU: 1 UID: 0 PID: 1055 Comm: fsstress Not tainted 6.13.0-rc1+ #1464
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
>  RIP: 0010:btrfs_set_item_key_safe+0xf7/0x270
>  Code: <snip>
>  RSP: 0018:ffffc90001337ab0 EFLAGS: 00010287
>  RAX: 0000000000000000 RBX: ffff8881115fd000 RCX: 0000000000000000
>  RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff
>  RBP: ffff888110ed6f50 R08: 00000000ffffefff R09: ffffffff8244c500
>  R10: 00000000ffffefff R11: 00000000ffffffff R12: ffff888100586000
>  R13: 00000000000000c9 R14: ffffc90001337b1f R15: ffff888110f23b58
>  FS:  00007f7d75c72740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fa811652c60 CR3: 0000000111398001 CR4: 0000000000370eb0
>  Call Trace:
>   <TASK>
>   ? __die_body.cold+0x14/0x1a
>   ? die+0x2e/0x50
>   ? do_trap+0xca/0x110
>   ? do_error_trap+0x65/0x80
>   ? btrfs_set_item_key_safe+0xf7/0x270
>   ? exc_invalid_op+0x50/0x70
>   ? btrfs_set_item_key_safe+0xf7/0x270
>   ? asm_exc_invalid_op+0x1a/0x20
>   ? btrfs_set_item_key_safe+0xf7/0x270
>   btrfs_partially_delete_raid_extent+0xc4/0xe0
>   btrfs_delete_raid_extent+0x227/0x240
>   __btrfs_free_extent.isra.0+0x57f/0x9c0
>   ? exc_coproc_segment_overrun+0x40/0x40
>   __btrfs_run_delayed_refs+0x2fa/0xe80
>   btrfs_run_delayed_refs+0x81/0xe0
>   btrfs_commit_transaction+0x2dd/0xbe0
>   ? preempt_count_add+0x52/0xb0
>   btrfs_sync_file+0x375/0x4c0
>   do_fsync+0x39/0x70
>   __x64_sys_fsync+0x13/0x20
>   do_syscall_64+0x54/0x110
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  RIP: 0033:0x7f7d7550ef90
>  Code: <snip>
>  RSP: 002b:00007ffd70237248 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
>  RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f7d7550ef90
>  RDX: 000000000000013a RSI: 000000000040eb28 RDI: 0000000000000004
>  RBP: 000000000000001b R08: 0000000000000078 R09: 00007ffd7023725c
>  R10: 00007f7d75400390 R11: 0000000000000202 R12: 028f5c28f5c28f5c
>  R13: 8f5c28f5c28f5c29 R14: 000000000040b520 R15: 00007f7d75c726c8
>   </TASK>
>
> Instead copy the item, adjust the key and per-device physical addresses
> and re-insert it into the tree.

Ok, but why does the bug happen?

If we get a corruption due to a bad key order when calling
btrfs_set_item_key_safe(), based on past experience with
btrfs_drop_extents() on subvolume trees and log trees,
it means we have one of the following 2 bugs (or we have both):

1) We computed a wrong new key which we pass to btrfs_set_item_key_safe();

2) We are calling btrfs_set_item_key_safe() with the path pointing to
an incorrect slot.

Does this still happen after all the previous fixes?

I would like to have the changelog explain why this bug happens.
Give an example leaf layout and an example range passed to
btrfs_delete_raid_extent(), and go through the steps that lead to the
bug.


Also,looking through all these fixes, and previous ones from the past,
I can't stop thinking:

This is the same problem solved by btrfs_drop_extents(), but instead
of file extent items, it's for stripe extent items.
But it's basically the same - we have items that represent ranges and
we want to delete and trim/adjust items that overlap a given range.

In fact this is a simplified case of btrfs_drop_extents(), because
unlike file extent items, there are no references (in the extent tree)
for stripe extent items, or inline extents or items representing holes
(disk_bytenr == 0 in the case of file extent items).

So it seems btrfs_delete_raid_extent() could be literally a rip-off of
btrfs_drop_extents(), using stripe extent items instead of file extent
items and removing all that fat that doesn't exist for strip extent
items.

This way we would get all the correctness and optimizations from the
btrfs_drop_extents() algorithm which has been tested for well over a
decade.
Does it make sense?

Thanks.

>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index 6ec72732c4ad..f713d8417681 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -13,12 +13,13 @@
>  #include "volumes.h"
>  #include "print-tree.h"
>
> -static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
> +static int btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
>                                                struct btrfs_path *path,
>                                                const struct btrfs_key *oldkey,
>                                                u64 newlen, u64 frontpad)
>  {
> -       struct btrfs_stripe_extent *extent;
> +       struct btrfs_root *stripe_root = trans->fs_info->stripe_root;
> +       struct btrfs_stripe_extent *extent, *new;
>         struct extent_buffer *leaf;
>         int slot;
>         size_t item_size;
> @@ -27,6 +28,7 @@ static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
>                 .type = BTRFS_RAID_STRIPE_KEY,
>                 .offset = newlen,
>         };
> +       int ret;
>
>         ASSERT(newlen > 0);
>         ASSERT(oldkey->type == BTRFS_RAID_STRIPE_KEY);
> @@ -34,17 +36,31 @@ static void btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
>         leaf = path->nodes[0];
>         slot = path->slots[0];
>         item_size = btrfs_item_size(leaf, slot);
> +
> +       new = kzalloc(item_size, GFP_NOFS);
> +       if (!new)
> +               return -ENOMEM;
> +
>         extent = btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
>
>         for (int i = 0; i < btrfs_num_raid_stripes(item_size); i++) {
>                 struct btrfs_raid_stride *stride = &extent->strides[i];
>                 u64 phys;
>
> -               phys = btrfs_raid_stride_physical(leaf, stride);
> -               btrfs_set_raid_stride_physical(leaf, stride, phys + frontpad);
> +               phys = btrfs_raid_stride_physical(leaf, stride) + frontpad;
> +               btrfs_set_stack_raid_stride_physical(&new->strides[i], phys);
>         }
>
> -       btrfs_set_item_key_safe(trans, path, &newkey);
> +       ret = btrfs_del_item(trans, stripe_root, path);
> +       if (ret)
> +               goto out;
> +
> +       btrfs_release_path(path);
> +       ret = btrfs_insert_item(trans, stripe_root, &newkey, new, item_size);
> +
> +out:
> +       kfree(new);
> +       return ret;
>  }
>
>  int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 length)
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/14] btrfs: selftests: check for correct return value of failed lookup
  2024-12-12  7:55 ` [PATCH 09/14] btrfs: selftests: check for correct return value of failed lookup Johannes Thumshirn
@ 2024-12-17 16:24   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:24 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Commit 5e72aabc1fff ("btrfs: return ENODATA in case RST lookup fails")
> changed btrfs_get_raid_extent_offset()'s return value to ENODATA in case
> the RAID stripe-tree lookup failed.
>
> Adjust the test cases which check for absence of a given range to check
> for ENODATA as return value in this case.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/tests/raid-stripe-tree-tests.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
> index f060c04c7f76..19f6147a38a5 100644
> --- a/fs/btrfs/tests/raid-stripe-tree-tests.c
> +++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
> @@ -125,7 +125,7 @@ static int test_front_delete(struct btrfs_trans_handle *trans)
>         }
>
>         ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0, &io_stripe);
> -       if (!ret) {
> +       if (ret != -ENODATA) {
>                 ret = -EINVAL;
>                 test_err("lookup of RAID extent [%llu, %llu] succeeded, should fail",
>                          logical, logical + SZ_32K);
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/14] btrfs: selftests: don't split RAID extents in half
  2024-12-12  7:55 ` [PATCH 10/14] btrfs: selftests: don't split RAID extents in half Johannes Thumshirn
@ 2024-12-17 16:31   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:31 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> The selftests for partially deleting the start or tail of RAID
> stripe-extents split these extents in half.
>
> This can hide errors in the calculation, so don't split the RAID
> stripe-extents in half but delete the first or last 16K of the 64K
> extents.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/tests/raid-stripe-tree-tests.c | 44 ++++++++++++++++---------
>  1 file changed, 28 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
> index 19f6147a38a5..12f3dbb23a64 100644
> --- a/fs/btrfs/tests/raid-stripe-tree-tests.c
> +++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
> @@ -14,6 +14,8 @@
>  #define RST_TEST_NUM_DEVICES   (2)
>  #define RST_TEST_RAID1_TYPE    (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_RAID1)
>
> +#define SZ_48K (SZ_32K + SZ_16K)
> +
>  typedef int (*test_func_t)(struct btrfs_trans_handle *trans);
>
>  static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_devices,
> @@ -94,32 +96,32 @@ static int test_front_delete(struct btrfs_trans_handle *trans)
>                 goto out;
>         }
>
> -       ret = btrfs_delete_raid_extent(trans, logical, SZ_32K);
> +       ret = btrfs_delete_raid_extent(trans, logical, SZ_16K);
>         if (ret) {
>                 test_err("deleting RAID extent [%llu, %llu] failed", logical,
> -                        logical + SZ_32K);
> +                        logical + SZ_16K);
>                 goto out;
>         }
>
> -       len = SZ_32K;
> -       ret = btrfs_get_raid_extent_offset(fs_info, logical + SZ_32K, &len,
> +       len -= SZ_16K;
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical + SZ_16K, &len,
>                                            map_type, 0, &io_stripe);
>         if (ret) {
>                 test_err("lookup of RAID extent [%llu, %llu] failed",
> -                        logical + SZ_32K, logical + SZ_32K + len);
> +                        logical + SZ_16K, logical + SZ_64K);
>                 goto out;
>         }
>
> -       if (io_stripe.physical != logical + SZ_32K) {
> +       if (io_stripe.physical != logical + SZ_16K) {
>                 test_err("invalid physical address, expected %llu, got %llu",
> -                        logical + SZ_32K, io_stripe.physical);
> +                        logical + SZ_16K, io_stripe.physical);
>                 ret = -EINVAL;
>                 goto out;
>         }
>
> -       if (len != SZ_32K) {
> +       if (len != SZ_48K) {
>                 test_err("invalid stripe length, expected %llu, got %llu",
> -                        (u64)SZ_32K, len);
> +                        (u64)SZ_48K, len);
>                 ret = -EINVAL;
>                 goto out;
>         }
> @@ -128,11 +130,11 @@ static int test_front_delete(struct btrfs_trans_handle *trans)
>         if (ret != -ENODATA) {
>                 ret = -EINVAL;
>                 test_err("lookup of RAID extent [%llu, %llu] succeeded, should fail",
> -                        logical, logical + SZ_32K);
> +                        logical, logical + SZ_16K);
>                 goto out;
>         }
>
> -       ret = btrfs_delete_raid_extent(trans, logical + SZ_32K, SZ_32K);
> +       ret = btrfs_delete_raid_extent(trans, logical + SZ_16K, SZ_48K);
>  out:
>         btrfs_put_bioc(bioc);
>         return ret;
> @@ -209,14 +211,14 @@ static int test_tail_delete(struct btrfs_trans_handle *trans)
>                 goto out;
>         }
>
> -       ret = btrfs_delete_raid_extent(trans, logical + SZ_32K, SZ_32K);
> +       ret = btrfs_delete_raid_extent(trans, logical + SZ_48K, SZ_16K);
>         if (ret) {
>                 test_err("deleting RAID extent [%llu, %llu] failed",
> -                        logical + SZ_32K, logical + SZ_64K);
> +                        logical + SZ_48K, logical + SZ_64K);
>                 goto out;
>         }
>
> -       len = SZ_32K;
> +       len = SZ_48K;
>         ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0, &io_stripe);
>         if (ret) {
>                 test_err("lookup of RAID extent [%llu, %llu] failed", logical,
> @@ -231,9 +233,19 @@ static int test_tail_delete(struct btrfs_trans_handle *trans)
>                 goto out;
>         }
>
> -       if (len != SZ_32K) {
> +       if (len != SZ_48K) {
>                 test_err("invalid stripe length, expected %llu, got %llu",
> -                        (u64)SZ_32K, len);
> +                        (u64)SZ_48K, len);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       len = SZ_16K;
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical + SZ_48K, &len,
> +                                          map_type, 0, &io_stripe);
> +       if (ret != -ENODATA) {
> +               test_err("lookup of RAID extent [%llu, %llu] succeeded should fail",
> +                        logical + SZ_48K, logical + SZ_64K);
>                 ret = -EINVAL;
>                 goto out;
>         }
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/14] btrfs: selftests: test RAID stripe-tree deletion spanning two items
  2024-12-12  7:55 ` [PATCH 11/14] btrfs: selftests: test RAID stripe-tree deletion spanning two items Johannes Thumshirn
@ 2024-12-17 16:43   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:43 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Add a selftest for RAID stripe-tree deletion with a delete range spanning
> two items, so that we're punching a hole into two adjacent RAID stripe
> extents truncating the first and "moving" the second to the right.
>
> The following diagram illustrates the operation:
>
>  |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
>  |-----  keep  -----|--- drop ---|-----  keep  ----|
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/tests/raid-stripe-tree-tests.c | 144 ++++++++++++++++++++++++
>  1 file changed, 144 insertions(+)
>
> diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
> index 12f3dbb23a64..a815fc5c4dd3 100644
> --- a/fs/btrfs/tests/raid-stripe-tree-tests.c
> +++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
> @@ -31,6 +31,149 @@ static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_de
>         return NULL;
>  }
>
> +/*
> + * Test a 1M RST write that spans two adjecent RST items on disk and then
> + * delete a portion starting in the first item and spanning into the second
> + * item. This is similar to test_front_delete(), but spanning multiple items.
> + */
> +static int test_front_delete_prev_item(struct btrfs_trans_handle *trans)
> +{
> +       struct btrfs_fs_info *fs_info = trans->fs_info;
> +       struct btrfs_io_context *bioc;
> +       struct btrfs_io_stripe io_stripe = { 0 };
> +       u64 map_type = RST_TEST_RAID1_TYPE;
> +       u64 logical1 = SZ_1M;
> +       u64 logical2 = SZ_2M;
> +       u64 len = SZ_1M;
> +       int ret;
> +
> +       bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
> +       if (!bioc) {
> +               test_std_err(TEST_ALLOC_IO_CONTEXT);
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
> +       bioc->map_type = map_type;
> +       bioc->size = len;
> +
> +       /* insert RAID extent 1. */
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical1 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       bioc->logical = logical2;
> +       /* Insert RAID extent 2, directly adjacent to it. */
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical2 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       ret = btrfs_delete_raid_extent(trans, logical1 + SZ_512K, SZ_1M);
> +       if (ret) {
> +               test_err("deleting RAID extent [%llu, %llu] failed",
> +                        logical1 + SZ_512K, (u64)SZ_1M);
> +               goto out;
> +       }
> +
> +       /* Verify item 1 is truncated to 512K. */
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len, map_type, 0,
> +                                          &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed", logical1,
> +                        logical1 + len);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical1) {
> +               test_err("invalid physical address, expected %llu got %llu",
> +                        logical1, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (len != SZ_512K) {
> +               test_err("invalid stripe length, expected %llu got %llu",
> +                        (u64)SZ_512K, len);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       /* Verify item 2's start is moved by 512K. */
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical2 + SZ_512K, &len,
> +                                          map_type, 0, &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed",
> +                        logical2 + SZ_512K, logical2 + len);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical2 + SZ_512K) {
> +               test_err("invalid physical address, expected %llu got %llu",
> +                        logical2 + SZ_512K, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (len != SZ_512K) {
> +               test_err("invalid stripe length, expected %llu got %llu",
> +                        (u64)SZ_512K, len);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       /* Verify there's a hole at [1M+512K, 2M+512K] */
> +       len = SZ_1M;
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical1 + SZ_512K, &len,
> +                                          map_type, 0, &io_stripe);
> +       if (ret != -ENODATA) {
> +               test_err("lookup of RAID [%llu, %llu] succeeded, should fail",
> +                        logical1 + SZ_512K, logical1 + SZ_512K + len);
> +               goto out;
> +       }
> +
> +       /* Clean up after us. */
> +       ret = btrfs_delete_raid_extent(trans, logical1, SZ_512K);
> +       if (ret)
> +               goto out;
> +
> +       ret = btrfs_delete_raid_extent(trans, logical2 + SZ_512K, SZ_512K);
> +
> +out:
> +       btrfs_put_bioc(bioc);
> +       return ret;
> +}
> +
>  /*
>   * Test a 64K RST write on a 2 disk RAID1 at a logical address of 1M and then
>   * delete the 1st 32K, making the new start address 1M+32K.
> @@ -468,6 +611,7 @@ static const test_func_t tests[] = {
>         test_create_update_delete,
>         test_tail_delete,
>         test_front_delete,
> +       test_front_delete_prev_item,
>  };
>
>  static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/14] btrfs: selftests: add selftest for punching holes into the RAID stripe extents
  2024-12-12  7:55 ` [PATCH 12/14] btrfs: selftests: add selftest for punching holes into the RAID stripe extents Johannes Thumshirn
@ 2024-12-17 16:50   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:50 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 8:06 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Add a selftest for punching a hole into a RAID stripe extent. The test
> create an 1M extent and punches a 64k bytes long hole at offset of 32k from
> the start of the extent.
>
> Afterwards it verifies the start and length of both resulting new extents
> "left" and "right" as well as the absence of the hole.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/tests/raid-stripe-tree-tests.c | 133 ++++++++++++++++++++++++
>  1 file changed, 133 insertions(+)
>
> diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
> index a815fc5c4dd3..37b87ccb5858 100644
> --- a/fs/btrfs/tests/raid-stripe-tree-tests.c
> +++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
> @@ -31,6 +31,138 @@ static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_de
>         return NULL;
>  }
>
> +/* Test punching a hole into a single RAID stripe-extent. */
> +static int test_punch_hole(struct btrfs_trans_handle *trans)
> +{
> +       struct btrfs_fs_info *fs_info = trans->fs_info;
> +       struct btrfs_io_context *bioc;
> +       struct btrfs_io_stripe io_stripe = { 0 };
> +       u64 map_type = RST_TEST_RAID1_TYPE;
> +       u64 logical1 = SZ_1M;
> +       u64 hole_start = logical1 + SZ_32K;
> +       u64 hole_len = SZ_64K;
> +       u64 logical2 = hole_start + hole_len;
> +       u64 len = SZ_1M;
> +       u64 len1 = SZ_32K;
> +       u64 len2 = len - len1 - hole_len;
> +       int ret;
> +
> +       bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
> +       if (!bioc) {
> +               test_std_err(TEST_ALLOC_IO_CONTEXT);
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
> +       bioc->map_type = map_type;
> +       bioc->size = len;
> +
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical1 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len, map_type, 0,
> +                                          &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed", logical1,
> +                        logical1 + len);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical1) {
> +               test_err("invalid physical address, expected %llu got %llu",
> +                        logical1, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (len != SZ_1M) {
> +               test_err("invalid stripe length, expected %llu got %llu",
> +                        (u64)SZ_1M, len);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       ret = btrfs_delete_raid_extent(trans, hole_start, hole_len);
> +       if (ret) {
> +               test_err("deleting RAID extent [%llu, %llu] failed",
> +                        hole_start, hole_start + hole_len);
> +               goto out;
> +       }
> +
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len1, map_type,
> +                                          0, &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed",
> +                        logical1, logical1 + len1);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical1) {
> +               test_err("invalid physical address, expected %llu, got %llu",
> +                        logical1, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (len1 != SZ_32K) {
> +               test_err("invalid stripe length, expected %llu, got %llu",
> +                        (u64)SZ_32K, len1);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical2, &len2, map_type,
> +                                          0, &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed", logical2,
> +                        logical2 + len2);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical2) {
> +               test_err("invalid physical address, expected %llu, got %llu",
> +                        logical2, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }

Should we also check that len2 remains with a value of len - len1 - hole_len?

Otherwise it looks fine, thanks.

> +
> +       /* Check for the absence of the hole. */
> +       ret = btrfs_get_raid_extent_offset(fs_info, hole_start, &hole_len,
> +                                          map_type, 0, &io_stripe);
> +       if (ret != -ENODATA) {
> +               ret = -EINVAL;
> +               test_err("lookup of RAID extent [%llu, %llu] succeeded, should fail",
> +                        hole_start, hole_start + SZ_64K);
> +               goto out;
> +       }
> +
> +       ret = btrfs_delete_raid_extent(trans, logical1, len1);
> +       if (ret)
> +               goto out;
> +
> +       ret = btrfs_delete_raid_extent(trans, logical2, len2);
> +out:
> +       btrfs_put_bioc(bioc);
> +       return ret;
> +}
> +
>  /*
>   * Test a 1M RST write that spans two adjecent RST items on disk and then
>   * delete a portion starting in the first item and spanning into the second
> @@ -612,6 +744,7 @@ static const test_func_t tests[] = {
>         test_tail_delete,
>         test_front_delete,
>         test_front_delete_prev_item,
> +       test_punch_hole,
>  };
>
>  static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/14] btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents
  2024-12-12  7:55 ` [PATCH 13/14] btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents Johannes Thumshirn
@ 2024-12-17 16:55   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:55 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Missing a change log.

> ---
>  fs/btrfs/tests/raid-stripe-tree-tests.c | 183 ++++++++++++++++++++++++
>  1 file changed, 183 insertions(+)
>
> diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
> index 37b87ccb5858..8d74e95a8a75 100644
> --- a/fs/btrfs/tests/raid-stripe-tree-tests.c
> +++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
> @@ -31,6 +31,188 @@ static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_de
>         return NULL;
>  }
>
> +/*
> + * Test creating a range of three extents and then punch a hole in the middle,
> + * deleting all of the middle extents and partially deleting the "book ends"
> + */
> +static int test_punch_hole_3extents(struct btrfs_trans_handle *trans)
> +{
> +       struct btrfs_fs_info *fs_info = trans->fs_info;
> +       struct btrfs_io_context *bioc;
> +       struct btrfs_io_stripe io_stripe = { 0 };
> +       u64 map_type = RST_TEST_RAID1_TYPE;
> +       u64 logical1 = SZ_1M;
> +       u64 len1 = SZ_1M;
> +       u64 logical2 = logical1 + len1;
> +       u64 len2 = SZ_1M;
> +       u64 logical3 = logical2 + len2;
> +       u64 len3 = SZ_1M;
> +       u64 hole_start = logical1 + SZ_256K;
> +       u64 hole_len = SZ_2M;
> +       int ret;
> +
> +       bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
> +       if (!bioc) {
> +               test_std_err(TEST_ALLOC_IO_CONTEXT);
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
> +
> +       /* prepare for the test, 1st create 3 x 1M extents */

Our preferred style is to capitalize the first word and end with punctuation.
Same goes for 2 other comments below.

Other than that and the empty changelog, it looks good.

So at least after adding a changelog:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +       bioc->map_type = map_type;
> +       bioc->size = len1;
> +
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical1 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       bioc->logical = logical2;
> +       bioc->size = len2;
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical2 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       bioc->logical = logical3;
> +       bioc->size = len3;
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical3 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       /*
> +        * Delete a range starting at logical1 + 256K and 2M in length. Extent
> +        * 1 is truncated to 256k length, extent 2 is completely dropped and
> +        * extent 3 is moved 256K to the right.
> +        */
> +       ret = btrfs_delete_raid_extent(trans, hole_start, hole_len);
> +       if (ret) {
> +               test_err("deleting RAID extent [%llu, %llu] failed",
> +                        hole_start, hole_start + hole_len);
> +               goto out;
> +       }
> +
> +       /* get the first extent and check its size */
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len1, map_type,
> +                                          0, &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed",
> +                        logical1, logical1 + len1);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical1) {
> +               test_err("invalid physical address, expected %llu, got %llu",
> +                        logical1, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (len1 != SZ_256K) {
> +               test_err("invalid stripe length, expected %llu, got %llu",
> +                        (u64)SZ_256K, len1);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       /* get the second extent and check it's absent */
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical2, &len2, map_type,
> +                                          0, &io_stripe);
> +       if (ret != -ENODATA) {
> +               test_err("lookup of RAID extent [%llu, %llu] succeeded should fail",
> +                        logical2, logical2 + len2);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       /* get the third extent and check its size */
> +       logical3 += SZ_256K;
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical3, &len3, map_type,
> +                                          0, &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed",
> +                        logical3, logical3 + len3);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical3) {
> +               test_err("invalid physical address, expected %llu, got %llu",
> +                        logical3 + SZ_256K, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (len3 != SZ_1M - SZ_256K) {
> +               test_err("invalid stripe length, expected %llu, got %llu",
> +                        (u64)SZ_1M - SZ_256K, len3);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       ret = btrfs_delete_raid_extent(trans, logical1, len1);
> +       if (ret) {
> +               test_err("deleting RAID extent [%llu, %llu] failed",
> +                        logical1, logical1 + len1);
> +               goto out;
> +       }
> +
> +       ret = btrfs_delete_raid_extent(trans, logical3, len3);
> +       if (ret) {
> +               test_err("deleting RAID extent [%llu, %llu] failed",
> +                        logical1, logical1 + len1);
> +               goto out;
> +       }
> +
> +out:
> +       btrfs_put_bioc(bioc);
> +       return ret;
> +}
> +
>  /* Test punching a hole into a single RAID stripe-extent. */
>  static int test_punch_hole(struct btrfs_trans_handle *trans)
>  {
> @@ -745,6 +927,7 @@ static const test_func_t tests[] = {
>         test_front_delete,
>         test_front_delete_prev_item,
>         test_punch_hole,
> +       test_punch_hole_3extents,
>  };
>
>  static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 14/14] btrfs: selftests: add a selftest for deleting two out of three extents
  2024-12-12  7:55 ` [PATCH 14/14] btrfs: selftests: add a selftest for deleting two out of three extents Johannes Thumshirn
@ 2024-12-17 16:59   ` Filipe Manana
  0 siblings, 0 replies; 32+ messages in thread
From: Filipe Manana @ 2024-12-17 16:59 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Josef Bacik, Damien Le Moal, David Sterba,
	Naohiro Aota, Qu Wenruo, Filipe Manana, Johannes Thumshirn

On Thu, Dec 12, 2024 at 7:57 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Add a selftest creating three extents and then deleting two out of the
> three extents.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/tests/raid-stripe-tree-tests.c | 144 ++++++++++++++++++++++++
>  1 file changed, 144 insertions(+)
>
> diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
> index 8d74e95a8a75..29c45f75941f 100644
> --- a/fs/btrfs/tests/raid-stripe-tree-tests.c
> +++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
> @@ -213,6 +213,149 @@ static int test_punch_hole_3extents(struct btrfs_trans_handle *trans)
>         return ret;
>  }
>
> +static int test_delete_two_extents(struct btrfs_trans_handle *trans)
> +{
> +       struct btrfs_fs_info *fs_info = trans->fs_info;
> +       struct btrfs_io_context *bioc;
> +       struct btrfs_io_stripe io_stripe = { 0 };
> +       u64 map_type = RST_TEST_RAID1_TYPE;
> +       u64 logical1 = SZ_1M;
> +       u64 len1 = SZ_1M;
> +       u64 logical2 = logical1 + len1;
> +       u64 len2 = SZ_1M;
> +       u64 logical3 = logical2 + len2;
> +       u64 len3 = SZ_1M;
> +       int ret;
> +
> +       bioc = alloc_btrfs_io_context(fs_info, logical1, RST_TEST_NUM_DEVICES);
> +       if (!bioc) {
> +               test_std_err(TEST_ALLOC_IO_CONTEXT);
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
> +
> +       /* prepare for the test, 1st create 3 x 1M extents */

Same thing about the comment style as before (sorry).

> +       bioc->map_type = map_type;
> +       bioc->size = len1;
> +
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical1 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       bioc->logical = logical2;
> +       bioc->size = len2;
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical2 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       bioc->logical = logical3;
> +       bioc->size = len3;
> +       for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
> +               struct btrfs_io_stripe *stripe = &bioc->stripes[i];
> +
> +               stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
> +               if (!stripe->dev) {
> +                       test_err("cannot find device with devid %d", i);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               stripe->physical = logical3 + i * SZ_1G;
> +       }
> +
> +       ret = btrfs_insert_one_raid_extent(trans, bioc);
> +       if (ret) {
> +               test_err("inserting RAID extent failed: %d", ret);
> +               goto out;
> +       }
> +
> +       /*
> +        * Delete a range starting at logical1 and 2M in length. Extents 1 and 2
> +        * are is dropped and extent 3 is kept as is.

Extra "is" after "are".

Otherwise it looks good, thanks.

Reviewed-by: Filipe Manana <fdmanana@suse.com>


> +        */
> +       ret = btrfs_delete_raid_extent(trans, logical1, len1 + len2);
> +       if (ret) {
> +               test_err("deleting RAID extent [%llu, %llu] failed",
> +                        logical1, logical1 + len1 + len2);
> +               goto out;
> +       }
> +
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical1, &len1, map_type,
> +                                          0, &io_stripe);
> +       if (ret != -ENODATA) {
> +               test_err("lookup of RAID extent [%llu, %llu] suceeded, should fail\n",
> +                        logical1, len1);
> +               goto out;
> +       }
> +
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical2, &len2, map_type,
> +                                          0, &io_stripe);
> +       if (ret != -ENODATA) {
> +               test_err("lookup of RAID extent [%llu, %llu] suceeded, should fail\n",
> +                        logical2, len2);
> +               goto out;
> +       }
> +
> +       ret = btrfs_get_raid_extent_offset(fs_info, logical3, &len3, map_type,
> +                                          0, &io_stripe);
> +       if (ret) {
> +               test_err("lookup of RAID extent [%llu, %llu] failed\n",
> +                        logical3, len3);
> +               goto out;
> +       }
> +
> +       if (io_stripe.physical != logical3) {
> +               test_err("invalid physical address, expected %llu, got %llu",
> +                        logical3, io_stripe.physical);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (len3 != SZ_1M) {
> +               test_err("invalid stripe length, expected %llu, got %llu",
> +                        (u64)SZ_1M, len3);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       ret = btrfs_delete_raid_extent(trans, logical3, len3);
> +out:
> +       btrfs_put_bioc(bioc);
> +       return ret;
> +}
> +
>  /* Test punching a hole into a single RAID stripe-extent. */
>  static int test_punch_hole(struct btrfs_trans_handle *trans)
>  {
> @@ -928,6 +1071,7 @@ static const test_func_t tests[] = {
>         test_front_delete_prev_item,
>         test_punch_hole,
>         test_punch_hole_3extents,
> +       test_delete_two_extents,
>  };
>
>  static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/14] btrfs: fix front delete range calculation for RAID stripe extents
  2024-12-17 15:02   ` Filipe Manana
@ 2024-12-18  9:55     ` Johannes Thumshirn
  0 siblings, 0 replies; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-18  9:55 UTC (permalink / raw)
  To: Filipe Manana, Johannes Thumshirn
  Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Damien Le Moal,
	David Sterba, Naohiro Aota, WenRuo Qu, Filipe Manana

On 17.12.24 16:03, Filipe Manana wrote:
                          btrfs_partially_delete_raid_extent(trans, 
path, &key,
>> -                                                          diff, diff);
>> +                                                          key.offset - length,
>> +                                                          length);
>> +                       ASSERT(key.offset - diff_end == length);
>> +                       length = 0;
>>                          break;
> 
> What's the length = 0 for? We break out of the loop right after the
> assignment and don't use length anymore.

I originally planned on factoring out the loop body into a 'delete one 
extent' function and thus I've set length to 0 here so I know we're 
breaking out of the loop.

But then I stared too long at this code and decided to not do the 
refactoring in this series. Forgot to remove the line though.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
  2024-12-17 16:23   ` Filipe Manana
@ 2024-12-18 10:00     ` Johannes Thumshirn
  2024-12-19 10:03     ` Johannes Thumshirn
  1 sibling, 0 replies; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-18 10:00 UTC (permalink / raw)
  To: Filipe Manana, Johannes Thumshirn
  Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Damien Le Moal,
	David Sterba, Naohiro Aota, WenRuo Qu, Filipe Manana

On 17.12.24 17:23, Filipe Manana wrote:
> Also,looking through all these fixes, and previous ones from the past,
> I can't stop thinking:
> 
> This is the same problem solved by btrfs_drop_extents(), but instead
> of file extent items, it's for stripe extent items.
> But it's basically the same - we have items that represent ranges and
> we want to delete and trim/adjust items that overlap a given range.
> 
> In fact this is a simplified case of btrfs_drop_extents(), because
> unlike file extent items, there are no references (in the extent tree)
> for stripe extent items, or inline extents or items representing holes
> (disk_bytenr == 0 in the case of file extent items).
> 
> So it seems btrfs_delete_raid_extent() could be literally a rip-off of
> btrfs_drop_extents(), using stripe extent items instead of file extent
> items and removing all that fat that doesn't exist for strip extent
> items.
> 
> This way we would get all the correctness and optimizations from the
> btrfs_drop_extents() algorithm which has been tested for well over a
> decade.
> Does it make sense?

Totally and because of the test cases later in this series we can check 
for correctness. Let me try doing a copy & paste of btrfs_drop_extents().

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
  2024-12-17 16:23   ` Filipe Manana
  2024-12-18 10:00     ` Johannes Thumshirn
@ 2024-12-19 10:03     ` Johannes Thumshirn
  1 sibling, 0 replies; 32+ messages in thread
From: Johannes Thumshirn @ 2024-12-19 10:03 UTC (permalink / raw)
  To: Filipe Manana, Johannes Thumshirn
  Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Damien Le Moal,
	David Sterba, Naohiro Aota, WenRuo Qu, Filipe Manana

On 17.12.24 17:23, Filipe Manana wrote:
> On Thu, Dec 12, 2024 at 7:56 AM Johannes Thumshirn <jth@kernel.org> wrote:
>>
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> Don't use btrfs_set_item_key_safe() to modify the keys in the RAID
>> stripe-tree as this can lead to corruption of the tree, which is caught by
>> the checks in btrfs_set_item_key_safe():
>>
>>   BTRFS critical (device nvme1n1): slot 201 key (5679448064 230 32768) new key (5680439296 230 1028096)
> 
> Ok, so you cut one of the most interesting parts, the leaf dump which
> would allow us to see the wrong key order.
> It's interesting because it can tell us if we're in the wrong slot by
> a shift of 1, 2, 3, etc.

For completeness here's the whole stack trace (this is with btrfs/060 on a RST RAID0):

[   10.478745] BTRFS: device fsid 7b4f20de-961f-4df8-a3e2-76a033c80ea4 devid 1 transid 7 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (989)
[   10.483768] BTRFS: device fsid 7b4f20de-961f-4df8-a3e2-76a033c80ea4 devid 2 transid 7 /dev/nvme2n1 (259:1) scanned by mkfs.btrfs (989)
[   10.489942] BTRFS: device fsid 7b4f20de-961f-4df8-a3e2-76a033c80ea4 devid 3 transid 7 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (989)
[   10.493846] BTRFS: device fsid 7b4f20de-961f-4df8-a3e2-76a033c80ea4 devid 4 transid 7 /dev/nvme4n1 (259:3) scanned by mkfs.btrfs (989)
[   10.498846] BTRFS: device fsid 7b4f20de-961f-4df8-a3e2-76a033c80ea4 devid 5 transid 7 /dev/nvme5n1 (259:6) scanned by mkfs.btrfs (989)
[   10.526573] BTRFS info (device nvme1n1): first mount of filesystem 7b4f20de-961f-4df8-a3e2-76a033c80ea4
[   10.528077] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) checksum algorithm
[   10.529282] BTRFS info (device nvme1n1): using free-space-tree
[   10.536585] BTRFS info (device nvme1n1): checking UUID tree
[   10.689742] BTRFS info (device nvme1n1): balance: start -d -m -s
[   10.752513] BTRFS info (device nvme1n1): relocating block group 5675810816 flags data|raid0
[   10.797470] BTRFS info (device nvme1n1): balance: ended with status: -5
[   10.800304] BTRFS info (device nvme1n1): leaf 49168384 gen 15 total ptrs 194 free space 8329 owner 12
[   10.802062] BTRFS info (device nvme1n1): refs 2 lock_owner 1030 current 1030
[   10.803400]  item 0 key (307105792 230 4096) itemoff 16267 itemsize 16
[   10.804627]                  stride 0 devid 1 physical 79040512
[   10.805522]  item 1 key (307113984 230 4096) itemoff 16251 itemsize 16
[   10.806783]                  stride 0 devid 1 physical 79048704
[   10.807687]  item 2 key (307118080 230 4096) itemoff 16235 itemsize 16
[   10.808927]                  stride 0 devid 1 physical 79052800
[   10.809849]  item 3 key (307122176 230 4096) itemoff 16219 itemsize 16
[   10.811073]                  stride 0 devid 1 physical 79056896
[   10.811965]  item 4 key (307126272 230 40960) itemoff 16203 itemsize 16
[   10.813204]                  stride 0 devid 1 physical 79060992
[   10.814114]  item 5 key (307167232 230 4096) itemoff 16187 itemsize 16
[   10.815272]                  stride 0 devid 2 physical 58064896
[   10.816122]  item 6 key (307171328 230 4096) itemoff 16171 itemsize 16
[   10.817302]                  stride 0 devid 2 physical 58068992
[   10.818154]  item 7 key (307175424 230 4096) itemoff 16155 itemsize 16
[   10.819321]                  stride 0 devid 2 physical 58073088
[   10.820188]  item 8 key (307179520 230 4096) itemoff 16139 itemsize 16
[   10.821345]                  stride 0 devid 2 physical 58077184
[   10.822204]  item 9 key (307183616 230 4096) itemoff 16123 itemsize 16
[   10.823386]                  stride 0 devid 2 physical 58081280
[   10.824233]  item 10 key (307187712 230 4096) itemoff 16107 itemsize 16
[   10.825404]                  stride 0 devid 2 physical 58085376
[   10.826277]  item 11 key (307191808 230 4096) itemoff 16091 itemsize 16
[   10.827446]                  stride 0 devid 2 physical 58089472
[   10.828295]  item 12 key (307195904 230 4096) itemoff 16075 itemsize 16
[   10.829471]                  stride 0 devid 2 physical 58093568
[   10.830259]  item 13 key (307200000 230 4096) itemoff 16059 itemsize 16
[   10.831328]                  stride 0 devid 2 physical 58097664
[   10.832107]  item 14 key (307204096 230 4096) itemoff 16043 itemsize 16
[   10.833171]                  stride 0 devid 2 physical 58101760
[   10.834004]  item 15 key (307208192 230 4096) itemoff 16027 itemsize 16
[   10.835267]                  stride 0 devid 2 physical 58105856
[   10.836146]  item 16 key (307212288 230 4096) itemoff 16011 itemsize 16
[   10.837388]                  stride 0 devid 2 physical 58109952
[   10.838268]  item 17 key (307216384 230 4096) itemoff 15995 itemsize 16
[   10.839506]                  stride 0 devid 2 physical 58114048
[   10.840395]  item 18 key (307220480 230 4096) itemoff 15979 itemsize 16
[   10.841641]                  stride 0 devid 2 physical 58118144
[   10.842533]  item 19 key (307257344 230 4096) itemoff 15963 itemsize 16
[   10.843780]                  stride 0 devid 3 physical 58089472
[   10.844668]  item 20 key (307261440 230 20480) itemoff 15947 itemsize 16
[   10.845952]                  stride 0 devid 3 physical 58093568
[   10.846857]  item 21 key (307281920 230 4096) itemoff 15931 itemsize 16
[   10.848099]                  stride 0 devid 3 physical 58114048
[   10.848995]  item 22 key (307286016 230 4096) itemoff 15915 itemsize 16
[   10.850257]                  stride 0 devid 3 physical 58118144
[   10.851144]  item 23 key (307290112 230 4096) itemoff 15899 itemsize 16
[   10.852385]                  stride 0 devid 3 physical 58122240
[   10.853281]  item 24 key (307298304 230 4096) itemoff 15883 itemsize 16
[   10.854522]                  stride 0 devid 4 physical 58064896
[   10.855411]  item 25 key (307302400 230 4096) itemoff 15867 itemsize 16
[   10.856663]                  stride 0 devid 4 physical 58068992
[   10.857555]  item 26 key (307306496 230 8192) itemoff 15851 itemsize 16
[   10.858821]                  stride 0 devid 4 physical 58073088
[   10.859715]  item 27 key (307314688 230 4096) itemoff 15835 itemsize 16
[   10.860952]                  stride 0 devid 4 physical 58081280
[   10.861860]  item 28 key (307318784 230 4096) itemoff 15819 itemsize 16
[   10.863091]                  stride 0 devid 4 physical 58085376
[   10.863985]  item 29 key (307322880 230 4096) itemoff 15803 itemsize 16
[   10.865234]                  stride 0 devid 4 physical 58089472
[   10.866136]  item 30 key (307326976 230 4096) itemoff 15787 itemsize 16
[   10.867385]                  stride 0 devid 4 physical 58093568
[   10.868281]  item 31 key (307331072 230 4096) itemoff 15771 itemsize 16
[   10.869538]                  stride 0 devid 4 physical 58097664
[   10.870452]  item 32 key (307335168 230 4096) itemoff 15755 itemsize 16
[   10.871692]                  stride 0 devid 4 physical 58101760
[   10.872591]  item 33 key (307339264 230 4096) itemoff 15739 itemsize 16
[   10.873856]                  stride 0 devid 4 physical 58105856
[   10.874761]  item 34 key (307343360 230 20480) itemoff 15723 itemsize 16
[   10.876022]                  stride 0 devid 4 physical 58109952
[   10.876927]  item 35 key (307363840 230 65536) itemoff 15707 itemsize 16
[   10.878145]                  stride 0 devid 5 physical 58064896
[   10.879002]  item 36 key (307429376 230 4096) itemoff 15691 itemsize 16
[   10.880093]                  stride 0 devid 1 physical 79101952
[   10.880863]  item 37 key (307433472 230 4096) itemoff 15675 itemsize 16
[   10.882052]                  stride 0 devid 1 physical 79106048
[   10.882905]  item 38 key (307437568 230 4096) itemoff 15659 itemsize 16
[   10.884078]                  stride 0 devid 1 physical 79110144
[   10.884927]  item 39 key (307441664 230 4096) itemoff 15643 itemsize 16
[   10.886074]                  stride 0 devid 1 physical 79114240
[   10.886918]  item 40 key (307445760 230 4096) itemoff 15627 itemsize 16
[   10.888097]                  stride 0 devid 1 physical 79118336
[   10.888949]  item 41 key (307449856 230 4096) itemoff 15611 itemsize 16
[   10.890123]                  stride 0 devid 1 physical 79122432
[   10.890972]  item 42 key (307453952 230 40960) itemoff 15595 itemsize 16
[   10.892159]                  stride 0 devid 1 physical 79126528
[   10.893011]  item 43 key (307494912 230 65536) itemoff 15579 itemsize 16
[   10.894219]                  stride 0 devid 2 physical 58130432
[   10.895057]  item 44 key (307560448 230 20480) itemoff 15563 itemsize 16
[   10.896244]                  stride 0 devid 3 physical 58130432
[   10.897090]  item 45 key (307580928 230 4096) itemoff 15547 itemsize 16
[   10.898272]                  stride 0 devid 3 physical 58150912
[   10.899127]  item 46 key (307585024 230 4096) itemoff 15531 itemsize 16
[   10.900303]                  stride 0 devid 3 physical 58155008
[   10.901140]  item 47 key (307589120 230 4096) itemoff 15515 itemsize 16
[   10.902323]                  stride 0 devid 3 physical 58159104
[   10.903164]  item 48 key (307601408 230 24576) itemoff 15499 itemsize 16
[   10.904346]                  stride 0 devid 3 physical 58171392
[   10.905199]  item 49 key (307625984 230 32768) itemoff 15483 itemsize 16
[   10.906401]                  stride 0 devid 4 physical 58130432
[   10.907253]  item 50 key (307658752 230 4096) itemoff 15467 itemsize 16
[   10.908421]                  stride 0 devid 4 physical 58163200
[   10.909271]  item 51 key (307662848 230 4096) itemoff 15451 itemsize 16
[   10.910465]                  stride 0 devid 4 physical 58167296
[   10.911311]  item 52 key (307666944 230 24576) itemoff 15435 itemsize 16
[   10.912498]                  stride 0 devid 4 physical 58171392
[   10.913348]  item 53 key (307691520 230 4096) itemoff 15419 itemsize 16
[   10.914541]                  stride 0 devid 5 physical 58130432
[   10.915371]  item 54 key (307695616 230 4096) itemoff 15403 itemsize 16
[   10.916561]                  stride 0 devid 5 physical 58134528
[   10.917344]  item 55 key (307699712 230 4096) itemoff 15387 itemsize 16
[   10.918435]                  stride 0 devid 5 physical 58138624
[   10.919255]  item 56 key (307703808 230 36864) itemoff 15371 itemsize 16
[   10.920362]                  stride 0 devid 5 physical 58142720
[   10.921131]  item 57 key (307740672 230 16384) itemoff 15355 itemsize 16
[   10.922070]                  stride 0 devid 5 physical 58179584
[   10.922647]  item 58 key (307757056 230 20480) itemoff 15339 itemsize 16
[   10.923719]                  stride 0 devid 1 physical 79167488
[   10.924540]  item 59 key (307777536 230 4096) itemoff 15323 itemsize 16
[   10.925643]                  stride 0 devid 1 physical 79187968
[   10.926431]  item 60 key (352665600 230 16384) itemoff 15307 itemsize 16
[   10.927521]                  stride 0 devid 1 physical 88162304
[   10.928294]  item 61 key (352681984 230 12288) itemoff 15291 itemsize 16
[   10.929377]                  stride 0 devid 1 physical 88178688
[   10.930172]  item 62 key (352718848 230 40960) itemoff 15275 itemsize 16
[   10.931312]                  stride 0 devid 2 physical 67178496
[   10.932081]  item 63 key (352759808 230 16384) itemoff 15259 itemsize 16
[   10.933144]                  stride 0 devid 2 physical 67219456
[   10.933788]  item 64 key (352858112 230 53248) itemoff 15243 itemsize 16
[   10.934914]                  stride 0 devid 4 physical 67186688
[   10.935706]  item 65 key (352911360 230 65536) itemoff 15227 itemsize 16
[   10.936894]                  stride 0 devid 5 physical 67174400
[   10.937705]  item 66 key (352976896 230 12288) itemoff 15211 itemsize 16
[   10.938804]                  stride 0 devid 1 physical 88211456
[   10.939584]  item 67 key (352989184 230 40960) itemoff 15195 itemsize 16
[   10.940673]                  stride 0 devid 1 physical 88223744
[   10.941449]  item 68 key (353107968 230 65536) itemoff 15179 itemsize 16
[   10.942543]                  stride 0 devid 3 physical 67239936
[   10.943298]  item 69 key (353173504 230 61440) itemoff 15163 itemsize 16
[   10.944360]                  stride 0 devid 4 physical 67239936
[   10.945103]  item 70 key (353394688 230 16384) itemoff 15147 itemsize 16
[   10.946170]                  stride 0 devid 2 physical 67330048
[   10.946908]  item 71 key (353460224 230 4096) itemoff 15131 itemsize 16
[   10.947969]                  stride 0 devid 3 physical 67330048
[   10.948746]  item 72 key (353464320 230 36864) itemoff 15115 itemsize 16
[   10.949808]                  stride 0 devid 3 physical 67334144
[   10.950548]  item 73 key (353501184 230 20480) itemoff 15099 itemsize 16
[   10.951592]                  stride 0 devid 4 physical 67305472
[   10.952360]  item 74 key (353521664 230 4096) itemoff 15083 itemsize 16
[   10.953395]                  stride 0 devid 4 physical 67325952
[   10.954132]  item 75 key (353525760 230 28672) itemoff 15067 itemsize 16
[   10.955186]                  stride 0 devid 4 physical 67330048
[   10.955931]  item 76 key (353554432 230 12288) itemoff 15051 itemsize 16
[   10.956999]                  stride 0 devid 4 physical 67358720
[   10.957783]  item 77 key (353566720 230 65536) itemoff 15035 itemsize 16
[   10.958959]                  stride 0 devid 5 physical 67305472
[   10.959801]  item 78 key (353632256 230 36864) itemoff 15019 itemsize 16
[   10.960920]                  stride 0 devid 1 physical 88342528
[   10.961678]  item 79 key (353669120 230 8192) itemoff 15003 itemsize 16
[   10.962711]                  stride 0 devid 1 physical 88379392
[   10.963472]  item 80 key (353677312 230 20480) itemoff 14987 itemsize 16
[   10.964502]                  stride 0 devid 1 physical 88387584
[   10.965239]  item 81 key (353697792 230 12288) itemoff 14971 itemsize 16
[   10.966297]                  stride 0 devid 2 physical 67371008
[   10.967010]  item 82 key (353710080 230 53248) itemoff 14955 itemsize 16
[   10.967990]                  stride 0 devid 2 physical 67383296
[   10.968756]  item 83 key (353763328 230 8192) itemoff 14939 itemsize 16
[   10.969807]                  stride 0 devid 3 physical 67371008
[   10.970537]  item 84 key (353771520 230 53248) itemoff 14923 itemsize 16
[   10.971596]                  stride 0 devid 3 physical 67379200
[   10.972355]  item 85 key (353828864 230 65536) itemoff 14907 itemsize 16
[   10.973360]                  stride 0 devid 4 physical 67371008
[   10.974131]  item 86 key (353894400 230 57344) itemoff 14891 itemsize 16
[   10.975145]                  stride 0 devid 5 physical 67371008
[   10.975872]  item 87 key (353951744 230 8192) itemoff 14875 itemsize 16
[   10.976895]                  stride 0 devid 5 physical 67428352
[   10.977660]  item 88 key (353959936 230 65536) itemoff 14859 itemsize 16
[   10.978740]                  stride 0 devid 1 physical 88408064
[   10.979593]  item 89 key (354025472 230 45056) itemoff 14843 itemsize 16
[   10.980794]                  stride 0 devid 2 physical 67436544
[   10.981654]  item 90 key (354070528 230 20480) itemoff 14827 itemsize 16
[   10.982856]                  stride 0 devid 2 physical 67481600
[   10.983710]  item 91 key (354091008 230 65536) itemoff 14811 itemsize 16
[   10.984906]                  stride 0 devid 3 physical 67436544
[   10.985769]  item 92 key (354156544 230 4096) itemoff 14795 itemsize 16
[   10.986967]                  stride 0 devid 4 physical 67436544
[   10.987818]  item 93 key (354160640 230 61440) itemoff 14779 itemsize 16
[   10.989024]                  stride 0 devid 4 physical 67440640
[   10.989888]  item 94 key (354222080 230 20480) itemoff 14763 itemsize 16
[   10.991089]                  stride 0 devid 5 physical 67436544
[   10.991934]  item 95 key (354242560 230 4096) itemoff 14747 itemsize 16
[   10.993134]                  stride 0 devid 5 physical 67457024
[   10.993993]  item 96 key (354246656 230 40960) itemoff 14731 itemsize 16
[   10.995195]                  stride 0 devid 5 physical 67461120
[   10.996048]  item 97 key (354287616 230 65536) itemoff 14715 itemsize 16
[   10.997241]                  stride 0 devid 1 physical 88473600
[   10.998111]  item 98 key (354353152 230 24576) itemoff 14699 itemsize 16
[   10.999393]                  stride 0 devid 2 physical 67502080
[   11.000287]  item 99 key (354377728 230 36864) itemoff 14683 itemsize 16
[   11.001492]                  stride 0 devid 2 physical 67526656
[   11.002397]  item 100 key (354414592 230 4096) itemoff 14667 itemsize 16
[   11.003640]                  stride 0 devid 2 physical 67563520
[   11.004511]  item 101 key (354418688 230 4096) itemoff 14651 itemsize 16
[   11.005727]                  stride 0 devid 3 physical 67502080
[   11.006609]  item 102 key (354422784 230 61440) itemoff 14635 itemsize 16
[   11.007917]                  stride 0 devid 3 physical 67506176
[   11.008830]  item 103 key (354484224 230 20480) itemoff 14619 itemsize 16
[   11.010149]                  stride 0 devid 4 physical 67502080
[   11.011138]  item 104 key (354504704 230 45056) itemoff 14603 itemsize 16
[   11.012511]                  stride 0 devid 4 physical 67522560
[   11.013484]  item 105 key (354549760 230 20480) itemoff 14587 itemsize 16
[   11.014896]                  stride 0 devid 5 physical 67502080
[   11.015851]  item 106 key (354631680 230 4096) itemoff 14571 itemsize 16
[   11.017203]                  stride 0 devid 1 physical 88559616
[   11.018152]  item 107 key (354631680 230 32768) itemoff 14555 itemsize 16
[   11.019502]                  stride 0 devid 1 physical 88555520
[   11.020501]  item 108 key (354717696 230 28672) itemoff 14539 itemsize 16
[   11.021880]                  stride 0 devid 2 physical 67604480
[   11.022823]  item 109 key (354746368 230 20480) itemoff 14523 itemsize 16
[   11.024193]                  stride 0 devid 3 physical 67567616
[   11.025157]  item 110 key (354766848 230 45056) itemoff 14507 itemsize 16
[   11.026537]                  stride 0 devid 3 physical 67588096
[   11.027548]  item 111 key (354811904 230 49152) itemoff 14491 itemsize 16
[   11.028919]                  stride 0 devid 4 physical 67567616
[   11.029901]  item 112 key (354861056 230 16384) itemoff 14475 itemsize 16
[   11.031268]                  stride 0 devid 4 physical 67616768
[   11.032231]  item 113 key (354877440 230 61440) itemoff 14459 itemsize 16
[   11.033621]                  stride 0 devid 5 physical 67567616
[   11.034591]  item 114 key (354938880 230 4096) itemoff 14443 itemsize 16
[   11.035953]                  stride 0 devid 5 physical 67629056
[   11.036920]  item 115 key (354942976 230 65536) itemoff 14427 itemsize 16
[   11.038337]                  stride 0 devid 1 physical 88604672
[   11.039281]  item 116 key (355008512 230 20480) itemoff 14411 itemsize 16
[   11.040662]                  stride 0 devid 2 physical 67633152
[   11.041618]  item 117 key (355028992 230 45056) itemoff 14395 itemsize 16
[   11.043005]                  stride 0 devid 2 physical 67653632
[   11.043976]  item 118 key (355086336 230 53248) itemoff 14379 itemsize 16
[   11.045378]                  stride 0 devid 3 physical 67645440
[   11.046319]  item 119 key (355139584 230 65536) itemoff 14363 itemsize 16
[   11.047699]                  stride 0 devid 4 physical 67633152
[   11.048677]  item 120 key (355205120 230 8192) itemoff 14347 itemsize 16
[   11.050048]                  stride 0 devid 5 physical 67633152
[   11.050982]  item 121 key (355213312 230 57344) itemoff 14331 itemsize 16
[   11.052310]                  stride 0 devid 5 physical 67641344
[   11.053276]  item 122 key (355270656 230 28672) itemoff 14315 itemsize 16
[   11.054641]                  stride 0 devid 1 physical 88670208
[   11.055616]  item 123 key (355299328 230 36864) itemoff 14299 itemsize 16
[   11.057021]                  stride 0 devid 1 physical 88698880
[   11.057989]  item 124 key (355336192 230 65536) itemoff 14283 itemsize 16
[   11.059363]                  stride 0 devid 2 physical 67698688
[   11.060320]  item 125 key (355401728 230 16384) itemoff 14267 itemsize 16
[   11.061718]                  stride 0 devid 3 physical 67698688
[   11.062678]  item 126 key (355418112 230 49152) itemoff 14251 itemsize 16
[   11.063975]                  stride 0 devid 3 physical 67715072
[   11.064899]  item 127 key (355467264 230 16384) itemoff 14235 itemsize 16
[   11.066220]                  stride 0 devid 4 physical 67698688
[   11.067131]  item 128 key (355483648 230 49152) itemoff 14219 itemsize 16
[   11.068440]                  stride 0 devid 4 physical 67715072
[   11.069356]  item 129 key (355532800 230 4096) itemoff 14203 itemsize 16
[   11.070651]                  stride 0 devid 5 physical 67698688
[   11.071558]  item 130 key (355536896 230 4096) itemoff 14187 itemsize 16
[   11.072852]                  stride 0 devid 5 physical 67702784
[   11.073775]  item 131 key (355540992 230 24576) itemoff 14171 itemsize 16
[   11.075068]                  stride 0 devid 5 physical 67706880
[   11.075973]  item 132 key (355565568 230 32768) itemoff 14155 itemsize 16
[   11.077278]                  stride 0 devid 5 physical 67731456
[   11.078196]  item 133 key (355598336 230 8192) itemoff 14139 itemsize 16
[   11.079485]                  stride 0 devid 1 physical 88735744
[   11.080409]  item 134 key (355606528 230 57344) itemoff 14123 itemsize 16
[   11.081712]                  stride 0 devid 1 physical 88743936
[   11.082627]  item 135 key (355663872 230 49152) itemoff 14107 itemsize 16
[   11.083927]                  stride 0 devid 2 physical 67764224
[   11.084836]  item 136 key (355713024 230 16384) itemoff 14091 itemsize 16
[   11.086145]                  stride 0 devid 2 physical 67813376
[   11.087056]  item 137 key (355729408 230 65536) itemoff 14075 itemsize 16
[   11.088367]                  stride 0 devid 3 physical 67764224
[   11.089281]  item 138 key (355794944 230 65536) itemoff 14059 itemsize 16
[   11.090592]                  stride 0 devid 4 physical 67764224
[   11.091508]  item 139 key (355860480 230 4096) itemoff 14043 itemsize 16
[   11.092762]                  stride 0 devid 5 physical 67764224
[   11.093684]  item 140 key (355864576 230 57344) itemoff 14027 itemsize 16
[   11.094996]                  stride 0 devid 5 physical 67768320
[   11.095911]  item 141 key (355921920 230 4096) itemoff 14011 itemsize 16
[   11.097203]                  stride 0 devid 5 physical 67825664
[   11.098124]  item 142 key (355926016 230 12288) itemoff 13995 itemsize 16
[   11.099414]                  stride 0 devid 1 physical 88801280
[   11.100341]  item 143 key (355938304 230 4096) itemoff 13979 itemsize 16
[   11.101630]                  stride 0 devid 1 physical 88813568
[   11.102507]  item 144 key (355942400 230 49152) itemoff 13963 itemsize 16
[   11.103628]                  stride 0 devid 1 physical 88817664
[   11.104395]  item 145 key (355991552 230 40960) itemoff 13947 itemsize 16
[   11.105494]                  stride 0 devid 2 physical 67829760
[   11.106285]  item 146 key (356032512 230 12288) itemoff 13931 itemsize 16
[   11.107389]                  stride 0 devid 2 physical 67870720
[   11.108155]  item 147 key (356044800 230 12288) itemoff 13915 itemsize 16
[   11.109247]                  stride 0 devid 2 physical 67883008
[   11.110031]  item 148 key (356057088 230 65536) itemoff 13899 itemsize 16
[   11.111137]                  stride 0 devid 3 physical 67829760
[   11.111914]  item 149 key (356122624 230 45056) itemoff 13883 itemsize 16
[   11.113006]                  stride 0 devid 4 physical 67829760
[   11.113785]  item 150 key (356167680 230 20480) itemoff 13867 itemsize 16
[   11.114899]                  stride 0 devid 4 physical 67874816
[   11.115676]  item 151 key (356188160 230 8192) itemoff 13851 itemsize 16
[   11.116766]                  stride 0 devid 5 physical 67829760
[   11.117543]  item 152 key (356196352 230 49152) itemoff 13835 itemsize 16
[   11.118656]                  stride 0 devid 5 physical 67837952
[   11.119428]  item 153 key (356245504 230 8192) itemoff 13819 itemsize 16
[   11.120523]                  stride 0 devid 5 physical 67887104
[   11.121294]  item 154 key (356253696 230 36864) itemoff 13803 itemsize 16
[   11.122406]                  stride 0 devid 1 physical 88866816
[   11.123173]  item 155 key (356290560 230 28672) itemoff 13787 itemsize 16
[   11.124278]                  stride 0 devid 1 physical 88903680
[   11.125048]  item 156 key (356319232 230 12288) itemoff 13771 itemsize 16
[   11.126155]                  stride 0 devid 2 physical 67895296
[   11.126930]  item 157 key (356331520 230 53248) itemoff 13755 itemsize 16
[   11.128033]                  stride 0 devid 2 physical 67907584
[   11.128816]  item 158 key (356384768 230 4096) itemoff 13739 itemsize 16
[   11.129907]                  stride 0 devid 3 physical 67895296
[   11.130674]  item 159 key (356388864 230 61440) itemoff 13723 itemsize 16
[   11.131777]                  stride 0 devid 3 physical 67899392
[   11.132555]  item 160 key (356450304 230 28672) itemoff 13707 itemsize 16
[   11.133654]                  stride 0 devid 4 physical 67895296
[   11.134416]  item 161 key (356478976 230 36864) itemoff 13691 itemsize 16
[   11.135499]                  stride 0 devid 4 physical 67923968
[   11.136284]  item 162 key (356515840 230 65536) itemoff 13675 itemsize 16
[   11.137370]                  stride 0 devid 5 physical 67895296
[   11.138144]  item 163 key (356581376 230 24576) itemoff 13659 itemsize 16
[   11.139234]                  stride 0 devid 1 physical 88932352
[   11.140020]  item 164 key (356679680 230 32768) itemoff 13643 itemsize 16
[   11.141115]                  stride 0 devid 2 physical 67993600
[   11.141901]  item 165 key (356712448 230 65536) itemoff 13627 itemsize 16
[   11.143007]                  stride 0 devid 3 physical 67960832
[   11.143784]  item 166 key (356777984 230 65536) itemoff 13611 itemsize 16
[   11.144873]                  stride 0 devid 4 physical 67960832
[   11.145651]  item 167 key (356843520 230 40960) itemoff 13595 itemsize 16
[   11.146768]                  stride 0 devid 5 physical 67960832
[   11.147536]  item 168 key (356884480 230 24576) itemoff 13579 itemsize 16
[   11.148750]                  stride 0 devid 5 physical 68001792
[   11.149618]  item 169 key (356909056 230 65536) itemoff 13563 itemsize 16
[   11.150438]                  stride 0 devid 1 physical 88997888
[   11.151014]  item 170 key (356974592 230 53248) itemoff 13547 itemsize 16
[   11.151832]                  stride 0 devid 2 physical 68026368
[   11.152406]  item 171 key (357027840 230 12288) itemoff 13531 itemsize 16
[   11.153358]                  stride 0 devid 2 physical 68079616
[   11.154131]  item 172 key (357040128 230 65536) itemoff 13515 itemsize 16
[   11.155236]                  stride 0 devid 3 physical 68026368
[   11.156014]  item 173 key (357105664 230 24576) itemoff 13499 itemsize 16
[   11.157175]                  stride 0 devid 4 physical 68026368
[   11.158002]  item 174 key (357130240 230 20480) itemoff 13483 itemsize 16
[   11.159169]                  stride 0 devid 4 physical 68050944
[   11.159988]  item 175 key (357150720 230 20480) itemoff 13467 itemsize 16
[   11.161097]                  stride 0 devid 4 physical 68071424
[   11.161887]  item 176 key (357171200 230 36864) itemoff 13451 itemsize 16
[   11.162991]                  stride 0 devid 5 physical 68026368
[   11.163799]  item 177 key (357208064 230 8192) itemoff 13435 itemsize 16
[   11.164909]                  stride 0 devid 5 physical 68063232
[   11.165697]  item 178 key (357216256 230 20480) itemoff 13419 itemsize 16
[   11.166788]                  stride 0 devid 5 physical 68071424
[   11.167531]  item 179 key (357236736 230 61440) itemoff 13403 itemsize 16
[   11.168602]                  stride 0 devid 1 physical 89063424
[   11.169445]  item 180 key (357298176 230 4096) itemoff 13387 itemsize 16
[   11.170671]                  stride 0 devid 1 physical 89124864
[   11.171522]  item 181 key (357302272 230 12288) itemoff 13371 itemsize 16
[   11.172737]                  stride 0 devid 2 physical 68091904
[   11.173602]  item 182 key (357314560 230 24576) itemoff 13355 itemsize 16
[   11.174827]                  stride 0 devid 2 physical 68104192
[   11.175671]  item 183 key (357339136 230 4096) itemoff 13339 itemsize 16
[   11.176889]                  stride 0 devid 2 physical 68128768
[   11.177744]  item 184 key (357343232 230 8192) itemoff 13323 itemsize 16
[   11.178951]                  stride 0 devid 2 physical 68132864
[   11.179803]  item 185 key (357351424 230 16384) itemoff 13307 itemsize 16
[   11.181035]                  stride 0 devid 2 physical 68141056
[   11.181907]  item 186 key (357367808 230 32768) itemoff 13291 itemsize 16
[   11.183143]                  stride 0 devid 3 physical 68091904
[   11.184013]  item 187 key (357400576 230 4096) itemoff 13275 itemsize 16
[   11.185164]                  stride 0 devid 3 physical 68124672
[   11.185944]  item 188 key (357404672 230 4096) itemoff 13259 itemsize 16
[   11.187041]                  stride 0 devid 3 physical 68128768
[   11.187803]  item 189 key (357408768 230 24576) itemoff 13243 itemsize 16
[   11.188911]                  stride 0 devid 3 physical 68132864
[   11.189688]  item 190 key (357433344 230 65536) itemoff 13227 itemsize 16
[   11.190794]                  stride 0 devid 4 physical 68091904
[   11.191566]  item 191 key (357498880 230 8192) itemoff 13211 itemsize 16
[   11.192648]                  stride 0 devid 5 physical 68091904
[   11.193435]  item 192 key (357507072 230 57344) itemoff 13195 itemsize 16
[   11.194568]                  stride 0 devid 5 physical 68100096
[   11.195352]  item 193 key (357564416 230 8192) itemoff 13179 itemsize 16
[   11.196559]                  stride 0 devid 1 physical 89128960
[   11.197479] BTRFS critical (device nvme1n1): slot 106 key (354631680 230 32768) new key (354635776 230 4096)
[   11.199354] ------------[ cut here ]------------
[   11.200117] kernel BUG at fs/btrfs/ctree.c:2602!
[   11.200886] Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   11.201808] CPU: 1 UID: 0 PID: 1030 Comm: fsstress Not tainted 6.13.0-rc2+ #1525
[   11.203028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[   11.204997] RIP: 0010:btrfs_set_item_key_safe+0xf7/0x270
[   11.205963] Code: 89 ea 48 89 df 41 0f b6 46 08 48 c7 c6 20 fd 15 82 50 41 ff 36 4c 8b 4c 24 28 44 0f b6 44 24 27 48 8b 4c 24 1f e8 39 b7 bd ff <0f> 0b 49 8b 06 48 8d 74 24 07 4c 89 ff b9 11 00 00 00 48 89 44 24
[   11.209287] RSP: 0018:ffffc9000131fb58 EFLAGS: 00010283
[   11.210236] RAX: 0000000000000000 RBX: ffff888110a25000 RCX: 0000000000000000
[   11.211513] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff
[   11.212804] RBP: ffff88810fc1ce70 R08: 00000000ffffefff R09: ffffffff8244c580
[   11.214076] R10: 00000000ffffefff R11: 00000000ffffffff R12: ffff888111865150
[   11.215372] R13: 000000000000006a R14: ffffc9000131fbc7 R15: ffff88810eb10738
[   11.216661] FS:  00007fbe90168740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   11.218121] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.219168] CR2: 00007fbe8f89edce CR3: 000000010f836004 CR4: 0000000000370eb0
[   11.220450] Call Trace:
[   11.220908]  <TASK>
[   11.221313]  ? __die_body.cold+0x14/0x1a
[   11.222035]  ? die+0x2e/0x50
[   11.222582]  ? do_trap+0xca/0x110
[   11.223197]  ? do_error_trap+0x65/0x80
[   11.223883]  ? btrfs_set_item_key_safe+0xf7/0x270
[   11.224744]  ? exc_invalid_op+0x50/0x70
[   11.225457]  ? btrfs_set_item_key_safe+0xf7/0x270
[   11.226314]  ? asm_exc_invalid_op+0x1a/0x20
[   11.227074]  ? btrfs_set_item_key_safe+0xf7/0x270
[   11.227924]  ? btrfs_set_item_key_safe+0xf7/0x270
[   11.228785]  btrfs_partially_delete_raid_extent+0xc4/0xe0
[   11.229777]  btrfs_delete_raid_extent+0x227/0x240
[   11.230690]  __btrfs_free_extent.isra.0+0x5e0/0xc30
[   11.231637]  ? insn_get_displacement+0x110/0x160
[   11.232537]  __btrfs_run_delayed_refs+0x2fa/0xe80
[   11.233454]  btrfs_run_delayed_refs+0x80/0x110
[   11.234304]  btrfs_commit_transaction+0x2dd/0xbe0
[   11.235214]  ? __pfx_autoremove_wake_function+0x10/0x10
[   11.236221]  ? __pfx_sync_fs_one_sb+0x10/0x10
[   11.237070]  iterate_supers+0x7b/0xf0
[   11.237788]  ksys_sync+0x54/0x90
[   11.238423]  __do_sys_sync+0xe/0x20
[   11.239105]  do_syscall_64+0x54/0x110
[   11.239834]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   11.240803] RIP: 0033:0x7fbe8f917677
[   11.241502] Code: Unable to access opcode bytes at 0x7fbe8f91764d.
[   11.242679] RSP: 002b:00007ffce109d8f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
[   11.244122] RAX: ffffffffffffffda RBX: 0000000000000054 RCX: 00007fbe8f917677
[   11.245479] RDX: 0000000020a64235 RSI: 0000000020a64235 RDI: 0000000000000054
[   11.246852] RBP: 0000000000000032 R08: 0000000000000036 R09: 0000000000000019
[   11.248215] R10: 0000000000000000 R11: 0000000000000202 R12: 028f5c28f5c28f5c
[   11.249584] R13: 8f5c28f5c28f5c29 R14: 00000000004038b0 R15: 00007fbe901686c8
[   11.250944]  </TASK>
[   11.251377] Modules linked in:
[   11.251988] ---[ end trace 0000000000000000 ]---
[   11.252868] RIP: 0010:btrfs_set_item_key_safe+0xf7/0x270
[   11.253900] Code: 89 ea 48 89 df 41 0f b6 46 08 48 c7 c6 20 fd 15 82 50 41 ff 36 4c 8b 4c 24 28 44 0f b6 44 24 27 48 8b 4c 24 1f e8 39 b7 bd ff <0f> 0b 49 8b 06 48 8d 74 24 07 4c 89 ff b9 11 00 00 00 48 89 44 24
[   11.257438] RSP: 0018:ffffc9000131fb58 EFLAGS: 00010283
[   11.258457] RAX: 0000000000000000 RBX: ffff888110a25000 RCX: 0000000000000000
[   11.259823] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff
[   11.261194] RBP: ffff88810fc1ce70 R08: 00000000ffffefff R09: ffffffff8244c580
[   11.262512] R10: 00000000ffffefff R11: 00000000ffffffff R12: ffff888111865150
[   11.263886] R13: 000000000000006a R14: ffffc9000131fbc7 R15: ffff88810eb10738
[   11.265246] FS:  00007fbe90168740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   11.266800] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.267902] CR2: 00007fbe8f91764d CR3: 000000010f836004 CR4: 0000000000370eb0
[   11.269260] Kernel panic - not syncing: Fatal exception
[   11.270353] Kernel Offset: disabled
[   11.270966] ---[ end Kernel panic - not syncing: Fatal exception ]---

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-12-19 10:03 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-12  7:55 [PATCH 00/14] btrfs: more RST delete fixes Johannes Thumshirn
2024-12-12  7:55 ` [PATCH 01/14] btrfs: don't try to delete RAID stripe-extents if we don't need to Johannes Thumshirn
2024-12-17 14:40   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 02/14] btrfs: assert RAID stripe-extent length is always greater than 0 Johannes Thumshirn
2024-12-17 14:46   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 03/14] btrfs: fix search when deleting a RAID stripe-extent Johannes Thumshirn
2024-12-17 14:56   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 04/14] btrfs: fix front delete range calculation for RAID stripe extents Johannes Thumshirn
2024-12-17 15:02   ` Filipe Manana
2024-12-18  9:55     ` Johannes Thumshirn
2024-12-12  7:55 ` [PATCH 05/14] btrfs: fix tail delete of RAID stripe-extents Johannes Thumshirn
2024-12-17 15:45   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 06/14] btrfs: fix deletion of a range spanning parts two RAID stripe extents Johannes Thumshirn
2024-12-17 15:58   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 07/14] btrfs: implement hole punching for " Johannes Thumshirn
2024-12-17 16:05   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 08/14] btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents Johannes Thumshirn
2024-12-17 16:23   ` Filipe Manana
2024-12-18 10:00     ` Johannes Thumshirn
2024-12-19 10:03     ` Johannes Thumshirn
2024-12-12  7:55 ` [PATCH 09/14] btrfs: selftests: check for correct return value of failed lookup Johannes Thumshirn
2024-12-17 16:24   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 10/14] btrfs: selftests: don't split RAID extents in half Johannes Thumshirn
2024-12-17 16:31   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 11/14] btrfs: selftests: test RAID stripe-tree deletion spanning two items Johannes Thumshirn
2024-12-17 16:43   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 12/14] btrfs: selftests: add selftest for punching holes into the RAID stripe extents Johannes Thumshirn
2024-12-17 16:50   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 13/14] btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents Johannes Thumshirn
2024-12-17 16:55   ` Filipe Manana
2024-12-12  7:55 ` [PATCH 14/14] btrfs: selftests: add a selftest for deleting two out of three extents Johannes Thumshirn
2024-12-17 16:59   ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox