linux-btrfs.vger.kernel.org archive mirror
* [PATCH v3 0/2] implement truncation for RAID stripe-extents
@ 2024-10-09 15:30 Johannes Thumshirn
  2024-10-09 15:30 ` [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents Johannes Thumshirn
  2024-10-09 15:30 ` [PATCH v3 2/2] btrfs: implement self-tests for partial RAID stripe-tree delete Johannes Thumshirn
  0 siblings, 2 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2024-10-09 15:30 UTC (permalink / raw)
  To: David Sterba, Josef Bacik
  Cc: Filipe Manana, Naohiro Aota, linux-btrfs, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Implement truncation of RAID stripe-tree entries and add selftests for the
two partial-deletion edge cases to the RAID stripe-tree selftests.

Changes since v2:
- Correctly adjust the btree keys and insert new items instead (Filipe)
- Add selftests

Link to v2:
https://lore.kernel.org/linux-btrfs/20240911095206.31060-1-jth@kernel.org

Johannes Thumshirn (2):
  btrfs: implement partial deletion of RAID stripe extents
  btrfs: implement self-tests for partial RAID stripe-tree delete

 fs/btrfs/raid-stripe-tree.c             |  85 ++++++++-
 fs/btrfs/tests/raid-stripe-tree-tests.c | 223 ++++++++++++++++++++++++
 2 files changed, 304 insertions(+), 4 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents
  2024-10-09 15:30 [PATCH v3 0/2] implement truncation for RAID stripe-extents Johannes Thumshirn
@ 2024-10-09 15:30 ` Johannes Thumshirn
  2024-10-09 16:15   ` Johannes Thumshirn
  2024-10-09 16:41   ` Filipe Manana
  2024-10-09 15:30 ` [PATCH v3 2/2] btrfs: implement self-tests for partial RAID stripe-tree delete Johannes Thumshirn
  1 sibling, 2 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2024-10-09 15:30 UTC (permalink / raw)
  To: David Sterba, Josef Bacik
  Cc: Filipe Manana, Naohiro Aota, linux-btrfs, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

In our CI system, the RAID stripe tree configuration sometimes fails with
the following ASSERT():

 assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64

This ASSERT() triggers because the initial design of the RAID
stripe-tree had the "one ordered-extent equals one bio" rule of zoned
btrfs in mind.

But for a RAID stripe-tree based filesystem that is hosted on a regular,
non-zoned storage device, this rule doesn't apply.

So in case the range we want to delete starts in the middle of the
previous item, grab the item and "truncate" its length. That is, clone
the item, subtract the deleted portion from the key's offset, delete the
old item and insert the new one.

In case the range to delete ends in the middle of an item, we have to
adjust both the item's key and its stripe extents, and then re-insert
the modified clone into the tree after deleting the old stripe extent.
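
The key arithmetic for the two cases above can be sketched with a small
user-space model (the struct and helper names here are invented for this
illustration and are not the kernel's): objectid is the logical start of
the stripe extent, offset is its length, and physical is the start of one
stride on disk.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for one RAID stripe-tree item with a single stride. */
struct toy_stripe_item {
	uint64_t objectid;	/* logical start (key objectid) */
	uint64_t offset;	/* extent length (key offset)   */
	uint64_t physical;	/* on-disk start of the stride  */
};

/* Delete range starts inside the item: keep the front, drop the tail. */
static void toy_drop_tail(struct toy_stripe_item *it, uint64_t dropped)
{
	it->offset -= dropped;	/* only the length shrinks */
}

/* Delete range ends inside the item: drop the front, keep the tail. */
static void toy_drop_front(struct toy_stripe_item *it, uint64_t dropped)
{
	it->objectid += dropped;	/* logical start moves forward */
	it->physical += dropped;	/* so does the on-disk start   */
	it->offset   -= dropped;
}
```

In the kernel the adjusted clone additionally has to be re-inserted
under the new key, since btree items are indexed by (objectid, type,
offset).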

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 85 +++++++++++++++++++++++++++++++++++--
 1 file changed, 81 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 41970bbdb05f..40cc0a392be2 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -13,6 +13,54 @@
 #include "volumes.h"
 #include "print-tree.h"
 
+static int btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
+					      struct btrfs_path *path,
+					      struct btrfs_key *oldkey,
+					      u64 newlen, u64 frontpad)
+{
+	struct btrfs_root *stripe_root = trans->fs_info->stripe_root;
+	struct btrfs_stripe_extent *extent, *new;
+	struct extent_buffer *leaf = path->nodes[0];
+	int slot = path->slots[0];
+	const size_t item_size = btrfs_item_size(leaf, slot);
+	struct btrfs_key newkey;
+	int ret;
+	int i;
+
+	new = kzalloc(item_size, GFP_NOFS);
+	if (!new)
+		return -ENOMEM;
+
+	memcpy(&newkey, oldkey, sizeof(struct btrfs_key));
+	newkey.objectid += frontpad;
+	newkey.offset -= newlen;
+
+	extent = btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
+
+	for (i = 0; i < btrfs_num_raid_stripes(item_size); i++) {
+		u64 devid;
+		u64 phys;
+
+		devid = btrfs_raid_stride_devid(leaf, &extent->strides[i]);
+		btrfs_set_stack_raid_stride_devid(&new->strides[i], devid);
+
+		phys = btrfs_raid_stride_physical(leaf, &extent->strides[i]);
+		phys += frontpad;
+		btrfs_set_stack_raid_stride_physical(&new->strides[i], phys);
+	}
+
+	ret = btrfs_del_item(trans, stripe_root, path);
+	if (ret)
+		goto out;
+
+	btrfs_release_path(path);
+	ret = btrfs_insert_item(trans, stripe_root, &newkey, new, item_size);
+
+ out:
+	kfree(new);
+	return ret;
+}
+
 int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 length)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -43,9 +91,8 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 			break;
 		if (ret > 0) {
 			ret = 0;
-			if (path->slots[0] == 0)
-				break;
-			path->slots[0]--;
+			if (path->slots[0] > 0)
+				path->slots[0]--;
 		}
 
 		leaf = path->nodes[0];
@@ -61,7 +108,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		trace_btrfs_raid_extent_delete(fs_info, start, end,
 					       found_start, found_end);
 
-		ASSERT(found_start >= start && found_end <= end);
+		/*
+		 * The stripe extent starts before the range we want to delete:
+		 *
+		 * |--- RAID Stripe Extent ---|
+		 * |--- keep  ---|--- drop ---|
+		 *
+		 * This means we have to duplicate the tree item, truncate the
+		 * length to the new size and then re-insert the item.
+		 */
+		if (found_start < start) {
+			ret = btrfs_partially_delete_raid_extent(trans, path, &key,
+							start - found_start, 0);
+			break;
+		}
+
+		/*
+		 * The stripe extent ends after the range we want to delete:
+		 *
+		 * |--- RAID Stripe Extent ---|
+		 * |--- drop  ---|--- keep ---|
+		 * This means we have to duplicate the tree item, truncate the
+		 * length to the new size and then re-insert the item.
+		 */
+		if (found_end > end) {
+			u64 diff = found_end - end;
+
+			ret = btrfs_partially_delete_raid_extent(trans, path, &key,
+								 diff, diff);
+			break;
+		}
+
 		ret = btrfs_del_item(trans, stripe_root, path);
 		if (ret)
 			break;
-- 
2.43.0



* [PATCH v3 2/2] btrfs: implement self-tests for partial RAID stripe-tree delete
  2024-10-09 15:30 [PATCH v3 0/2] implement truncation for RAID stripe-extents Johannes Thumshirn
  2024-10-09 15:30 ` [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents Johannes Thumshirn
@ 2024-10-09 15:30 ` Johannes Thumshirn
  1 sibling, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2024-10-09 15:30 UTC (permalink / raw)
  To: David Sterba, Josef Bacik
  Cc: Filipe Manana, Naohiro Aota, linux-btrfs, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Implement self-tests for partial deletion of RAID stripe-tree entries.

These two new tests cover both the deletion of the front of a RAID
stripe-tree stripe extent as well as truncation of an item to make it
smaller.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/tests/raid-stripe-tree-tests.c | 223 ++++++++++++++++++++++++
 1 file changed, 223 insertions(+)

diff --git a/fs/btrfs/tests/raid-stripe-tree-tests.c b/fs/btrfs/tests/raid-stripe-tree-tests.c
index b8013ab13c43..dafa6a9addee 100644
--- a/fs/btrfs/tests/raid-stripe-tree-tests.c
+++ b/fs/btrfs/tests/raid-stripe-tree-tests.c
@@ -29,6 +29,227 @@ static struct btrfs_device *btrfs_device_by_devid(struct btrfs_fs_devices *fs_de
 	return NULL;
 }
 
+/*
+ * Test a 64K RST write on a 2 disk RAID1 at a logical address of 1M and then
+ * delete the 1st 32K, making the new start address 1M+32K.
+ */
+static int test_front_delete(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_io_context *bioc;
+	struct btrfs_io_stripe io_stripe = { 0 };
+	u64 map_type = RST_TEST_RAID1_TYPE;
+	u64 logical = SZ_1M;
+	u64 len = SZ_64K;
+	int ret;
+
+	bioc = alloc_btrfs_io_context(fs_info, logical, RST_TEST_NUM_DEVICES);
+	if (!bioc) {
+		test_std_err(TEST_ALLOC_IO_CONTEXT);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
+	bioc->map_type = map_type;
+	bioc->size = len;
+
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0,
+					   &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed", logical,
+			 logical + len);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical) {
+		test_err("invalid physical address, expected %llu got %llu",
+			 logical, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len != SZ_64K) {
+		test_err("invalid stripe length, expected %llu got %llu",
+			 (u64)SZ_64K, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical, SZ_32K);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed", logical,
+			 logical + SZ_32K);
+		goto out;
+	}
+
+	len = SZ_32K;
+	ret = btrfs_get_raid_extent_offset(fs_info, logical + SZ_32K, &len,
+					   map_type, 0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed",
+			 logical + SZ_32K, logical + SZ_32K + len);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical + SZ_32K) {
+		test_err("invalid physical address, expected %llu, got %llu",
+			 logical + SZ_32K, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len != SZ_32K) {
+		test_err("invalid stripe length, expected %llu, got %llu",
+			 (u64)SZ_32K, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0,
+					   &io_stripe);
+	if (!ret) {
+		ret = -EINVAL;
+		test_err("lookup of RAID extent [%llu, %llu] succeeded, should fail",
+			 logical, logical + len);
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical + SZ_32K, SZ_32K);
+ out:
+	return ret;
+}
+
+/*
+ * Test a 64K RST write on a 2 disk RAID1 at a logical address of 1M and then
+ * truncate the stripe extent down to 32K.
+ */
+static int test_tail_delete(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_io_context *bioc;
+	struct btrfs_io_stripe io_stripe = { 0 };
+	u64 map_type = RST_TEST_RAID1_TYPE;
+	u64 logical = SZ_1M;
+	u64 len = SZ_64K;
+	int ret;
+
+	bioc = alloc_btrfs_io_context(fs_info, logical, RST_TEST_NUM_DEVICES);
+	if (!bioc) {
+		test_std_err(TEST_ALLOC_IO_CONTEXT);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
+	bioc->map_type = map_type;
+	bioc->size = len;
+
+	for (int i = 0; i < RST_TEST_NUM_DEVICES; i++) {
+		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
+
+		stripe->dev = btrfs_device_by_devid(fs_info->fs_devices, i);
+		if (!stripe->dev) {
+			test_err("cannot find device with devid %d", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		stripe->physical = logical + i * SZ_1G;
+	}
+
+	ret = btrfs_insert_one_raid_extent(trans, bioc);
+	if (ret) {
+		test_err("inserting RAID extent failed: %d", ret);
+		goto out;
+	}
+
+	io_stripe.dev = btrfs_device_by_devid(fs_info->fs_devices, 0);
+	if (!io_stripe.dev) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0,
+					   &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed", logical,
+			 logical + len);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical) {
+		test_err("invalid physical address, expected %llu got %llu",
+			 logical, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len != SZ_64K) {
+		test_err("invalid stripe length, expected %llu got %llu",
+			 (u64)SZ_64K, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical + SZ_32K, SZ_32K);
+	if (ret) {
+		test_err("deleting RAID extent [%llu, %llu] failed",
+			 logical + SZ_32K, logical + SZ_64K);
+		goto out;
+	}
+
+	len = SZ_32K;
+	ret = btrfs_get_raid_extent_offset(fs_info, logical, &len, map_type, 0, &io_stripe);
+	if (ret) {
+		test_err("lookup of RAID extent [%llu, %llu] failed", logical,
+			 logical + len);
+		goto out;
+	}
+
+	if (io_stripe.physical != logical) {
+		test_err("invalid physical address, expected %llu, got %llu",
+			 logical, io_stripe.physical);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (len != SZ_32K) {
+		test_err("invalid stripe length, expected %llu, got %llu",
+			 (u64)SZ_32K, len);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = btrfs_delete_raid_extent(trans, logical, len);
+	if (ret)
+		test_err("deleting RAID extent [%llu, %llu] failed", logical,
+			 logical + len);
+
+out:
+	btrfs_put_bioc(bioc);
+	return ret;
+}
+
 /*
  * Test a 64K RST write on a 2 disk RAID1 at a logical address of 1M and then
  * overwrite the whole range giving it new physical address at an offset of 1G.
@@ -235,6 +456,8 @@ static int test_simple_create_delete(struct btrfs_trans_handle *trans)
 static const test_func_t tests[] = {
 	test_simple_create_delete,
 	test_create_update_delete,
+	test_tail_delete,
+	test_front_delete,
 };
 
 static int run_test(test_func_t test, u32 sectorsize, u32 nodesize)
-- 
2.43.0



* Re: [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents
  2024-10-09 15:30 ` [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents Johannes Thumshirn
@ 2024-10-09 16:15   ` Johannes Thumshirn
  2024-10-09 16:41   ` Filipe Manana
  1 sibling, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2024-10-09 16:15 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba, Josef Bacik
  Cc: Filipe Manana, Naohiro Aota, linux-btrfs@vger.kernel.org

On 09.10.24 17:32, Johannes Thumshirn wrote:
> @@ -43,9 +91,8 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>   			break;
>   		if (ret > 0) {
>   			ret = 0;
> -			if (path->slots[0] == 0)
> -				break;
> -			path->slots[0]--;
> +			if (path->slots[0] > 0)
> +				path->slots[0]--;
>   		}
>   
>   		leaf = path->nodes[0];

That part is wrong, will send a v4 shortly.


* Re: [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents
  2024-10-09 15:30 ` [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents Johannes Thumshirn
  2024-10-09 16:15   ` Johannes Thumshirn
@ 2024-10-09 16:41   ` Filipe Manana
  2024-10-10  5:55     ` Johannes Thumshirn
  1 sibling, 1 reply; 6+ messages in thread
From: Filipe Manana @ 2024-10-09 16:41 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: David Sterba, Josef Bacik, Filipe Manana, Naohiro Aota,
	linux-btrfs, Johannes Thumshirn

On Wed, Oct 9, 2024 at 5:00 PM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> In our CI system, the RAID stripe tree configuration sometimes fails with
> the following ASSERT():
>
>  assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64
>
> This ASSERT() triggers because the initial design of the RAID
> stripe-tree had the "one ordered-extent equals one bio" rule of zoned
> btrfs in mind.
>
> But for a RAID stripe-tree based filesystem that is hosted on a regular,
> non-zoned storage device, this rule doesn't apply.
>
> So in case the range we want to delete starts in the middle of the
> previous item, grab the item and "truncate" its length. That is, clone
> the item, subtract the deleted portion from the key's offset, delete the
> old item and insert the new one.
>
> In case the range to delete ends in the middle of an item, we have to
> adjust both the item's key and its stripe extents, and then re-insert
> the modified clone into the tree after deleting the old stripe extent.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 85 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 81 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index 41970bbdb05f..40cc0a392be2 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -13,6 +13,54 @@
>  #include "volumes.h"
>  #include "print-tree.h"
>
> +static int btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
> +                                             struct btrfs_path *path,
> +                                             struct btrfs_key *oldkey,
> +                                             u64 newlen, u64 frontpad)
> +{
> +       struct btrfs_root *stripe_root = trans->fs_info->stripe_root;
> +       struct btrfs_stripe_extent *extent, *new;
> +       struct extent_buffer *leaf = path->nodes[0];
> +       int slot = path->slots[0];
> +       const size_t item_size = btrfs_item_size(leaf, slot);
> +       struct btrfs_key newkey;
> +       int ret;
> +       int i;
> +
> +       new = kzalloc(item_size, GFP_NOFS);
> +       if (!new)
> +               return -ENOMEM;
> +
> +       memcpy(&newkey, oldkey, sizeof(struct btrfs_key));
> +       newkey.objectid += frontpad;
> +       newkey.offset -= newlen;
> +
> +       extent = btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
> +
> +       for (i = 0; i < btrfs_num_raid_stripes(item_size); i++) {

The loop variable could be declared here in the for expression, as
it's not used anywhere outside it.

> +               u64 devid;
> +               u64 phys;
> +
> +               devid = btrfs_raid_stride_devid(leaf, &extent->strides[i]);
> +               btrfs_set_stack_raid_stride_devid(&new->strides[i], devid);
> +
> +               phys = btrfs_raid_stride_physical(leaf, &extent->strides[i]);
> +               phys += frontpad;
> +               btrfs_set_stack_raid_stride_physical(&new->strides[i], phys);
> +       }
> +
> +       ret = btrfs_del_item(trans, stripe_root, path);
> +       if (ret)
> +               goto out;
> +
> +       btrfs_release_path(path);
> +       ret = btrfs_insert_item(trans, stripe_root, &newkey, new, item_size);

So instead of doing a deletion followed by an insertion, which implies
two searches in the btree and occasional node/leaf merges and splits,
can't we do this in a single search?
By adjusting item keys, updating items and duplicating them (followed
by updating them), similar to what we do at btrfs_drop_extents() for
example.
Otherwise this may result in very high lock contention and extra work.

It's ok for an initial implementation and can be improved later, but I
was just curious if there's any reason besides simplicity for now.

Thanks.
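
The single-search idea can be modeled in user space (a plain sorted
array stands in for a btree leaf; all names here are invented for this
sketch): because stripe extents never overlap, moving a front-truncated
item's start key forward by less than its length can never push it past
its successor, so the key can be rewritten in place without a
delete/re-insert cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A leaf modeled as an array of (start, length) keys sorted by start. */
struct toy_key {
	uint64_t objectid;	/* logical start */
	uint64_t offset;	/* extent length */
};

/*
 * Front-truncate the item at 'slot' in place.  Returns 1 if the array
 * is still strictly sorted afterwards, 0 otherwise.  As long as
 * frontpad < offset and the extents don't overlap, ordering is
 * preserved, which is what makes an in-place key update safe.
 */
static int toy_truncate_in_place(struct toy_key *keys, size_t n,
				 size_t slot, uint64_t frontpad)
{
	keys[slot].objectid += frontpad;
	keys[slot].offset   -= frontpad;

	for (size_t i = 1; i < n; i++)
		if (keys[i - 1].objectid >= keys[i].objectid)
			return 0;
	return 1;
}
```

This is only a model of the ordering argument; the kernel-side
equivalent would update the key inside the locked leaf, as
btrfs_drop_extents() does, avoiding the second btree search.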

> +
> + out:
> +       kfree(new);
> +       return ret;
> +}
> +
>  int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 length)
>  {
>         struct btrfs_fs_info *fs_info = trans->fs_info;
> @@ -43,9 +91,8 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>                         break;
>                 if (ret > 0) {
>                         ret = 0;
> -                       if (path->slots[0] == 0)
> -                               break;
> -                       path->slots[0]--;
> +                       if (path->slots[0] > 0)
> +                               path->slots[0]--;
>                 }
>
>                 leaf = path->nodes[0];
> @@ -61,7 +108,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>                 trace_btrfs_raid_extent_delete(fs_info, start, end,
>                                                found_start, found_end);
>
> -               ASSERT(found_start >= start && found_end <= end);
> +               /*
> +                * The stripe extent starts before the range we want to delete:
> +                *
> +                * |--- RAID Stripe Extent ---|
> +                * |--- keep  ---|--- drop ---|
> +                *
> +                * This means we have to duplicate the tree item, truncate the
> +                * length to the new size and then re-insert the item.
> +                */
> +               if (found_start < start) {
> +                       ret = btrfs_partially_delete_raid_extent(trans, path, &key,
> +                                                       start - found_start, 0);
> +                       break;
> +               }
> +
> +               /*
> +                * The stripe extent ends after the range we want to delete:
> +                *
> +                * |--- RAID Stripe Extent ---|
> +                * |--- drop  ---|--- keep ---|
> +                * This means we have to duplicate the tree item, truncate the
> +                * length to the new size and then re-insert the item.
> +                */
> +               if (found_end > end) {
> +                       u64 diff = found_end - end;
> +
> +                       ret = btrfs_partially_delete_raid_extent(trans, path, &key,
> +                                                                diff, diff);
> +                       break;
> +               }
>                 ret = btrfs_del_item(trans, stripe_root, path);
>                 if (ret)
>                         break;
> --
> 2.43.0
>
>


* Re: [PATCH v3 1/2] btrfs: implement partial deletion of RAID stripe extents
  2024-10-09 16:41   ` Filipe Manana
@ 2024-10-10  5:55     ` Johannes Thumshirn
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2024-10-10  5:55 UTC (permalink / raw)
  To: Filipe Manana, Johannes Thumshirn
  Cc: David Sterba, Josef Bacik, Filipe Manana, Naohiro Aota,
	linux-btrfs@vger.kernel.org

On 09.10.24 18:42, Filipe Manana wrote:

>>
>> +static int btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,
>> +                                             struct btrfs_path *path,
>> +                                             struct btrfs_key *oldkey,
>> +                                             u64 newlen, u64 frontpad)
>> +{
>> +       struct btrfs_root *stripe_root = trans->fs_info->stripe_root;
>> +       struct btrfs_stripe_extent *extent, *new;
>> +       struct extent_buffer *leaf = path->nodes[0];
>> +       int slot = path->slots[0];
>> +       const size_t item_size = btrfs_item_size(leaf, slot);
>> +       struct btrfs_key newkey;
>> +       int ret;
>> +       int i;
>> +
>> +       new = kzalloc(item_size, GFP_NOFS);
>> +       if (!new)
>> +               return -ENOMEM;
>> +
>> +       memcpy(&newkey, oldkey, sizeof(struct btrfs_key));
>> +       newkey.objectid += frontpad;
>> +       newkey.offset -= newlen;
>> +
>> +       extent = btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
>> +
>> +       for (i = 0; i < btrfs_num_raid_stripes(item_size); i++) {
> 
> The loop variable could be declared here in the for expression, as
> it's not used anywhere outside it.

yup will fix that up.

>> +               u64 devid;
>> +               u64 phys;
>> +
>> +               devid = btrfs_raid_stride_devid(leaf, &extent->strides[i]);
>> +               btrfs_set_stack_raid_stride_devid(&new->strides[i], devid);
>> +
>> +               phys = btrfs_raid_stride_physical(leaf, &extent->strides[i]);
>> +               phys += frontpad;
>> +               btrfs_set_stack_raid_stride_physical(&new->strides[i], phys);
>> +       }
>> +
>> +       ret = btrfs_del_item(trans, stripe_root, path);
>> +       if (ret)
>> +               goto out;
>> +
>> +       btrfs_release_path(path);
>> +       ret = btrfs_insert_item(trans, stripe_root, &newkey, new, item_size);
> 
> So instead of doing a deletion followed by an insertion, which implies
> two searches in the btree and occasional node/leaf merges and splits,
> can't we do this in a single search?
> By adjusting item keys, updating items and duplicating them (followed
> by updating them), similar to what we do at btrfs_drop_extents() for
> example.
> Otherwise this may result in very high lock contention and extra work.
> 
> It's ok for an initial implementation and can be improved later, but I
> was just curious if there's any reason besides simplicity for now.


I did have a version using btrfs_duplicate_item() and dropped it again.
But yes, sure, I can resurrect it.

But first I have to find out why both of these variants (- and +) are
buggy.

-			if (path->slots[0] == 0)
-				break;
-			path->slots[0]--;
+			if (path->slots[0] > 0)
+				path->slots[0]--;
  		
The '-' version passes xfstests but not the selftest (as the extent is
the first item in the tree, the lookup doesn't find it and bails out),
while the '+' version passes the selftest but causes filesystem data
corruption in xfstests, because it deletes the wrong data.

