* [PATCH 0/3] btrfs: rst: updates for RAID stripe tree
@ 2024-06-10 8:40 Johannes Thumshirn
From: Johannes Thumshirn @ 2024-06-10 8:40 UTC
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn
Three independent updates for the RAID stripe tree.

The first patch removes pointless space from the on-disk format. As the
feature itself is still experimental, I'd like to get rid of it as early
as possible.

Patch 2 replaces stripe extents in case we hit -EEXIST when inserting a
stripe extent on a write. This can happen, e.g., on device-replace.

Patch 3 splits a stripe extent on partial deletion of a stripe.
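The partial-delete case handled by patch 3 is, at its core, interval arithmetic. The following is a hypothetical user-space sketch, not the code from the patch: it assumes each stride maps the extent's logical range linearly onto its device, so a tail piece's physical starts move forward by the same amount as its logical start. All demo_ names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_STRIDES 4

/* Hypothetical in-memory model of one RAID stripe extent item. */
struct demo_stripe {
	uint64_t logical;			/* key.objectid: logical start */
	uint64_t length;			/* key.offset: length in bytes */
	int num_strides;
	uint64_t physical[DEMO_MAX_STRIDES];	/* physical start per stride */
};

/*
 * Delete [del_start, del_start + del_len) from *se. The surviving head
 * and/or tail pieces are written to out[]; returns how many (0, 1 or 2).
 */
static int demo_split_on_delete(const struct demo_stripe *se,
				uint64_t del_start, uint64_t del_len,
				struct demo_stripe out[2])
{
	uint64_t se_end = se->logical + se->length;
	uint64_t del_end = del_start + del_len;
	int n = 0;

	assert(se->num_strides <= DEMO_MAX_STRIDES);

	/* Head piece: keeps the original physical starts, only gets shorter. */
	if (del_start > se->logical) {
		uint64_t head_end = del_start < se_end ? del_start : se_end;

		out[n] = *se;
		out[n].length = head_end - se->logical;
		n++;
	}

	/* Tail piece: starts later, so every stride's physical start moves too. */
	if (del_end < se_end) {
		uint64_t tail_start = del_end > se->logical ? del_end : se->logical;
		uint64_t shift = tail_start - se->logical;

		out[n] = *se;
		out[n].logical = tail_start;
		out[n].length = se_end - tail_start;
		for (int i = 0; i < se->num_strides; i++)
			out[n].physical[i] = se->physical[i] + shift;
		n++;
	}
	return n;
}
```

For example, a 64 KiB extent at logical 1 MiB mirrored to physical 4 MiB and 8 MiB, with 16 KiB punched out at offset 16 KiB, yields a 16 KiB head at 1 MiB and a 32 KiB tail at logical 1 MiB + 32 KiB whose strides start at 4 MiB + 32 KiB and 8 MiB + 32 KiB. Deleting the whole range leaves no pieces.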
---
Johannes Thumshirn (2):
btrfs: rst: remove encoding field from stripe_extent
btrfs: replace stripe extents
JohnnesThumshirn (1):
btrfs: split RAID stripes on deletion
fs/btrfs/accessors.h | 3 -
fs/btrfs/ctree.c | 1 +
fs/btrfs/print-tree.c | 5 --
fs/btrfs/raid-stripe-tree.c | 148 ++++++++++++++++++++++++++++++----------
fs/btrfs/raid-stripe-tree.h | 3 +-
fs/btrfs/tree-checker.c | 19 ------
include/uapi/linux/btrfs_tree.h | 14 +---
7 files changed, 114 insertions(+), 79 deletions(-)
---
base-commit: e361635b966fca48f92263277ff38cd5a1971d39
change-id: 20240610-b4-rst-updates-d0aa696b9d5a
Best regards,
--
Johannes Thumshirn <jth@kernel.org>
* [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent
From: Johannes Thumshirn @ 2024-06-10 8:40 UTC
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Remove the encoding field from 'struct btrfs_stripe_extent'. It was
originally intended to encode the RAID type as well as whether we're a
data or a parity stripe.

But the RAID type can be inferred from the block-group and the data vs.
parity differentiation can be done more easily by adding a new key type
for parity stripes in the RAID stripe tree.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/accessors.h            |  3 ---
 fs/btrfs/print-tree.c           |  5 -----
 fs/btrfs/raid-stripe-tree.c     | 13 -------------
 fs/btrfs/raid-stripe-tree.h     |  3 +--
 fs/btrfs/tree-checker.c         | 19 -------------------
 include/uapi/linux/btrfs_tree.h | 14 +-------------
 6 files changed, 2 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index 6c3deaa3e878..b2eb9cde2c5d 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -315,11 +315,8 @@ BTRFS_SETGET_FUNCS(timespec_nsec, struct btrfs_timespec, nsec, 32);
 BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64);
 BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);

-BTRFS_SETGET_FUNCS(stripe_extent_encoding, struct btrfs_stripe_extent, encoding, 8);
 BTRFS_SETGET_FUNCS(raid_stride_devid, struct btrfs_raid_stride, devid, 64);
 BTRFS_SETGET_FUNCS(raid_stride_physical, struct btrfs_raid_stride, physical, 64);
-BTRFS_SETGET_STACK_FUNCS(stack_stripe_extent_encoding,
-			 struct btrfs_stripe_extent, encoding, 8);
 BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_devid, struct btrfs_raid_stride, devid, 64);
 BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_physical, struct btrfs_raid_stride, physical, 64);

diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index 7e46aa8a0444..9f1e5e11bf71 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -208,11 +208,6 @@ static void print_raid_stripe_key(const struct extent_buffer *eb, u32 item_size,
 				  struct btrfs_stripe_extent *stripe)
 {
 	const int num_stripes = btrfs_num_raid_stripes(item_size);
-	const u8 encoding = btrfs_stripe_extent_encoding(eb, stripe);
-
-	pr_info("\t\t\tencoding: %s\n",
-		(encoding && encoding < BTRFS_NR_RAID_TYPES) ?
-		btrfs_raid_array[encoding].raid_name : "unknown");

 	for (int i = 0; i < num_stripes; i++)
 		pr_info("\t\t\tstride %d devid %llu physical %llu\n",
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 6af6b4b9a32e..e6f7a234b8f6 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -80,7 +80,6 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_key stripe_key;
 	struct btrfs_root *stripe_root = fs_info->stripe_root;
 	const int num_stripes = btrfs_bg_type_to_factor(bioc->map_type);
-	u8 encoding = btrfs_bg_flags_to_raid_index(bioc->map_type);
 	struct btrfs_stripe_extent *stripe_extent;
 	const size_t item_size = struct_size(stripe_extent, strides, num_stripes);
 	int ret;
@@ -94,7 +93,6 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,

 	trace_btrfs_insert_one_raid_extent(fs_info, bioc->logical, bioc->size,
 					   num_stripes);
-	btrfs_set_stack_stripe_extent_encoding(stripe_extent, encoding);
 	for (int i = 0; i < num_stripes; i++) {
 		u64 devid = bioc->stripes[i].dev->devid;
 		u64 physical = bioc->stripes[i].physical;
@@ -159,7 +157,6 @@ int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
 	struct extent_buffer *leaf;
 	const u64 end = logical + *length;
 	int num_stripes;
-	u8 encoding;
 	u64 offset;
 	u64 found_logical;
 	u64 found_length;
@@ -222,16 +219,6 @@ int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,

 		num_stripes = btrfs_num_raid_stripes(btrfs_item_size(leaf, slot));
 		stripe_extent = btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
-		encoding = btrfs_stripe_extent_encoding(leaf, stripe_extent);
-
-		if (encoding != btrfs_bg_flags_to_raid_index(map_type)) {
-			ret = -EUCLEAN;
-			btrfs_handle_fs_error(fs_info, ret,
-				"on-disk stripe encoding %d doesn't match RAID index %d",
-					      encoding,
-					      btrfs_bg_flags_to_raid_index(map_type));
-			goto out;
-		}

 		for (int i = 0; i < num_stripes; i++) {
 			struct btrfs_raid_stride *stride = &stripe_extent->strides[i];
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
index c9c258f84903..1ac1c21aac2f 100644
--- a/fs/btrfs/raid-stripe-tree.h
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -48,8 +48,7 @@ static inline bool btrfs_need_stripe_tree_update(struct btrfs_fs_info *fs_info,

 static inline int btrfs_num_raid_stripes(u32 item_size)
 {
-	return (item_size - offsetof(struct btrfs_stripe_extent, strides)) /
-		sizeof(struct btrfs_raid_stride);
+	return item_size / sizeof(struct btrfs_raid_stride);
 }

 #endif
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a2c3651a3d8f..1e140f6dabc6 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1682,9 +1682,6 @@ static int check_inode_ref(struct extent_buffer *leaf,
 static int check_raid_stripe_extent(const struct extent_buffer *leaf,
 				    const struct btrfs_key *key, int slot)
 {
-	struct btrfs_stripe_extent *stripe_extent =
-		btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
-
 	if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) {
 		generic_err(leaf, slot,
 "invalid key objectid for raid stripe extent, have %llu expect aligned to %u",
@@ -1698,22 +1695,6 @@ static int check_raid_stripe_extent(const struct extent_buffer *leaf,
 		return -EUCLEAN;
 	}

-	switch (btrfs_stripe_extent_encoding(leaf, stripe_extent)) {
-	case BTRFS_STRIPE_RAID0:
-	case BTRFS_STRIPE_RAID1:
-	case BTRFS_STRIPE_DUP:
-	case BTRFS_STRIPE_RAID10:
-	case BTRFS_STRIPE_RAID5:
-	case BTRFS_STRIPE_RAID6:
-	case BTRFS_STRIPE_RAID1C3:
-	case BTRFS_STRIPE_RAID1C4:
-		break;
-	default:
-		generic_err(leaf, slot, "invalid raid stripe encoding %u",
-			    btrfs_stripe_extent_encoding(leaf, stripe_extent));
-		return -EUCLEAN;
-	}
-
 	return 0;
 }

diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index d24e8e121507..cb103c76d398 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -747,21 +747,9 @@ struct btrfs_raid_stride {
 	__le64 physical;
 } __attribute__ ((__packed__));

-/* The stripe_extent::encoding, 1:1 mapping of enum btrfs_raid_types. */
-#define BTRFS_STRIPE_RAID0	1
-#define BTRFS_STRIPE_RAID1	2
-#define BTRFS_STRIPE_DUP	3
-#define BTRFS_STRIPE_RAID10	4
-#define BTRFS_STRIPE_RAID5	5
-#define BTRFS_STRIPE_RAID6	6
-#define BTRFS_STRIPE_RAID1C3	7
-#define BTRFS_STRIPE_RAID1C4	8
-
 struct btrfs_stripe_extent {
-	__u8 encoding;
-	__u8 reserved[7];
 	/* An array of raid strides this stripe is composed of. */
-	struct btrfs_raid_stride strides[];
+	__DECLARE_FLEX_ARRAY(struct btrfs_raid_stride, strides);
 } __attribute__ ((__packed__));

 #define BTRFS_HEADER_FLAG_WRITTEN	(1ULL << 0)
-- 
2.43.0
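The layout change in this patch can be checked with a small user-space model. It is a sketch, not kernel code: it uses plain uint64_t instead of __le64, and demo_raid_stride plus the demo_ helpers are hypothetical stand-ins for struct btrfs_raid_stride and btrfs_num_raid_stripes(). With the 1-byte encoding and 7 reserved bytes gone, every stripe extent item shrinks by 8 bytes and the stride count becomes a plain division.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct btrfs_raid_stride (the real one uses __le64). */
struct demo_raid_stride {
	uint64_t devid;
	uint64_t physical;
} __attribute__((__packed__));

/* The removed header: __u8 encoding plus __u8 reserved[7]. */
#define DEMO_OLD_HEADER_SIZE 8u

static_assert(sizeof(struct demo_raid_stride) == 16, "stride must be 16 bytes");

/* Item size for n strides before the patch (header + strides). */
static uint32_t demo_old_item_size(int n)
{
	return (uint32_t)(DEMO_OLD_HEADER_SIZE + n * sizeof(struct demo_raid_stride));
}

/* Item size for n strides after the patch (strides only). */
static uint32_t demo_new_item_size(int n)
{
	return (uint32_t)(n * sizeof(struct demo_raid_stride));
}

/* Old btrfs_num_raid_stripes(): skip the header, then divide. */
static int demo_old_num_stripes(uint32_t item_size)
{
	return (item_size - DEMO_OLD_HEADER_SIZE) / sizeof(struct demo_raid_stride);
}

/* New btrfs_num_raid_stripes(): the item is nothing but strides. */
static int demo_new_num_stripes(uint32_t item_size)
{
	return item_size / sizeof(struct demo_raid_stride);
}
```

A RAID1 item with two strides takes 40 bytes before and 32 bytes after the change. Applying the old formula to a new 32-byte item yields 1 stride instead of 2, which illustrates why such a format change is easier to make while the feature is still experimental.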
* Re: [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent
From: David Sterba @ 2024-06-11 14:36 UTC
To: Johannes Thumshirn
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, linux-kernel, Johannes Thumshirn

On Mon, Jun 10, 2024 at 10:40:25AM +0200, Johannes Thumshirn wrote:
> -#define BTRFS_STRIPE_RAID5	5
> -#define BTRFS_STRIPE_RAID6	6
> -#define BTRFS_STRIPE_RAID1C3	7
> -#define BTRFS_STRIPE_RAID1C4	8
> -
>  struct btrfs_stripe_extent {
> -	__u8 encoding;
> -	__u8 reserved[7];
>  	/* An array of raid strides this stripe is composed of. */
> -	struct btrfs_raid_stride strides[];
> +	__DECLARE_FLEX_ARRAY(struct btrfs_raid_stride, strides);

Is there a reason to use the __ underscore macro? I see no difference
between that and DECLARE_FLEX_ARRAY and underscore usually means that
it's special in some way.
* Re: [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent
From: Johannes Thumshirn @ 2024-06-11 16:33 UTC
To: dsterba@suse.cz, Johannes Thumshirn
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On 11.06.24 16:37, David Sterba wrote:
> On Mon, Jun 10, 2024 at 10:40:25AM +0200, Johannes Thumshirn wrote:
>> -#define BTRFS_STRIPE_RAID5	5
>> -#define BTRFS_STRIPE_RAID6	6
>> -#define BTRFS_STRIPE_RAID1C3	7
>> -#define BTRFS_STRIPE_RAID1C4	8
>> -
>>  struct btrfs_stripe_extent {
>> -	__u8 encoding;
>> -	__u8 reserved[7];
>>  	/* An array of raid strides this stripe is composed of. */
>> -	struct btrfs_raid_stride strides[];
>> +	__DECLARE_FLEX_ARRAY(struct btrfs_raid_stride, strides);
>
> Is there a reason to use the __ underscore macro? I see no difference
> between that and DECLARE_FLEX_ARRAY and underscore usually means that
> it's special in some way.
>

Yes, the __ version is for UAPI, like __u8 or __le32 and so on.
* Re: [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent
From: David Sterba @ 2024-06-13 21:23 UTC
To: Johannes Thumshirn
Cc: dsterba@suse.cz, Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Tue, Jun 11, 2024 at 04:33:19PM +0000, Johannes Thumshirn wrote:
> On 11.06.24 16:37, David Sterba wrote:
> > On Mon, Jun 10, 2024 at 10:40:25AM +0200, Johannes Thumshirn wrote:
> >> -#define BTRFS_STRIPE_RAID5	5
> >> -#define BTRFS_STRIPE_RAID6	6
> >> -#define BTRFS_STRIPE_RAID1C3	7
> >> -#define BTRFS_STRIPE_RAID1C4	8
> >> -
> >>  struct btrfs_stripe_extent {
> >> -	__u8 encoding;
> >> -	__u8 reserved[7];
> >>  	/* An array of raid strides this stripe is composed of. */
> >> -	struct btrfs_raid_stride strides[];
> >> +	__DECLARE_FLEX_ARRAY(struct btrfs_raid_stride, strides);
> >
> > Is there a reason to use the __ underscore macro? I see no difference
> > between that and DECLARE_FLEX_ARRAY and underscore usually means that
> > it's special in some way.
> >
>
> Yes, the __ version is for UAPI, like __u8 or __le32 and so on.

I see, though I'd rather keep the on-disk definitions free of wrappers
that hide the types. We use the __ int types but that's all and quite
clear what it means.

There already are flexible members (btrfs_leaf, btrfs_node,
btrfs_inode_extref), using the empty[] syntax. The macro wraps the
distinction that C++ needs but so far the existing declarations haven't
been problematic. So I'd rather keep the declarations consistent.
* Re: [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent
From: Johannes Thumshirn @ 2024-06-14 9:36 UTC
To: dsterba@suse.cz
Cc: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On 13.06.24 23:23, David Sterba wrote:
> On Tue, Jun 11, 2024 at 04:33:19PM +0000, Johannes Thumshirn wrote:
>> On 11.06.24 16:37, David Sterba wrote:
>>> On Mon, Jun 10, 2024 at 10:40:25AM +0200, Johannes Thumshirn wrote:
>>>> -#define BTRFS_STRIPE_RAID5	5
>>>> -#define BTRFS_STRIPE_RAID6	6
>>>> -#define BTRFS_STRIPE_RAID1C3	7
>>>> -#define BTRFS_STRIPE_RAID1C4	8
>>>> -
>>>>  struct btrfs_stripe_extent {
>>>> -	__u8 encoding;
>>>> -	__u8 reserved[7];
>>>>  	/* An array of raid strides this stripe is composed of. */
>>>> -	struct btrfs_raid_stride strides[];
>>>> +	__DECLARE_FLEX_ARRAY(struct btrfs_raid_stride, strides);
>>>
>>> Is there a reason to use the __ underscore macro? I see no difference
>>> between that and DECLARE_FLEX_ARRAY and underscore usually means that
>>> it's special in some way.
>>>
>>
>> Yes, the __ version is for UAPI, like __u8 or __le32 and so on.
>
> I see, though I'd rather keep the on-disk definitions free of wrappers
> that hide the types. We use the __ int types but that's all and quite
> clear what it means.
>
> There already are flexible members (btrfs_leaf, btrfs_node,
> btrfs_inode_extref), using the empty[] syntax. The macro wraps the
> distinction that C++ needs but so far the existing declarations haven't
> been problematic. So I'd rather keep the declarations consistent.
>

Yes, but all these examples have other members as well. After this
patch, btrfs_stripe_extent is only a container for btrfs_raid_stride,
and C doesn't allow a struct with only a flexible array member:

In file included from fs/btrfs/ctree.h:18,
                 from fs/btrfs/delayed-inode.h:19,
                 from fs/btrfs/super.c:32:
./include/uapi/linux/btrfs_tree.h:753:34: error: flexible array member in a struct with no named members
  753 |         struct btrfs_raid_stride strides[];
      |                                  ^~~~~~~
* Re: [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent
From: David Sterba @ 2024-06-16 18:19 UTC
To: Johannes Thumshirn
Cc: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Jun 14, 2024 at 09:36:34AM +0000, Johannes Thumshirn wrote:
> On 13.06.24 23:23, David Sterba wrote:
> > On Tue, Jun 11, 2024 at 04:33:19PM +0000, Johannes Thumshirn wrote:
> >> On 11.06.24 16:37, David Sterba wrote:
> >>> On Mon, Jun 10, 2024 at 10:40:25AM +0200, Johannes Thumshirn wrote:
> >>>> -#define BTRFS_STRIPE_RAID5	5
> >>>> -#define BTRFS_STRIPE_RAID6	6
> >>>> -#define BTRFS_STRIPE_RAID1C3	7
> >>>> -#define BTRFS_STRIPE_RAID1C4	8
> >>>> -
> >>>>  struct btrfs_stripe_extent {
> >>>> -	__u8 encoding;
> >>>> -	__u8 reserved[7];
> >>>>  	/* An array of raid strides this stripe is composed of. */
> >>>> -	struct btrfs_raid_stride strides[];
> >>>> +	__DECLARE_FLEX_ARRAY(struct btrfs_raid_stride, strides);
> >>>
> >>> Is there a reason to use the __ underscore macro? I see no difference
> >>> between that and DECLARE_FLEX_ARRAY and underscore usually means that
> >>> it's special in some way.
> >>>
> >>
> >> Yes, the __ version is for UAPI, like __u8 or __le32 and so on.
> >
> > I see, though I'd rather keep the on-disk definitions free of wrappers
> > that hide the types. We use the __ int types but that's all and quite
> > clear what it means.
> >
> > There already are flexible members (btrfs_leaf, btrfs_node,
> > btrfs_inode_extref), using the empty[] syntax. The macro wraps the
> > distinction that C++ needs but so far the existing declarations haven't
> > been problematic. So I'd rather keep the declarations consistent.
> >
>
> Yes, but all these examples have other members as well. After this
> patch, btrfs_stripe_extent is only a container for btrfs_raid_stride,
> and C doesn't allow a struct with only a flexible array member:
>
> In file included from fs/btrfs/ctree.h:18,
>                  from fs/btrfs/delayed-inode.h:19,
>                  from fs/btrfs/super.c:32:
> ./include/uapi/linux/btrfs_tree.h:753:34: error: flexible array member
> in a struct with no named members
>   753 |         struct btrfs_raid_stride strides[];
>       |                                  ^~~~~~~

To fix that, __DECLARE_FLEX_ARRAY adds a layer of anonymous struct and
an empty other member. We'd have to duplicate that, so let's use the
macro.
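The workaround David describes can be reproduced in user space. The macro below is a local copy modeled on the C (non-C++) branch of __DECLARE_FLEX_ARRAY in include/uapi/linux/stddef.h; treat it as a sketch rather than the kernel's exact text. It relies on GNU C empty structs, so it assumes GCC or Clang, and every demo_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Local copy modeled on the kernel's __DECLARE_FLEX_ARRAY: an anonymous
 * struct holding an empty struct (GNU C extension, size 0) followed by the
 * flexible array, so the array is no longer the only named member.
 */
#define DEMO_DECLARE_FLEX_ARRAY(TYPE, NAME)	\
	struct {				\
		struct { } __empty_ ## NAME;	\
		TYPE NAME[];			\
	}

struct demo_raid_stride {
	uint64_t devid;
	uint64_t physical;
} __attribute__((__packed__));

/* A bare "struct demo_raid_stride strides[];" here would not compile. */
struct demo_stripe_extent {
	DEMO_DECLARE_FLEX_ARRAY(struct demo_raid_stride, strides);
} __attribute__((__packed__));

/* Allocate an extent with n strides and give each a distinct devid. */
static struct demo_stripe_extent *demo_alloc_extent(int n)
{
	struct demo_stripe_extent *se =
		malloc(sizeof(*se) + n * sizeof(struct demo_raid_stride));

	if (!se)
		return NULL;
	for (int i = 0; i < n; i++) {
		se->strides[i].devid = i + 1;
		se->strides[i].physical = 0;
	}
	return se;
}
```

Because the anonymous wrapper and the empty member occupy no storage under GNU C, strides still starts at offset 0, and the array is accessed as se->strides just as with the plain strides[] declaration.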
* Re: [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent
From: Qu Wenruo @ 2024-06-17 6:27 UTC
To: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn

On 2024/6/10 18:10, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Remove the encoding field from 'struct btrfs_stripe_extent'. It was
> originally intended to encode the RAID type as well as whether we're a
> data or a parity stripe.
>
> But the RAID type can be inferred from the block-group and the data vs.
> parity differentiation can be done more easily by adding a new key type
> for parity stripes in the RAID stripe tree.

Speaking of adding a new key type or even new members, I'm wondering
whether we can also utilize the higher 8/16/32 bits of key.offset.

Currently the RST entry uses key.offset for the length of the entry,
which is at most the max zone append size.

If the vendor (WDC) is determined to go with zone sizes smaller than 4G
(32 bits) or 256T (48 bits), we definitely have 32 or 16 bits for other
usages, without expanding the entry item size.

The 4G (32 bits) limit looks a little unreliable, but 256T (48 bits) is
definitely worth considering.
Thanks,
Qu
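Qu's idea amounts to bit packing inside the 64-bit key.offset. Nothing in this thread defines such a layout; the sketch below is purely hypothetical, with an invented 48/16 split and invented demo_ helper names. Since btrfs orders keys by objectid, then type, then offset, any bits placed above the length would also take part in that ordering, which a real design would need to account for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of the 64-bit key.offset: low 48 bits hold the length. */
#define DEMO_LEN_BITS	48
#define DEMO_LEN_MASK	((UINT64_C(1) << DEMO_LEN_BITS) - 1)

/* Pack a length (must fit in 48 bits) and a 16-bit tag into one offset. */
static uint64_t demo_pack_offset(uint64_t length, uint16_t tag)
{
	assert(length <= DEMO_LEN_MASK);
	return ((uint64_t)tag << DEMO_LEN_BITS) | length;
}

/* Recover the length from the low 48 bits. */
static uint64_t demo_offset_length(uint64_t offset)
{
	return offset & DEMO_LEN_MASK;
}

/* Recover the tag from the high 16 bits. */
static uint16_t demo_offset_tag(uint64_t offset)
{
	return (uint16_t)(offset >> DEMO_LEN_BITS);
}
```

With 48 bits for the length, the largest representable extent is 2^48 - 1 bytes, just under the 256T bound Qu mentions; the remaining 16 bits would be free for something like a stripe type tag.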
* [PATCH 2/3] btrfs: replace stripe extents
From: Johannes Thumshirn @ 2024-06-10 8:40 UTC
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

If we can't insert a stripe extent into the RAID stripe tree because
the key that points to the specific position in the stripe tree already
exists, we have to remove the item and then replace it with a new item.

This can happen, for example, on device replace operations.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/ctree.c            |  1 +
 fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 1a49b9232990..ad934c5469c4 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3844,6 +3844,7 @@ static noinline int setup_leaf_for_split(struct btrfs_trans_handle *trans,
 	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);

 	BUG_ON(key.type != BTRFS_EXTENT_DATA_KEY &&
+	       key.type != BTRFS_RAID_STRIPE_KEY &&
 	       key.type != BTRFS_EXTENT_CSUM_KEY);

 	if (btrfs_leaf_free_space(leaf) >= ins_len)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index e6f7a234b8f6..3020820dd6e2 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	return ret;
 }

+static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
+				    struct btrfs_key *key,
+				    struct btrfs_stripe_extent *stripe_extent,
+				    const size_t item_size)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_root *stripe_root = fs_info->stripe_root;
+	struct btrfs_path *path;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
+	if (ret)
+		goto err;
+
+	ret = btrfs_del_item(trans, stripe_root, path);
+	if (ret)
+		goto err;
+
+	btrfs_free_path(path);
+
+	return btrfs_insert_item(trans, stripe_root, key, stripe_extent,
+				 item_size);
+ err:
+	btrfs_free_path(path);
+	return ret;
+}
+
 static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
 					struct btrfs_io_context *bioc)
 {
@@ -112,6 +143,9 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,

 	ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
 				item_size);
+	if (ret == -EEXIST)
+		ret = replace_raid_extent_item(trans, &stripe_key,
+					       stripe_extent, item_size);
 	if (ret)
 		btrfs_abort_transaction(trans, ret);

-- 
2.43.0
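Patch 2's fallback can be modeled outside the kernel. The sketch below is hypothetical: a tiny unsorted array stands in for the stripe tree, keyed like the RAID stripe items in this thread (logical start in key.objectid, length in key.offset), and demo_insert()/demo_delete() only imitate the -EEXIST and delete semantics of btrfs_insert_item() and btrfs_del_item(). None of the demo_ names exist in btrfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ITEMS 16

/* Hypothetical stand-in for one stripe tree item keyed by (logical, length). */
struct demo_item {
	uint64_t logical;
	uint64_t length;
	uint64_t devid;		/* payload: which device the stride points at */
};

struct demo_tree {
	int nr;
	struct demo_item items[DEMO_MAX_ITEMS];
};

/* Return the index of the item with this key, or -1 if absent. */
static int demo_find(const struct demo_tree *t, uint64_t logical, uint64_t length)
{
	for (int i = 0; i < t->nr; i++)
		if (t->items[i].logical == logical && t->items[i].length == length)
			return i;
	return -1;
}

/* Like btrfs_insert_item(): refuses to overwrite an existing key. */
static int demo_insert(struct demo_tree *t, const struct demo_item *it)
{
	if (demo_find(t, it->logical, it->length) >= 0)
		return -EEXIST;
	if (t->nr == DEMO_MAX_ITEMS)
		return -ENOSPC;
	t->items[t->nr++] = *it;
	return 0;
}

/* Like a search followed by btrfs_del_item(): remove the matching key. */
static int demo_delete(struct demo_tree *t, uint64_t logical, uint64_t length)
{
	int i = demo_find(t, logical, length);

	if (i < 0)
		return -ENOENT;
	memmove(&t->items[i], &t->items[i + 1],
		(size_t)(t->nr - i - 1) * sizeof(t->items[0]));
	t->nr--;
	return 0;
}

/* The pattern from the patch: on -EEXIST, delete the old item and re-insert. */
static int demo_insert_or_replace(struct demo_tree *t, const struct demo_item *it)
{
	int ret = demo_insert(t, it);

	if (ret == -EEXIST) {
		ret = demo_delete(t, it->logical, it->length);
		if (ret == 0)
			ret = demo_insert(t, it);
	}
	return ret;
}
```

Inserting the same (logical, length) key twice with demo_insert() fails with -EEXIST, while demo_insert_or_replace() leaves exactly one item carrying the newer payload, which matches the behavior the commit message describes for device replace.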
* Re: [PATCH 2/3] btrfs: replace stripe extents
From: Josef Bacik @ 2024-06-10 19:43 UTC
To: Johannes Thumshirn
Cc: Chris Mason, David Sterba, linux-btrfs, linux-kernel, Johannes Thumshirn

On Mon, Jun 10, 2024 at 10:40:26AM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> If we can't insert a stripe extent into the RAID stripe tree because
> the key that points to the specific position in the stripe tree already
> exists, we have to remove the item and then replace it with a new item.
>
> This can happen, for example, on device replace operations.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/ctree.c            |  1 +
>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 1a49b9232990..ad934c5469c4 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -3844,6 +3844,7 @@ static noinline int setup_leaf_for_split(struct btrfs_trans_handle *trans,
>  	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
>
>  	BUG_ON(key.type != BTRFS_EXTENT_DATA_KEY &&
> +	       key.type != BTRFS_RAID_STRIPE_KEY &&

This seems unrelated.  Thanks,

Josef
* Re: [PATCH 2/3] btrfs: replace stripe extents
From: Johannes Thumshirn @ 2024-06-11 6:27 UTC
To: Josef Bacik, Johannes Thumshirn
Cc: Chris Mason, David Sterba, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On 10.06.24 21:43, Josef Bacik wrote:
> On Mon, Jun 10, 2024 at 10:40:26AM +0200, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> If we can't insert a stripe extent into the RAID stripe tree because
>> the key that points to the specific position in the stripe tree already
>> exists, we have to remove the item and then replace it with a new item.
>>
>> This can happen, for example, on device replace operations.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>  fs/btrfs/ctree.c            |  1 +
>>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>>  2 files changed, 35 insertions(+)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 1a49b9232990..ad934c5469c4 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -3844,6 +3844,7 @@ static noinline int setup_leaf_for_split(struct btrfs_trans_handle *trans,
>>  	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
>>
>>  	BUG_ON(key.type != BTRFS_EXTENT_DATA_KEY &&
>> +	       key.type != BTRFS_RAID_STRIPE_KEY &&
>
> This seems unrelated.  Thanks,
>
> Josef
>

Oops, it should go into 3/3.
* [PATCH 3/3] btrfs: split RAID stripes on deletion
From: Johannes Thumshirn @ 2024-06-10 8:40 UTC
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs, linux-kernel, JohnnesThumshirn, Johnnes Thumshirn

From: JohnnesThumshirn <johannes.thumshirn@wdc.com>

The current RAID stripe code assumes that we will always remove a
whole stripe entry.

But if we're only removing a part of a RAID stripe, we hit the
ASSERT() checking for this condition.

Instead of assuming the complete deletion of a RAID stripe, split the
stripe if we need to.

Signed-off-by: Johnnes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 101 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 77 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 3020820dd6e2..41403217c3e6 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -33,42 +33,95 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	if (!path)
 		return -ENOMEM;
 
-	while (1) {
-		key.objectid = start;
-		key.type = BTRFS_RAID_STRIPE_KEY;
-		key.offset = length;
+again:
+	key.objectid = start;
+	key.type = BTRFS_RAID_STRIPE_KEY;
+	key.offset = length;
 
-		ret = btrfs_search_slot(trans, stripe_root, &key, path, -1, 1);
-		if (ret < 0)
-			break;
-		if (ret > 0) {
-			ret = 0;
-			if (path->slots[0] == 0)
-				break;
-			path->slots[0]--;
-		}
+	ret = btrfs_search_slot(trans, stripe_root, &key, path, -1, 1);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = 0;
+		if (path->slots[0] == 0)
+			goto out;
+		path->slots[0]--;
+	}
+
+	leaf = path->nodes[0];
+	slot = path->slots[0];
+	btrfs_item_key_to_cpu(leaf, &key, slot);
+	found_start = key.objectid;
+	found_end = found_start + key.offset;
+
+	/* That stripe ends before we start, we're done. */
+	if (found_end <= start)
+		goto out;
+
+	trace_btrfs_raid_extent_delete(fs_info, start, end,
+				       found_start, found_end);
+
+	if (found_start < start) {
+		u64 diff = start - found_start;
+		struct btrfs_key new_key;
+		int num_stripes;
+		struct btrfs_stripe_extent *stripe_extent;
+
+		new_key.objectid = start;
+		new_key.type = BTRFS_RAID_STRIPE_KEY;
+		new_key.offset = length - diff;
+
+		ret = btrfs_duplicate_item(trans, stripe_root, path,
+					   &new_key);
+		if (ret)
+			goto out;
 
 		leaf = path->nodes[0];
 		slot = path->slots[0];
-		btrfs_item_key_to_cpu(leaf, &key, slot);
-		found_start = key.objectid;
-		found_end = found_start + key.offset;
 
-		/* That stripe ends before we start, we're done. */
-		if (found_end <= start)
-			break;
+		num_stripes =
+			btrfs_num_raid_stripes(btrfs_item_size(leaf, slot));
+		stripe_extent =
+			btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
 
-		trace_btrfs_raid_extent_delete(fs_info, start, end,
-					       found_start, found_end);
+		for (int i = 0; i < num_stripes; i++) {
+			struct btrfs_raid_stride *raid_stride =
+				&stripe_extent->strides[i];
+			u64 physical =
+				btrfs_raid_stride_physical(leaf, raid_stride);
 
-		ASSERT(found_start >= start && found_end <= end);
-		ret = btrfs_del_item(trans, stripe_root, path);
+			btrfs_set_stack_raid_stride_physical(raid_stride,
+							     physical + diff);
+		}
+
+		btrfs_mark_buffer_dirty(trans, leaf);
+		btrfs_release_path(path);
+		goto again;
+	}
+
+	if (found_end > end) {
+		u64 diff = found_end - end;
+		struct btrfs_key new_key;
+
+		new_key.objectid = found_start;
+		new_key.type = BTRFS_RAID_STRIPE_KEY;
+		new_key.offset = length - diff;
+
+		ret = btrfs_duplicate_item(trans, stripe_root, path,
+					   &new_key);
 		if (ret)
-			break;
+			goto out;
 
+		btrfs_mark_buffer_dirty(trans, leaf);
 		btrfs_release_path(path);
+		goto again;
+
 	}
 
+	if (found_start == start && found_end == end)
+		ret = btrfs_del_item(trans, stripe_root, path);
+
+ out:
 	btrfs_free_path(path);
 	return ret;
 }

-- 
2.43.0
* Re: [PATCH 3/3] btrfs: split RAID stripes on deletion
From: Josef Bacik @ 2024-06-10 19:45 UTC
To: Johannes Thumshirn
Cc: Chris Mason, David Sterba, linux-btrfs, linux-kernel, JohnnesThumshirn

On Mon, Jun 10, 2024 at 10:40:27AM +0200, Johannes Thumshirn wrote:
> From: JohnnesThumshirn <johannes.thumshirn@wdc.com>
>
> The current RAID stripe code assumes that we will always remove a
> whole stripe entry.
>
> But if we're only removing a part of a RAID stripe, we hit the
> ASSERT() checking for this condition.
>
> Instead of assuming the complete deletion of a RAID stripe, split the
> stripe if we need to.
>
> Signed-off-by: Johnnes Thumshirn <johannes.thumshirn@wdc.com>

I'd like a selftest for this helper; it should be relatively straightforward
to do, just to test edge cases and such.  Thanks,

Josef
* Re: [PATCH 3/3] btrfs: split RAID stripes on deletion
From: Johannes Thumshirn @ 2024-06-11 6:39 UTC
To: Josef Bacik, Johannes Thumshirn
Cc: Chris Mason, David Sterba, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On 10.06.24 21:45, Josef Bacik wrote:
> On Mon, Jun 10, 2024 at 10:40:27AM +0200, Johannes Thumshirn wrote:
>> From: JohnnesThumshirn <johannes.thumshirn@wdc.com>
>>
>> The current RAID stripe code assumes that we will always remove a
>> whole stripe entry.
>>
>> But if we're only removing a part of a RAID stripe, we hit the
>> ASSERT() checking for this condition.
>>
>> Instead of assuming the complete deletion of a RAID stripe, split the
>> stripe if we need to.
>>
>> Signed-off-by: Johnnes Thumshirn <johannes.thumshirn@wdc.com>
>
> I'd like a selftest for this helper; it should be relatively straightforward
> to do, just to test edge cases and such.  Thanks,

Sure. Let me see what I can cook up.
Thread overview: 14+ messages
2024-06-10 8:40 [PATCH 0/3] btrfs: rst: updates for RAID stripe tree Johannes Thumshirn
2024-06-10 8:40 ` [PATCH 1/3] btrfs: rst: remove encoding field from stripe_extent Johannes Thumshirn
2024-06-11 14:36 ` David Sterba
2024-06-11 16:33 ` Johannes Thumshirn
2024-06-13 21:23 ` David Sterba
2024-06-14 9:36 ` Johannes Thumshirn
2024-06-16 18:19 ` David Sterba
2024-06-17 6:27 ` Qu Wenruo
2024-06-10 8:40 ` [PATCH 2/3] btrfs: replace stripe extents Johannes Thumshirn
2024-06-10 19:43 ` Josef Bacik
2024-06-11 6:27 ` Johannes Thumshirn
2024-06-10 8:40 ` [PATCH 3/3] btrfs: split RAID stripes on deletion Johannes Thumshirn
2024-06-10 19:45 ` Josef Bacik
2024-06-11 6:39 ` Johannes Thumshirn