* [PATCH v3 0/4] RAID1 with 3- and 4- copies
@ 2019-10-31 15:13 David Sterba
From: David Sterba @ 2019-10-31 15:13 UTC
To: linux-btrfs; +Cc: David Sterba
Here it goes again, RAID1 with 3- and 4- copies. I found the bug that stopped
it from inclusion last time, it was in the test itself, so the kernel code is
effectively unchanged.
So, with 1 or 2 missing devices, replace by device id works. There's one
annoying thing but not new: regarding replace of a missing device, some
extra single/dup block groups are created during the replace process.
Example below. This can happen on plain raid1 with degraded read-write
mount as well.
Now what's the merge target.
The patches almost made it to 5.3, the changes build on existing code so the
actual addition of new profiles is mainly in the definitions and additional
cases. So it should be safe.
I'm for adding it to 5.5 queue, though we're at rc5 and this can be seen as a
late time for a feature. The user benefits are noticeable, raid1c3 can replace
raid6 of metadata which is the most problematic part and much more complicated
to fix (write ahead journal or something like that). The feedback regarding the
plain 3-copy as a replacement was positive, on IRC and there are mails about
that too.
Further information can be found in the 5.3-time submission:
https://lore.kernel.org/linux-btrfs/cover.1559917235.git.dsterba@suse.com/
--
Example of 2 devices gone missing and replaced
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- mkfs -d raid1c3 -m raid1c3 /dev/sda10 /dev/sda11 /dev/sda12
- delete devices 2 and 3 from the system
              Data      Metadata  System
Id Path       RAID1C3   RAID1C3   RAID1C3  Unallocated
-- ---------- --------- --------- -------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB  8.00MiB     8.74GiB
 2 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
 3 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
-- ---------- --------- --------- -------- -----------
   Total        1.00GiB 256.00MiB  8.00MiB     6.23GiB
   Used       200.31MiB 320.00KiB 16.00KiB
- mount -o degraded
- btrfs replace 2 /dev/sda13
              Data      Metadata  Metadata  System   System
Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
-- ---------- --------- --------- --------- -------- ------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
 2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
 3 missing      1.00GiB         - 256.00MiB        - 8.00MiB    -1.26GiB
-- ---------- --------- --------- --------- -------- ------- -----------
   Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    15.95GiB
   Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B
- btrfs replace 3 /dev/sda14
              Data      Metadata  Metadata  System   System
Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
-- ---------- --------- --------- --------- -------- ------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
 2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
 3 /dev/sda14   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
-- ---------- --------- --------- --------- -------- ------- -----------
   Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    25.95GiB
   Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B
There you can see the metadata/single and system/single chunks, that are
otherwise unused if there are no other writes happening during replace.
Running 'balance start -mconvert=raid1c3,profiles=single' should get rid of
them.
This is an annoyance, we have a plan to avoid that but it needs to change
behaviour with degraded mount and enabled writes.
Implementation details: The new block groups get a profile reduced from the
expected one (raid1 -> single or dup) to allow writes without breaking the raid
constraints. Relaxing that condition, i.e. allowing writes to "half" of the
raid with a missing device, would skip creating these extra block groups.
This is similar to MD-RAID that allows writing to just one of the RAID1
devices, and then sync to the other when it's available again.
With the btrfs style raid1 we can do better in case there are enough other
devices that would satisfy the raid1 constraint (yet with a missing device).
--
David Sterba (4):
  btrfs: add support for 3-copy replication (raid1c3)
  btrfs: add support for 4-copy replication (raid1c4)
  btrfs: add incompat for raid1 with 3, 4 copies
  btrfs: drop incompat bit for raid1c34 after last block group is gone

 fs/btrfs/block-group.c          | 27 ++++++++++++++--------
 fs/btrfs/ctree.h                |  7 +++---
 fs/btrfs/super.c                |  4 ++++
 fs/btrfs/sysfs.c                |  2 ++
 fs/btrfs/volumes.c              | 40 +++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h              |  4 ++++
 include/uapi/linux/btrfs.h      |  5 ++++-
 include/uapi/linux/btrfs_tree.h | 10 ++++++++-
 8 files changed, 83 insertions(+), 16 deletions(-)
--
2.23.0
* [PATCH v2 1/4] btrfs: add support for 3-copy replication (raid1c3)
@ 2019-10-31 15:13 David Sterba
From: David Sterba @ 2019-10-31 15:13 UTC
To: linux-btrfs; +Cc: David Sterba

Add new block group profile to store 3 copies in a similar way that
current RAID1 does. The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the
2-copy RAID1.

The minimum number of devices is 3, the maximum number of devices/chunks
that can be lost/damaged is 2. Like RAID6 but with 33% space
utilization.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h                |  4 ++--
 fs/btrfs/super.c                |  2 ++
 fs/btrfs/volumes.c              | 19 +++++++++++++++++--
 fs/btrfs/volumes.h              |  2 ++
 include/uapi/linux/btrfs.h      |  3 ++-
 include/uapi/linux/btrfs_tree.h |  6 +++++-
 6 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1c8f01eaf27c..aa1b437fb951 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -57,9 +57,9 @@ struct btrfs_ref;
  * filesystem data as well that can be used to read data in order to repair
  * read errors on other disks.
  *
- * Current value is derived from RAID1 with 2 copies.
+ * Current value is derived from RAID1C3 with 3 copies.
  */
-#define BTRFS_MAX_MIRRORS (2 + 1)
+#define BTRFS_MAX_MIRRORS (3 + 1)
 
 #define BTRFS_MAX_LEVEL 8
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3f49407cc2aa..a5aff138e2e0 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1935,6 +1935,8 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
 		num_stripes = nr_devices;
 	else if (type & BTRFS_BLOCK_GROUP_RAID1)
 		num_stripes = 2;
+	else if (type & BTRFS_BLOCK_GROUP_RAID1C3)
+		num_stripes = 3;
 	else if (type & BTRFS_BLOCK_GROUP_RAID10)
 		num_stripes = 4;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f534a6a5553e..22560062269f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -58,6 +58,18 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.bg_flag = BTRFS_BLOCK_GROUP_RAID1,
 		.mindev_error = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
 	},
+	[BTRFS_RAID_RAID1C3] = {
+		.sub_stripes = 1,
+		.dev_stripes = 1,
+		.devs_max = 0,
+		.devs_min = 3,
+		.tolerated_failures = 2,
+		.devs_increment = 3,
+		.ncopies = 3,
+		.raid_name = "raid1c3",
+		.bg_flag = BTRFS_BLOCK_GROUP_RAID1C3,
+		.mindev_error = BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
+	},
 	[BTRFS_RAID_DUP] = {
 		.sub_stripes = 1,
 		.dev_stripes = 2,
@@ -4839,8 +4851,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 	     btrfs_cmp_device_info, NULL);
 
-	/* round down to number of usable stripes */
-	ndevs = round_down(ndevs, devs_increment);
+	/*
+	 * Round down to number of usable stripes, devs_increment can be any
+	 * number so we can't use round_down()
+	 */
+	ndevs -= ndevs % devs_increment;
 
 	if (ndevs < devs_min) {
 		ret = -ENOSPC;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ac4ba8c57283..a4e26b84e1b9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -545,6 +545,8 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 		return BTRFS_RAID_RAID10;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 		return BTRFS_RAID_RAID1;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
+		return BTRFS_RAID_RAID1C3;
 	else if (flags & BTRFS_BLOCK_GROUP_DUP)
 		return BTRFS_RAID_DUP;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 3ee0678c0a83..ba22f91a3f5b 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -831,7 +831,8 @@ enum btrfs_err_code {
 	BTRFS_ERROR_DEV_TGT_REPLACE,
 	BTRFS_ERROR_DEV_MISSING_NOT_FOUND,
 	BTRFS_ERROR_DEV_ONLY_WRITABLE,
-	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS
+	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
+	BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
 };
 
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 5160be1d7332..52b2964b0311 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -841,6 +841,7 @@ struct btrfs_dev_replace_item {
 #define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6)
 #define BTRFS_BLOCK_GROUP_RAID5 (1ULL << 7)
 #define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8)
+#define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9)
 #define BTRFS_BLOCK_GROUP_RESERVED (BTRFS_AVAIL_ALLOC_BIT_SINGLE | \
 					 BTRFS_SPACE_INFO_GLOBAL_RSV)
 
@@ -852,6 +853,7 @@ enum btrfs_raid_types {
 	BTRFS_RAID_SINGLE,
 	BTRFS_RAID_RAID5,
 	BTRFS_RAID_RAID6,
+	BTRFS_RAID_RAID1C3,
 	BTRFS_NR_RAID_TYPES
 };
 
@@ -861,6 +863,7 @@ enum btrfs_raid_types {
 
 #define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
 					 BTRFS_BLOCK_GROUP_RAID1 | \
+					 BTRFS_BLOCK_GROUP_RAID1C3 | \
 					 BTRFS_BLOCK_GROUP_RAID5 | \
 					 BTRFS_BLOCK_GROUP_RAID6 | \
 					 BTRFS_BLOCK_GROUP_DUP | \
@@ -868,7 +871,8 @@ enum btrfs_raid_types {
 #define BTRFS_BLOCK_GROUP_RAID56_MASK (BTRFS_BLOCK_GROUP_RAID5 | \
 					 BTRFS_BLOCK_GROUP_RAID6)
 
-#define BTRFS_BLOCK_GROUP_RAID1_MASK (BTRFS_BLOCK_GROUP_RAID1)
+#define BTRFS_BLOCK_GROUP_RAID1_MASK (BTRFS_BLOCK_GROUP_RAID1 | \
+					 BTRFS_BLOCK_GROUP_RAID1C3)
 
 /*
  * We need a bit for restriper to be able to tell when chunks of type
--
2.23.0
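
The switch away from round_down() in __btrfs_alloc_chunk() can be illustrated with a small userspace sketch. It assumes that the kernel macro rounds by masking off low bits, which is only correct for power-of-two steps; the helper names round_down_pow2() and round_down_any() are made up for this illustration and do not exist in the kernel.

```c
#include <assert.h>

/*
 * Mask-based rounding, modelled on the kernel's round_down() macro:
 * clear the bits selected by (step - 1). The result is a multiple of
 * step only when step is a power of two.
 */
static unsigned int round_down_pow2(unsigned int x, unsigned int step)
{
	assert(step > 0);
	return x & ~(step - 1);
}

/*
 * Modulo-based rounding as used in the patch: works for any non-zero
 * step, including devs_increment = 3 for raid1c3.
 */
static unsigned int round_down_any(unsigned int x, unsigned int step)
{
	assert(step > 0);
	return x - x % step;
}
```

With 7 usable devices and devs_increment = 3 the mask variant yields 5, which is not a multiple of 3, while the modulo variant yields 6. For a step of 2 or 4 both helpers agree, which is presumably why the old code was fine for the existing profiles.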
* [PATCH v2 2/4] btrfs: add support for 4-copy replication (raid1c4)
@ 2019-10-31 15:13 David Sterba
From: David Sterba @ 2019-10-31 15:13 UTC
To: linux-btrfs; +Cc: David Sterba

Add new block group profile to store 4 copies in a similar way that
current RAID1 does. The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the
2- and 3-copy RAID1.

The minimum number of devices is 4, the maximum number of devices/chunks
that can be lost/damaged is 3. There is no comparable traditional RAID
level, the profile is added for future needs to accompany triple-parity
and beyond.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h                |  4 ++--
 fs/btrfs/super.c                |  2 ++
 fs/btrfs/volumes.c              | 12 ++++++++++++
 fs/btrfs/volumes.h              |  2 ++
 include/uapi/linux/btrfs.h      |  1 +
 include/uapi/linux/btrfs_tree.h |  6 +++++-
 6 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aa1b437fb951..923a8804ae94 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -57,9 +57,9 @@ struct btrfs_ref;
  * filesystem data as well that can be used to read data in order to repair
  * read errors on other disks.
  *
- * Current value is derived from RAID1C3 with 3 copies.
+ * Current value is derived from RAID1C4 with 4 copies.
  */
-#define BTRFS_MAX_MIRRORS (3 + 1)
+#define BTRFS_MAX_MIRRORS (4 + 1)
 
 #define BTRFS_MAX_LEVEL 8
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index a5aff138e2e0..a98c3c71fc54 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1937,6 +1937,8 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
 		num_stripes = 2;
 	else if (type & BTRFS_BLOCK_GROUP_RAID1C3)
 		num_stripes = 3;
+	else if (type & BTRFS_BLOCK_GROUP_RAID1C4)
+		num_stripes = 4;
 	else if (type & BTRFS_BLOCK_GROUP_RAID10)
 		num_stripes = 4;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 22560062269f..238d814f83a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -70,6 +70,18 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.bg_flag = BTRFS_BLOCK_GROUP_RAID1C3,
 		.mindev_error = BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
 	},
+	[BTRFS_RAID_RAID1C4] = {
+		.sub_stripes = 1,
+		.dev_stripes = 1,
+		.devs_max = 0,
+		.devs_min = 4,
+		.tolerated_failures = 3,
+		.devs_increment = 4,
+		.ncopies = 4,
+		.raid_name = "raid1c4",
+		.bg_flag = BTRFS_BLOCK_GROUP_RAID1C4,
+		.mindev_error = BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET,
+	},
 	[BTRFS_RAID_DUP] = {
 		.sub_stripes = 1,
 		.dev_stripes = 2,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index a4e26b84e1b9..46987a2da786 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -547,6 +547,8 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 		return BTRFS_RAID_RAID1;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
 		return BTRFS_RAID_RAID1C3;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C4)
+		return BTRFS_RAID_RAID1C4;
 	else if (flags & BTRFS_BLOCK_GROUP_DUP)
 		return BTRFS_RAID_DUP;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index ba22f91a3f5b..a2b761275bba 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -833,6 +833,7 @@ enum btrfs_err_code {
 	BTRFS_ERROR_DEV_ONLY_WRITABLE,
 	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
 	BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
+	BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET,
 };
 
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 52b2964b0311..8e322e2c7e78 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -842,6 +842,7 @@ struct btrfs_dev_replace_item {
 #define BTRFS_BLOCK_GROUP_RAID5 (1ULL << 7)
 #define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8)
 #define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9)
+#define BTRFS_BLOCK_GROUP_RAID1C4 (1ULL << 10)
 #define BTRFS_BLOCK_GROUP_RESERVED (BTRFS_AVAIL_ALLOC_BIT_SINGLE | \
 					 BTRFS_SPACE_INFO_GLOBAL_RSV)
 
@@ -854,6 +855,7 @@ enum btrfs_raid_types {
 	BTRFS_RAID_RAID5,
 	BTRFS_RAID_RAID6,
 	BTRFS_RAID_RAID1C3,
+	BTRFS_RAID_RAID1C4,
 	BTRFS_NR_RAID_TYPES
 };
 
@@ -864,6 +866,7 @@ enum btrfs_raid_types {
 #define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
 					 BTRFS_BLOCK_GROUP_RAID1 | \
 					 BTRFS_BLOCK_GROUP_RAID1C3 | \
+					 BTRFS_BLOCK_GROUP_RAID1C4 | \
 					 BTRFS_BLOCK_GROUP_RAID5 | \
 					 BTRFS_BLOCK_GROUP_RAID6 | \
 					 BTRFS_BLOCK_GROUP_DUP | \
@@ -872,7 +875,8 @@ enum btrfs_raid_types {
 					 BTRFS_BLOCK_GROUP_RAID6)
 
 #define BTRFS_BLOCK_GROUP_RAID1_MASK (BTRFS_BLOCK_GROUP_RAID1 | \
-					 BTRFS_BLOCK_GROUP_RAID1C3)
+					 BTRFS_BLOCK_GROUP_RAID1C3 | \
+					 BTRFS_BLOCK_GROUP_RAID1C4)
 
 /*
  * We need a bit for restriper to be able to tell when chunks of type
--
2.23.0
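
The relation between the number of copies, the tolerated failures and the space utilization quoted in the changelogs (33% for raid1c3) can be written down as a rough userspace sketch. The struct mirror_profile and both helpers are hypothetical names for this illustration only, they mirror the .ncopies and .tolerated_failures fields of the raid table but are not kernel code.

```c
#include <assert.h>

/* Hypothetical, reduced view of one mirroring entry of the raid table. */
struct mirror_profile {
	const char *name;
	unsigned int ncopies;
};

/* Each copy is on a different device, so all but one may be lost. */
static unsigned int tolerated_failures(const struct mirror_profile *p)
{
	assert(p->ncopies >= 1);
	return p->ncopies - 1;
}

/* Bytes available for data out of raw bytes allocated on the devices. */
static unsigned long long usable_bytes(const struct mirror_profile *p,
				       unsigned long long raw)
{
	assert(p->ncopies >= 1);
	return raw / p->ncopies;
}
```

For raid1c3 this gives 2 tolerated failures and one third of the raw space, matching the example in the cover letter where 1.00GiB of data chunks on each of 3 devices adds up to a 1.00GiB total. By the same arithmetic raid1c4 tolerates 3 failures at one quarter of the raw space.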
* [PATCH v2 3/4] btrfs: add incompat for raid1 with 3, 4 copies
@ 2019-10-31 15:13 David Sterba
From: David Sterba @ 2019-10-31 15:13 UTC
To: linux-btrfs; +Cc: David Sterba

The new raid1c3 and raid1c4 profiles are backward incompatible and the
name shall be 'raid1c34', the status can be found in the global
supported features in /sys/fs/btrfs/features or in the per-filesystem
directory.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h           | 3 ++-
 fs/btrfs/sysfs.c           | 2 ++
 fs/btrfs/volumes.c         | 9 +++++++++
 include/uapi/linux/btrfs.h | 1 +
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 923a8804ae94..e76b3cda13e3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -292,7 +292,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES | \
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID | \
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET \
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4a78bc4ec62e..1725578c5464 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -259,6 +259,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(skinny_metadata, SKINNY_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
+BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
 
 /*
 static struct btrfs_feature_attr btrfs_attr_features_checksums_name = {
@@ -283,6 +284,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(no_holes),
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
+	BTRFS_FEAT_ATTR_PTR(raid1c34),
 	NULL
 };
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 238d814f83a1..a674a960c7be 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4717,6 +4717,14 @@ static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
 	btrfs_set_fs_incompat(info, RAID56);
 }
 
+static void check_raid1c34_incompat_flag(struct btrfs_fs_info *info, u64 type)
+{
+	if (!(type & (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)))
+		return;
+
+	btrfs_set_fs_incompat(info, RAID1C34);
+}
+
 static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			       u64 start, u64 type)
 {
@@ -4983,6 +4991,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 
 	free_extent_map(em);
 	check_raid56_incompat_flag(info, type);
+	check_raid1c34_incompat_flag(info, type);
 
 	kfree(devices_info);
 	return 0;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index a2b761275bba..7a8bc8b920f5 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -270,6 +270,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
--
2.23.0
* [PATCH v2 4/4] btrfs: drop incompat bit for raid1c34 after last block group is gone
@ 2019-10-31 15:13 David Sterba
From: David Sterba @ 2019-10-31 15:13 UTC
To: linux-btrfs; +Cc: David Sterba

When there are no raid1c3 or raid1c4 block groups left after balance
(either convert or with other filters applied), remove the incompat bit.
This is already done for RAID56, do the same for RAID1C34.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/block-group.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1e521db3ef56..9ce9c2e318cf 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -828,27 +828,36 @@ static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  *
  * - RAID56 - in case there's neither RAID5 nor RAID6 profile block group
  *            in the whole filesystem
+ *
+ * - RAID1C34 - same as above for RAID1C3 and RAID1C4 block groups
  */
 static void clear_incompat_bg_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
-	if (flags & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+	bool found_raid56 = false;
+	bool found_raid1c34 = false;
+
+	if ((flags & BTRFS_BLOCK_GROUP_RAID56_MASK) ||
+	    (flags & BTRFS_BLOCK_GROUP_RAID1C3) ||
+	    (flags & BTRFS_BLOCK_GROUP_RAID1C4)) {
 		struct list_head *head = &fs_info->space_info;
 		struct btrfs_space_info *sinfo;
 
 		list_for_each_entry_rcu(sinfo, head, list) {
-			bool found = false;
-
 			down_read(&sinfo->groups_sem);
 			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID5]))
-				found = true;
+				found_raid56 = true;
 			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID6]))
-				found = true;
+				found_raid56 = true;
+			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID1C3]))
+				found_raid1c34 = true;
+			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID1C4]))
+				found_raid1c34 = true;
 			up_read(&sinfo->groups_sem);
-
-			if (found)
-				return;
 		}
-		btrfs_clear_fs_incompat(fs_info, RAID56);
+		if (!found_raid56)
+			btrfs_clear_fs_incompat(fs_info, RAID56);
+		if (!found_raid1c34)
+			btrfs_clear_fs_incompat(fs_info, RAID1C34);
 	}
 }
 
--
2.23.0
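
The life cycle of the incompat bit as described in the changelogs of patches 3 and 4 (set when the first raid1c3/raid1c4 chunk is allocated, dropped once no such block group is left) can be modelled in userspace roughly as follows. Only the bit value (1ULL << 11) is taken from the patches; struct fs_model, enum profile and the helpers are assumptions made for this sketch and say nothing about the locking or the per-space-info lists of the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as BTRFS_FEATURE_INCOMPAT_RAID1C34 in the patch. */
#define INCOMPAT_RAID1C34 (1ULL << 11)

/* Hypothetical, reduced set of profiles tracked by the model. */
enum profile { P_RAID1, P_RAID1C3, P_RAID1C4, P_NR };

/* Hypothetical stand-in for the filesystem state. */
struct fs_model {
	uint64_t incompat_flags;
	unsigned int nr_block_groups[P_NR];
};

static bool is_raid1c34(enum profile p)
{
	return p == P_RAID1C3 || p == P_RAID1C4;
}

/* Allocation side: the first raid1c3/raid1c4 block group sets the bit. */
static void add_block_group(struct fs_model *fs, enum profile p)
{
	fs->nr_block_groups[p]++;
	if (is_raid1c34(p))
		fs->incompat_flags |= INCOMPAT_RAID1C34;
}

/* Removal side: clear the bit only when neither profile is left. */
static void remove_block_group(struct fs_model *fs, enum profile p)
{
	assert(fs->nr_block_groups[p] > 0);
	fs->nr_block_groups[p]--;
	if (!is_raid1c34(p))
		return;
	if (fs->nr_block_groups[P_RAID1C3] == 0 &&
	    fs->nr_block_groups[P_RAID1C4] == 0)
		fs->incompat_flags &= ~INCOMPAT_RAID1C34;
}
```

The point of the model is the polarity of the final check: the bit goes away only if nothing was found, and removing a block group of an unrelated profile never touches it.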
* [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba ` (3 preceding siblings ...) 2019-10-31 15:13 ` [PATCH v2 4/4] btrfs: drop incompat bit for raid1c34 after last block group is gone David Sterba @ 2019-10-31 18:43 ` David Sterba 2019-10-31 18:44 ` [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba 2019-11-01 14:54 ` Neal Gompa 6 siblings, 0 replies; 15+ messages in thread From: David Sterba @ 2019-10-31 18:43 UTC (permalink / raw) To: linux-btrfs; +Cc: David Sterba Add support for 3- and 4- copy variants of RAID1. This adds resiliency against 2 or resp. 3 devices lost or damaged. $ ./mkfs.btrfs -m raid1c4 -d raid1c3 /dev/sd[abcd] Label: (null) UUID: f1f988ab-6750-4bc2-957b-98a4ebe98631 Node size: 16384 Sector size: 4096 Filesystem size: 8.00GiB Block group profiles: Data: RAID1C3 273.06MiB Metadata: RAID1C4 204.75MiB System: RAID1C4 8.00MiB SSD detected: no Incompat features: extref, skinny-metadata, raid1c34 Number of devices: 4 Devices: ID SIZE PATH 1 2.00GiB /dev/sda 2 2.00GiB /dev/sdb 3 2.00GiB /dev/sdc 4 2.00GiB /dev/sdd Signed-off-by: David Sterba <dsterba@suse.com> --- cmds/balance.c | 4 ++++ cmds/filesystem-usage.c | 8 +++++++ cmds/inspect-dump-super.c | 3 ++- cmds/rescue-chunk-recover.c | 4 ++++ common/fsfeatures.c | 6 +++++ common/utils.c | 12 +++++++++- ctree.h | 8 +++++++ extent-tree.c | 4 ++++ ioctl.h | 4 +++- mkfs/main.c | 11 ++++++++- print-tree.c | 6 +++++ volumes.c | 48 +++++++++++++++++++++++++++++++++++-- volumes.h | 4 ++++ 13 files changed, 116 insertions(+), 6 deletions(-) diff --git a/cmds/balance.c b/cmds/balance.c index 32830002f3a0..2d0fb6ef52ed 100644 --- a/cmds/balance.c +++ b/cmds/balance.c @@ -46,6 +46,10 @@ static int parse_one_profile(const char *profile, u64 *flags) *flags |= BTRFS_BLOCK_GROUP_RAID0; } else if (!strcmp(profile, "raid1")) { *flags |= BTRFS_BLOCK_GROUP_RAID1; + } else if (!strcmp(profile, "raid1c3")) { + *flags |= 
BTRFS_BLOCK_GROUP_RAID1C3; + } else if (!strcmp(profile, "raid1c4")) { + *flags |= BTRFS_BLOCK_GROUP_RAID1C4; } else if (!strcmp(profile, "raid10")) { *flags |= BTRFS_BLOCK_GROUP_RAID10; } else if (!strcmp(profile, "raid5")) { diff --git a/cmds/filesystem-usage.c b/cmds/filesystem-usage.c index 212322188d19..744ff2de5a7f 100644 --- a/cmds/filesystem-usage.c +++ b/cmds/filesystem-usage.c @@ -374,6 +374,10 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo, ratio = 1; else if (flags & BTRFS_BLOCK_GROUP_RAID1) ratio = 2; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C3) + ratio = 3; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C4) + ratio = 4; else if (flags & BTRFS_BLOCK_GROUP_RAID5) ratio = 0; else if (flags & BTRFS_BLOCK_GROUP_RAID6) @@ -654,6 +658,10 @@ static u64 calc_chunk_size(struct chunk_info *ci) return ci->size / ci->num_stripes; else if (ci->type & BTRFS_BLOCK_GROUP_RAID1) return ci->size ; + else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C3) + return ci->size; + else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C4) + return ci->size; else if (ci->type & BTRFS_BLOCK_GROUP_DUP) return ci->size ; else if (ci->type & BTRFS_BLOCK_GROUP_RAID5) diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c index bf380ad2b56a..b32a5ebecc86 100644 --- a/cmds/inspect-dump-super.c +++ b/cmds/inspect-dump-super.c @@ -227,7 +227,8 @@ static struct readable_flag_entry incompat_flags_array[] = { DEF_INCOMPAT_FLAG_ENTRY(RAID56), DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA), DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES), - DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID) + DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID), + DEF_INCOMPAT_FLAG_ENTRY(RAID1C34), }; static const int incompat_flags_num = sizeof(incompat_flags_array) / sizeof(struct readable_flag_entry); diff --git a/cmds/rescue-chunk-recover.c b/cmds/rescue-chunk-recover.c index 329a608dfc6b..5d573161905f 100644 --- a/cmds/rescue-chunk-recover.c +++ b/cmds/rescue-chunk-recover.c @@ -1582,6 +1582,10 @@ static int 
calc_num_stripes(u64 type) else if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP)) return 2; + else if (type & (BTRFS_BLOCK_GROUP_RAID1C3)) + return 3; + else if (type & (BTRFS_BLOCK_GROUP_RAID1C4)) + return 4; else return 1; } diff --git a/common/fsfeatures.c b/common/fsfeatures.c index 50934bd161b0..ac12d57b25a3 100644 --- a/common/fsfeatures.c +++ b/common/fsfeatures.c @@ -86,6 +86,12 @@ static const struct btrfs_fs_feature { VERSION_TO_STRING2(4,0), NULL, 0, "no explicit hole extents for files" }, + { "raid1c34", BTRFS_FEATURE_INCOMPAT_RAID1C34, + "raid1c34", + VERSION_TO_STRING2(5,5), + NULL, 0, + NULL, 0, + "RAID1 with 3 or 4 copies" }, /* Keep this one last */ { "list-all", BTRFS_FEATURE_LIST_ALL, NULL } }; diff --git a/common/utils.c b/common/utils.c index 2cf15c333f6b..23e0a7927172 100644 --- a/common/utils.c +++ b/common/utils.c @@ -1117,8 +1117,10 @@ static int group_profile_devs_min(u64 flag) case BTRFS_BLOCK_GROUP_RAID5: return 2; case BTRFS_BLOCK_GROUP_RAID6: + case BTRFS_BLOCK_GROUP_RAID1C3: return 3; case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID1C4: return 4; default: return -1; @@ -1135,9 +1137,10 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile, default: case 4: allowed |= BTRFS_BLOCK_GROUP_RAID10; + allowed |= BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID1C4; __attribute__ ((fallthrough)); case 3: - allowed |= BTRFS_BLOCK_GROUP_RAID6; + allowed |= BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID1C3; __attribute__ ((fallthrough)); case 2: allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | @@ -1191,7 +1194,10 @@ int group_profile_max_safe_loss(u64 flags) case BTRFS_BLOCK_GROUP_RAID10: return 1; case BTRFS_BLOCK_GROUP_RAID6: + case BTRFS_BLOCK_GROUP_RAID1C3: return 2; + case BTRFS_BLOCK_GROUP_RAID1C4: + return 3; default: return -1; } @@ -1341,6 +1347,10 @@ const char* btrfs_group_profile_str(u64 flag) return "RAID0"; case BTRFS_BLOCK_GROUP_RAID1: return "RAID1"; + case 
BTRFS_BLOCK_GROUP_RAID1C3: + return "RAID1C3"; + case BTRFS_BLOCK_GROUP_RAID1C4: + return "RAID1C4"; case BTRFS_BLOCK_GROUP_RAID5: return "RAID5"; case BTRFS_BLOCK_GROUP_RAID6: diff --git a/ctree.h b/ctree.h index b2745e1e8f13..f5227c053eb2 100644 --- a/ctree.h +++ b/ctree.h @@ -489,6 +489,7 @@ struct btrfs_super_block { #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8) #define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9) #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10) +#define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11) #define BTRFS_FEATURE_COMPAT_SUPP 0ULL @@ -512,6 +513,7 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS | \ BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ + BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ BTRFS_FEATURE_INCOMPAT_METADATA_UUID) /* @@ -961,6 +963,8 @@ struct btrfs_csum_item { #define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6) #define BTRFS_BLOCK_GROUP_RAID5 (1ULL << 7) #define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8) +#define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9) +#define BTRFS_BLOCK_GROUP_RAID1C4 (1ULL << 10) #define BTRFS_BLOCK_GROUP_RESERVED BTRFS_AVAIL_ALLOC_BIT_SINGLE enum btrfs_raid_types { @@ -971,6 +975,8 @@ enum btrfs_raid_types { BTRFS_RAID_SINGLE, BTRFS_RAID_RAID5, BTRFS_RAID_RAID6, + BTRFS_RAID_RAID1C3, + BTRFS_RAID_RAID1C4, BTRFS_NR_RAID_TYPES }; @@ -982,6 +988,8 @@ enum btrfs_raid_types { BTRFS_BLOCK_GROUP_RAID1 | \ BTRFS_BLOCK_GROUP_RAID5 | \ BTRFS_BLOCK_GROUP_RAID6 | \ + BTRFS_BLOCK_GROUP_RAID1C3 | \ + BTRFS_BLOCK_GROUP_RAID1C4 | \ BTRFS_BLOCK_GROUP_DUP | \ BTRFS_BLOCK_GROUP_RAID10) diff --git a/extent-tree.c b/extent-tree.c index 662fb1fa2b9a..d5cd13bd4328 100644 --- a/extent-tree.c +++ b/extent-tree.c @@ -1669,6 +1669,8 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags) { u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | + BTRFS_BLOCK_GROUP_RAID1C4 | BTRFS_BLOCK_GROUP_RAID10 | 
BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 | @@ -3104,6 +3106,8 @@ static u64 get_dev_extent_len(struct map_lookup *map) case 0: /* Single */ case BTRFS_BLOCK_GROUP_DUP: case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID1C3: + case BTRFS_BLOCK_GROUP_RAID1C4: div = 1; break; case BTRFS_BLOCK_GROUP_RAID5: diff --git a/ioctl.h b/ioctl.h index 66ee599f7a82..d3dfd6375de1 100644 --- a/ioctl.h +++ b/ioctl.h @@ -775,7 +775,9 @@ enum btrfs_err_code { BTRFS_ERROR_DEV_TGT_REPLACE, BTRFS_ERROR_DEV_MISSING_NOT_FOUND, BTRFS_ERROR_DEV_ONLY_WRITABLE, - BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS + BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS, + BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET, + BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET, }; /* An error code to error string mapping for the kernel diff --git a/mkfs/main.c b/mkfs/main.c index f52e8b61a460..dd1223f703e4 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -337,7 +337,7 @@ static void print_usage(int ret) printf("Usage: mkfs.btrfs [options] dev [ dev ... ]\n"); printf("Options:\n"); printf(" allocation profiles:\n"); - printf("\t-d|--data PROFILE data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n"); + printf("\t-d|--data PROFILE data profile, raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, dup or single\n"); printf("\t-m|--metadata PROFILE metadata profile, values like for data profile\n"); printf("\t-M|--mixed mix metadata and data together\n"); printf(" features:\n"); @@ -370,6 +370,10 @@ static u64 parse_profile(const char *s) return BTRFS_BLOCK_GROUP_RAID0; } else if (strcasecmp(s, "raid1") == 0) { return BTRFS_BLOCK_GROUP_RAID1; + } else if (strcasecmp(s, "raid1c3") == 0) { + return BTRFS_BLOCK_GROUP_RAID1C3; + } else if (strcasecmp(s, "raid1c4") == 0) { + return BTRFS_BLOCK_GROUP_RAID1C4; } else if (strcasecmp(s, "raid5") == 0) { return BTRFS_BLOCK_GROUP_RAID5; } else if (strcasecmp(s, "raid6") == 0) { @@ -1065,6 +1069,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv) features |= BTRFS_FEATURE_INCOMPAT_RAID56; } + if 
((data_profile | metadata_profile) & + (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) { + features |= BTRFS_FEATURE_INCOMPAT_RAID1C34; + } + if (btrfs_check_nodesize(nodesize, sectorsize, features)) goto error; diff --git a/print-tree.c b/print-tree.c index f70ce6844a7e..35ab9234cf48 100644 --- a/print-tree.c +++ b/print-tree.c @@ -162,6 +162,12 @@ static void bg_flags_to_str(u64 flags, char *ret) case BTRFS_BLOCK_GROUP_RAID1: strcat(ret, "|RAID1"); break; + case BTRFS_BLOCK_GROUP_RAID1C3: + strcat(ret, "|RAID1C3"); + break; + case BTRFS_BLOCK_GROUP_RAID1C4: + strcat(ret, "|RAID1C4"); + break; case BTRFS_BLOCK_GROUP_DUP: strcat(ret, "|DUP"); break; diff --git a/volumes.c b/volumes.c index fbbc22b5b1b3..63e7fba975cf 100644 --- a/volumes.c +++ b/volumes.c @@ -57,6 +57,24 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .bg_flag = BTRFS_BLOCK_GROUP_RAID1, .mindev_error = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET, }, + [BTRFS_RAID_RAID1C3] = { + .sub_stripes = 1, + .dev_stripes = 1, + .devs_max = 0, + .devs_min = 3, + .tolerated_failures = 2, + .devs_increment = 3, + .ncopies = 3, + }, + [BTRFS_RAID_RAID1C4] = { + .sub_stripes = 1, + .dev_stripes = 1, + .devs_max = 0, + .devs_min = 4, + .tolerated_failures = 3, + .devs_increment = 4, + .ncopies = 4, + }, [BTRFS_RAID_DUP] = { .sub_stripes = 1, .dev_stripes = 2, @@ -854,6 +872,8 @@ static u64 chunk_bytes_by_type(u64 type, u64 calc_size, int num_stripes, { if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP)) return calc_size; + else if (type & (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) + return calc_size; else if (type & BTRFS_BLOCK_GROUP_RAID10) return calc_size * (num_stripes / sub_stripes); else if (type & BTRFS_BLOCK_GROUP_RAID5) @@ -1034,6 +1054,20 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, return -ENOSPC; min_stripes = 2; } + if (type & BTRFS_BLOCK_GROUP_RAID1C3) { + num_stripes = min_t(u64, 3, + btrfs_super_num_devices(info->super_copy)); + if 
(num_stripes < 3) + return -ENOSPC; + min_stripes = 3; + } + if (type & BTRFS_BLOCK_GROUP_RAID1C4) { + num_stripes = min_t(u64, 4, + btrfs_super_num_devices(info->super_copy)); + if (num_stripes < 4) + return -ENOSPC; + min_stripes = 4; + } if (type & BTRFS_BLOCK_GROUP_DUP) { num_stripes = 2; min_stripes = 2; @@ -1382,7 +1416,8 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len) } map = container_of(ce, struct map_lookup, ce); - if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1)) + if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) ret = map->num_stripes; else if (map->type & BTRFS_BLOCK_GROUP_RAID10) ret = map->sub_stripes; @@ -1578,6 +1613,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw, if (rw == WRITE) { if (map->type & (BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | + BTRFS_BLOCK_GROUP_RAID1C4 | BTRFS_BLOCK_GROUP_DUP)) { stripes_required = map->num_stripes; } else if (map->type & BTRFS_BLOCK_GROUP_RAID10) { @@ -1620,6 +1657,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw, stripe_offset = offset - stripe_offset; if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4 | BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_DUP)) { @@ -1635,7 +1673,9 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw, multi->num_stripes = 1; stripe_index = 0; - if (map->type & BTRFS_BLOCK_GROUP_RAID1) { + if (map->type & (BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | + BTRFS_BLOCK_GROUP_RAID1C4)) { if (rw == WRITE) multi->num_stripes = map->num_stripes; else if (mirror_num) @@ -1905,6 +1945,8 @@ int btrfs_check_chunk_valid(struct btrfs_fs_info *fs_info, if ((type & BTRFS_BLOCK_GROUP_RAID10 && (sub_stripes != 2 || !IS_ALIGNED(num_stripes, sub_stripes))) || (type & BTRFS_BLOCK_GROUP_RAID1 && 
num_stripes < 1) || + (type & BTRFS_BLOCK_GROUP_RAID1C3 && num_stripes < 3) || + (type & BTRFS_BLOCK_GROUP_RAID1C4 && num_stripes < 4) || (type & BTRFS_BLOCK_GROUP_RAID5 && num_stripes < 2) || (type & BTRFS_BLOCK_GROUP_RAID6 && num_stripes < 3) || (type & BTRFS_BLOCK_GROUP_DUP && num_stripes > 2) || @@ -2464,6 +2506,8 @@ u64 btrfs_stripe_length(struct btrfs_fs_info *fs_info, switch (profile) { case 0: /* Single profile */ case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID1C3: + case BTRFS_BLOCK_GROUP_RAID1C4: case BTRFS_BLOCK_GROUP_DUP: stripe_len = chunk_len; break; diff --git a/volumes.h b/volumes.h index 586588c871ab..a6351dcf0bc3 100644 --- a/volumes.h +++ b/volumes.h @@ -135,6 +135,10 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) return BTRFS_RAID_RAID10; else if (flags & BTRFS_BLOCK_GROUP_RAID1) return BTRFS_RAID_RAID1; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C3) + return BTRFS_RAID_RAID1C3; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C4) + return BTRFS_RAID_RAID1C4; else if (flags & BTRFS_BLOCK_GROUP_DUP) return BTRFS_RAID_DUP; else if (flags & BTRFS_BLOCK_GROUP_RAID0) -- 2.23.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
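The volumes.c and volumes.h hunks above follow one pattern: a block group flag is mapped to a raid index, and the index selects a row of per-profile attributes. A minimal sketch of that lookup is below, assuming a reduced table. The names `raid_index`, `raid_attr`, `raid_table`, `flags_to_index` and `attr_for_flags` are hypothetical stand-ins, not the btrfs-progs identifiers. The flag bit positions are the ones used in btrfs-progs ctree.h, and the attribute values for RAID1C3 and RAID1C4 are copied from the patch.

```c
#include <stdint.h>

/* Block group flag bits, same positions as in btrfs-progs ctree.h. */
#define BG_RAID1   (1ULL << 4)
#define BG_RAID1C3 (1ULL << 9)
#define BG_RAID1C4 (1ULL << 10)

/* Hypothetical, reduced stand-in for enum btrfs_raid_types. */
enum raid_index { IDX_SINGLE, IDX_RAID1, IDX_RAID1C3, IDX_RAID1C4, NR_IDX };

/* Hypothetical, reduced stand-in for struct btrfs_raid_attr. */
struct raid_attr {
	const char *name;
	int devs_min;           /* minimum number of devices */
	int tolerated_failures; /* devices that may be lost */
	int ncopies;            /* copies of each block */
};

static const struct raid_attr raid_table[NR_IDX] = {
	[IDX_SINGLE]  = { "SINGLE",  1, 0, 1 },
	[IDX_RAID1]   = { "RAID1",   2, 1, 2 },
	[IDX_RAID1C3] = { "RAID1C3", 3, 2, 3 }, /* values from the patch */
	[IDX_RAID1C4] = { "RAID1C4", 4, 3, 4 }, /* values from the patch */
};

/* Same if/else chain shape as btrfs_bg_flags_to_raid_index(). */
static enum raid_index flags_to_index(uint64_t flags)
{
	if (flags & BG_RAID1)
		return IDX_RAID1;
	else if (flags & BG_RAID1C3)
		return IDX_RAID1C3;
	else if (flags & BG_RAID1C4)
		return IDX_RAID1C4;
	return IDX_SINGLE; /* no profile bit set */
}

static const struct raid_attr *attr_for_flags(uint64_t flags)
{
	return &raid_table[flags_to_index(flags)];
}
```

With this shape, adding a profile means adding one enum value, one table row and one branch, which is roughly what the hunks above do for the two new profiles.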
* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
` (4 preceding siblings ...)
2019-10-31 18:43 ` [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 David Sterba
@ 2019-10-31 18:44 ` David Sterba
2019-11-01 14:54 ` Neal Gompa
6 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2019-10-31 18:44 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs
The kernel code can be pulled from (based on misc-next)
git://github.com/kdave/btrfs-devel.git dev/raid1c3-5.5-final
and for btrfs-progs (based on 5.3.1)
git://github.com/kdave/btrfs-progs.git dev/raid1c34
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
` (5 preceding siblings ...)
2019-10-31 18:44 ` [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
@ 2019-11-01 14:54 ` Neal Gompa
2019-11-01 15:09 ` David Sterba
6 siblings, 1 reply; 15+ messages in thread
From: Neal Gompa @ 2019-11-01 14:54 UTC (permalink / raw)
To: David Sterba; +Cc: Btrfs BTRFS
On Thu, Oct 31, 2019 at 11:17 AM David Sterba <dsterba@suse.com> wrote:
>
> Here it goes again, RAID1 with 3- and 4- copies. I found the bug that stopped
> it from inclusion last time, it was in the test itself, so the kernel code is
> effectively unchanged.
>
> So, with 1 or 2 missing devices, replace by device id works. There's one
> annoying thing but not new: regarding replace of a missing device, some
> extra single/dup block groups are created during the replace process.
> Example below. This can happen on plain raid1 with degraded read-write
> mount as well.
>
> Now what's the merge target.
>
> The patches almost made it to 5.3, the changes build on existing code so the
> actual addition of new profiles is namely in the definitions and additional
> cases. So it should be safe.
>
> I'm for adding it to 5.5 queue, though we're at rc5 and this can be seen as a
> late time for a feature. The user benefits are noticeable, raid1c3 can replace
> raid6 of metadata which is the most problematic part and much more complicated
> to fix (write ahead journal or something like that). The feedback regarding the
> plain 3-copy as a replacement was positive, on IRC and there are mails about
> that too.
>
What's the reasoning for not submitting this for 5.4? I think the
improvements here are definitely worth pulling into the 5.4 kernel
release...
--
真実はいつも一つ!/ Always, there's only one truth!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
2019-11-01 14:54 ` Neal Gompa
@ 2019-11-01 15:09 ` David Sterba
2019-11-03 0:35 ` waxhead
2019-11-14 5:13 ` Zygo Blaxell
0 siblings, 2 replies; 15+ messages in thread
From: David Sterba @ 2019-11-01 15:09 UTC (permalink / raw)
To: Neal Gompa; +Cc: David Sterba, Btrfs BTRFS
On Fri, Nov 01, 2019 at 10:54:45AM -0400, Neal Gompa wrote:
> What's the reasoning for not submitting this for 5.4? I think the
> improvements here are definitely worth pulling into the 5.4 kernel
> release...
Because 5.4 is at rc5, new features are allowed to be merged only during
the merge window, i.e. before 5.4-rc1. That's more than a month ago. From
rc1-rcX only regressions or fixes can be applied, so you can see pull
requests but the subject lines almost always contain 'fix'.

A new feature has to be in the development branch at least 2 weeks before
the merge window opens (for testing), so right now it's the last
opportunity to get it to 5.5, 5.4 is out of the question. No matter how much
I or users want to get it merged. This is how the linux development
process works.

The raid1c34 patches are not intrusive and could be backported on top of
5.3 because all the preparatory work has been merged already.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
2019-11-01 15:09 ` David Sterba
@ 2019-11-03 0:35 ` waxhead
2019-11-04 13:40 ` David Sterba
2019-11-14 5:13 ` Zygo Blaxell
1 sibling, 1 reply; 15+ messages in thread
From: waxhead @ 2019-11-03 0:35 UTC (permalink / raw)
To: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS
Would GRUB be able to boot from RAID1c34 by treating it as "regular"
RAID1?! If not I think a warning could be useful.

David Sterba wrote:
> On Fri, Nov 01, 2019 at 10:54:45AM -0400, Neal Gompa wrote:
>> What's the reasoning for not submitting this for 5.4? I think the
>> improvements here are definitely worth pulling into the 5.4 kernel
>> release...
>
> Because 5.4 is at rc5, new features are allowed to be merged only during
> the merge window, i.e. before 5.4-rc1. That's more than a month ago. From
> rc1-rcX only regressions or fixes can be applied, so you can see pull
> requests but the subject lines almost always contain 'fix'.
>
> A new feature has to be in the development branch at least 2 weeks before
> the merge window opens (for testing), so right now it's the last
> opportunity to get it to 5.5, 5.4 is out of the question. No matter how much
> I or users want to get it merged. This is how the linux development
> process works.
>
> The raid1c34 patches are not intrusive and could be backported on top of
> 5.3 because all the preparatory work has been merged already.
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
2019-11-03 0:35 ` waxhead
@ 2019-11-04 13:40 ` David Sterba
0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2019-11-04 13:40 UTC (permalink / raw)
To: waxhead; +Cc: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS
On Sun, Nov 03, 2019 at 01:35:34AM +0100, waxhead wrote:
> Would GRUB be able to boot from RAID1c34 by treating it as "regular"
> RAID1?! If not I think a warning could be useful.
Currently grub will refuse to boot from that with an 'unknown profile'
message. Adding the support seems to be fairly easy, I'll send the patches.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
2019-11-01 15:09 ` David Sterba
2019-11-03 0:35 ` waxhead
@ 2019-11-14 5:13 ` Zygo Blaxell
2019-11-15 10:28 ` David Sterba
1 sibling, 1 reply; 15+ messages in thread
From: Zygo Blaxell @ 2019-11-14 5:13 UTC (permalink / raw)
To: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS
[-- Attachment #1: Type: text/plain, Size: 1385 bytes --]
On Fri, Nov 01, 2019 at 04:09:08PM +0100, David Sterba wrote:
> The raid1c34 patches are not intrusive and could be backported on top of
> 5.3 because all the preparatory work has been merged already.

Indeed, that's how I ended up testing them. I couldn't get the 5.4-rc
kernels to run long enough to do meaningful testing before they locked
up. I tested with 5.3.8 + patches.

I left out the last patch that removes the raid1c3 incompat flag because
5.3 didn't have the block group tree code to apply it to.

I ran my raid1 and raid56 corruption recovery tests modified for raid1c3.
The first test is roughly:

	mkfs.btrfs -draid1c3 -mraid1c3 /dev/vd[bcdef]
	mount /dev/vdb /test
	cp -a 9GB_data /test
	sync
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	head -c 9g /dev/urandom > /dev/vdb
	head -c 9g /dev/urandom > /dev/vdc
	sync
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	btrfs scrub start -Bd /test
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	btrfs scrub start -Bd /test
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test

First scrub reported a lot of corruption on /dev/vdb and /dev/vdc. Second
scrub reported no errors. diff (all instances) reported no differences.

Second test is:

	mkfs.btrfs -draid6 -mraid1c3 /dev/vd[bcdef]
	# rest as above...

Similar results: first scrub reported many errors as expected.
Second scrub reported no errors. No diffs.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
2019-11-14 5:13 ` Zygo Blaxell
@ 2019-11-15 10:28 ` David Sterba
0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2019-11-15 10:28 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS
On Thu, Nov 14, 2019 at 12:13:24AM -0500, Zygo Blaxell wrote:
> On Fri, Nov 01, 2019 at 04:09:08PM +0100, David Sterba wrote:
> > The raid1c34 patches are not intrusive and could be backported on top of
> > 5.3 because all the preparatory work has been merged already.
>
> Indeed, that's how I ended up testing them. I couldn't get the 5.4-rc
> kernels to run long enough to do meaningful testing before they locked
> up. I tested with 5.3.8 + patches.
>
> I left out the last patch that removes the raid1c3 incompat flag because
> 5.3 didn't have the block group tree code to apply it to.
>
> I ran my raid1 and raid56 corruption recovery tests modified for raid1c3.
> The first test is roughly:
>
> 	mkfs.btrfs -draid1c3 -mraid1c3 /dev/vd[bcdef]
> 	mount /dev/vdb /test
> 	cp -a 9GB_data /test
> 	sync
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	head -c 9g /dev/urandom > /dev/vdb
> 	head -c 9g /dev/urandom > /dev/vdc
> 	sync
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	btrfs scrub start -Bd /test
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	btrfs scrub start -Bd /test
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
>
> First scrub reported a lot of corruption on /dev/vdb and /dev/vdc. Second
> scrub reported no errors. diff (all instances) reported no differences.
>
> Second test is:
>
> 	mkfs.btrfs -draid6 -mraid1c3 /dev/vd[bcdef]
> 	# rest as above...
>
> Similar results: first scrub reported many errors as expected.
> Second scrub reported no errors. No diffs.
Thanks for the tests.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 0/6] RAID1 with 3- and 4- copies
@ 2019-06-10 12:29 David Sterba
2019-06-10 12:29 ` [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 David Sterba
0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2019-06-10 12:29 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
Hi,

this patchset brings the RAID1 with 3 and 4 copies as a separate feature as
outlined in V1
(https://lore.kernel.org/linux-btrfs/cover.1531503452.git.dsterba@suse.com/).

This should help a bit in the raid56 situation, where the write hole hurts most
for metadata, without a block group profile that offers 2 device loss
resistance.

I've gathered some feedback from knowledgeable people on IRC and the following
setup is considered good enough (certainly better than what we have now):

- data: RAID6
- metadata: RAID1C3

The RAID1C3 vs RAID6 have different characteristics in terms of space
consumption and repair.

Space consumption
~~~~~~~~~~~~~~~~~

* RAID6 reduces the space usable for metadata by a factor of N/(N-2), so with
  more devices the parity overhead ratio is small
* RAID1C3 will always consume 67% of metadata chunks for redundancy

The overall size of metadata is typically in range of gigabytes to
hundreds of gigabytes (depends on usecase), a rough estimate is 1%-10%.
With larger filesystems the percentage is usually smaller.

So, for the 3-copy raid1 the cost of redundancy is better expressed in the
absolute value of gigabytes "wasted" on redundancy than as the ratio that does
look scary compared to raid6.

Repair
~~~~~~

RAID6 needs to access all available devices to calculate the P and Q,
with either 1 or 2 missing devices.

RAID1C3 can utilize the independence of each copy and also the way the RAID1
works in btrfs. In the scenario with 1 missing device, one of the 2 correct
copies is read and written to the repaired devices.

Given how the 2-copy RAID1 works on btrfs, the block groups could be spread
over several devices so the load during repair would be spread as well.

Additionally, device replace works sequentially and in big chunks so on a
lightly used system the read pattern is seek-friendly.

Compatibility
~~~~~~~~~~~~~

The new block group types cost an incompatibility bit, so old kernels will
refuse to mount a filesystem with the RAID1C3 feature, i.e. any chunk on the
filesystem with the new type.

To upgrade existing filesystems use the balance filters, e.g. from RAID6

$ btrfs balance start -mconvert=raid1c3 /path

Merge target
~~~~~~~~~~~~

I'd like to push that to misc-next for wider testing and merge to 5.3, unless
something bad pops up. Given that the code changes are small and just add new
types with their constraints, and the rest is done by the generic code, I'm not
expecting problems that can't be fixed before full release.

Testing so far
~~~~~~~~~~~~~~

* mkfs with the profiles
* fstests (no specific tests, only check that it does not break)
* profile conversions between single/raid1/raid5/raid1c3/raid6/raid1c4/raid1c4
  with added devices where needed
* scrub

TODO:

* 1 missing device followed by repair
* 2 missing devices followed by repair

David Sterba (6):
btrfs: add mask for all RAID1 types
btrfs: use mask for RAID56 profiles
btrfs: document BTRFS_MAX_MIRRORS
btrfs: add support for 3-copy replication (raid1c3)
btrfs: add support for 4-copy replication (raid1c4)
btrfs: add incompat for raid1 with 3, 4 copies

fs/btrfs/ctree.h | 14 ++++++++--
fs/btrfs/extent-tree.c | 19 +++++++------
fs/btrfs/scrub.c | 2 +-
fs/btrfs/super.c | 6 +++++
fs/btrfs/sysfs.c | 2 ++
fs/btrfs/volumes.c | 48 ++++++++++++++++++++++++++++-----
fs/btrfs/volumes.h | 4 +++
include/uapi/linux/btrfs.h | 5 +++-
include/uapi/linux/btrfs_tree.h | 10 +++++++
9 files changed, 90 insertions(+), 20 deletions(-)
--
2.21.0
^ permalink raw reply [flat|nested] 15+ messages in thread
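The space consumption argument in the v2 cover letter can be checked with a little arithmetic. The sketch below assumes an idealized model: RAID6 stores N-2 devices' worth of data per N devices (raw/usable ratio of N/(N-2)), and RAID1C3 stores three full copies. The helper names `raid6_raw_bytes` and `raid1c3_raw_bytes` are made up for illustration, and real chunk allocation granularity is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Raw space needed for `logical` units of metadata with three full copies. */
static uint64_t raid1c3_raw_bytes(uint64_t logical)
{
	return logical * 3;
}

/*
 * Raw space needed for `logical` units of metadata on RAID6 across `ndevs`
 * devices, using the idealized N/(N-2) ratio (two devices' worth of parity).
 */
static uint64_t raid6_raw_bytes(uint64_t logical, unsigned int ndevs)
{
	assert(ndevs >= 3); /* RAID6 needs at least 3 stripes */
	return logical * ndevs / (ndevs - 2);
}
```

For 10 GiB of metadata on 6 devices this gives 15 GiB raw for RAID6 and 30 GiB raw for RAID1C3, so 20 of the 30 GiB (about 67%) is redundancy in the 3-copy case. With only 3 devices both profiles need the same 30 GiB. This is the sense in which the absolute number of gigabytes is a more useful measure than the ratio.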
* [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 2019-06-10 12:29 [PATCH v2 0/6] " David Sterba @ 2019-06-10 12:29 ` David Sterba 0 siblings, 0 replies; 15+ messages in thread From: David Sterba @ 2019-06-10 12:29 UTC (permalink / raw) To: linux-btrfs; +Cc: David Sterba $ ./mkfs.btrfs -m raid1c4 -d raid1c3 /dev/sd[abcd] Label: (null) UUID: f1f988ab-6750-4bc2-957b-98a4ebe98631 Node size: 16384 Sector size: 4096 Filesystem size: 8.00GiB Block group profiles: Data: RAID1C3 273.06MiB Metadata: RAID1C4 204.75MiB System: RAID1C4 8.00MiB SSD detected: no Incompat features: extref, skinny-metadata, raid1c34 Number of devices: 4 Devices: ID SIZE PATH 1 2.00GiB /dev/sda 2 2.00GiB /dev/sdb 3 2.00GiB /dev/sdc 4 2.00GiB /dev/sdd Signed-off-by: David Sterba <dsterba@suse.com> --- chunk-recover.c | 4 ++++ cmds-balance.c | 4 ++++ cmds-fi-usage.c | 8 +++++++ cmds-inspect-dump-super.c | 3 ++- ctree.h | 8 +++++++ extent-tree.c | 4 ++++ fsfeatures.c | 6 +++++ ioctl.h | 4 +++- mkfs/main.c | 11 ++++++++- print-tree.c | 6 +++++ utils.c | 12 +++++++++- volumes.c | 48 +++++++++++++++++++++++++++++++++++++-- volumes.h | 4 ++++ 13 files changed, 116 insertions(+), 6 deletions(-) diff --git a/chunk-recover.c b/chunk-recover.c index f3e7774efc0f..005032961e71 100644 --- a/chunk-recover.c +++ b/chunk-recover.c @@ -1579,6 +1579,10 @@ static int calc_num_stripes(u64 type) else if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP)) return 2; + else if (type & (BTRFS_BLOCK_GROUP_RAID1C3)) + return 3; + else if (type & (BTRFS_BLOCK_GROUP_RAID1C4)) + return 4; else return 1; } diff --git a/cmds-balance.c b/cmds-balance.c index b533cf737584..854d7d4c380a 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -46,6 +46,10 @@ static int parse_one_profile(const char *profile, u64 *flags) *flags |= BTRFS_BLOCK_GROUP_RAID0; } else if (!strcmp(profile, "raid1")) { *flags |= BTRFS_BLOCK_GROUP_RAID1; + } else if (!strcmp(profile, "raid1c3")) { + *flags |= BTRFS_BLOCK_GROUP_RAID1C3; + } else if 
(!strcmp(profile, "raid1c4")) { + *flags |= BTRFS_BLOCK_GROUP_RAID1C4; } else if (!strcmp(profile, "raid10")) { *flags |= BTRFS_BLOCK_GROUP_RAID10; } else if (!strcmp(profile, "raid5")) { diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c index 9a23e17633d4..6bae4e723daf 100644 --- a/cmds-fi-usage.c +++ b/cmds-fi-usage.c @@ -373,6 +373,10 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo, ratio = 1; else if (flags & BTRFS_BLOCK_GROUP_RAID1) ratio = 2; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C3) + ratio = 3; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C4) + ratio = 4; else if (flags & BTRFS_BLOCK_GROUP_RAID5) ratio = 0; else if (flags & BTRFS_BLOCK_GROUP_RAID6) @@ -653,6 +657,10 @@ static u64 calc_chunk_size(struct chunk_info *ci) return ci->size / ci->num_stripes; else if (ci->type & BTRFS_BLOCK_GROUP_RAID1) return ci->size ; + else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C3) + return ci->size; + else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C4) + return ci->size; else if (ci->type & BTRFS_BLOCK_GROUP_DUP) return ci->size ; else if (ci->type & BTRFS_BLOCK_GROUP_RAID5) diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c index d62f0932556c..bf9539df0dd5 100644 --- a/cmds-inspect-dump-super.c +++ b/cmds-inspect-dump-super.c @@ -229,7 +229,8 @@ static struct readable_flag_entry incompat_flags_array[] = { DEF_INCOMPAT_FLAG_ENTRY(RAID56), DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA), DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES), - DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID) + DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID), + DEF_INCOMPAT_FLAG_ENTRY(RAID1C34), }; static const int incompat_flags_num = sizeof(incompat_flags_array) / sizeof(struct readable_flag_entry); diff --git a/ctree.h b/ctree.h index 9156ca4de6fd..87f586991648 100644 --- a/ctree.h +++ b/ctree.h @@ -490,6 +490,7 @@ struct btrfs_super_block { #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8) #define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9) #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID 
(1ULL << 10) +#define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11) #define BTRFS_FEATURE_COMPAT_SUPP 0ULL @@ -513,6 +514,7 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS | \ BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ + BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ BTRFS_FEATURE_INCOMPAT_METADATA_UUID) /* @@ -962,6 +964,8 @@ struct btrfs_csum_item { #define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6) #define BTRFS_BLOCK_GROUP_RAID5 (1ULL << 7) #define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8) +#define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9) +#define BTRFS_BLOCK_GROUP_RAID1C4 (1ULL << 10) #define BTRFS_BLOCK_GROUP_RESERVED BTRFS_AVAIL_ALLOC_BIT_SINGLE enum btrfs_raid_types { @@ -972,6 +976,8 @@ enum btrfs_raid_types { BTRFS_RAID_SINGLE, BTRFS_RAID_RAID5, BTRFS_RAID_RAID6, + BTRFS_RAID_RAID1C3, + BTRFS_RAID_RAID1C4, BTRFS_NR_RAID_TYPES }; @@ -983,6 +989,8 @@ enum btrfs_raid_types { BTRFS_BLOCK_GROUP_RAID1 | \ BTRFS_BLOCK_GROUP_RAID5 | \ BTRFS_BLOCK_GROUP_RAID6 | \ + BTRFS_BLOCK_GROUP_RAID1C3 | \ + BTRFS_BLOCK_GROUP_RAID1C4 | \ BTRFS_BLOCK_GROUP_DUP | \ BTRFS_BLOCK_GROUP_RAID10) diff --git a/extent-tree.c b/extent-tree.c index c6516b2ba445..50a38a775147 100644 --- a/extent-tree.c +++ b/extent-tree.c @@ -1668,6 +1668,8 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags) { u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | + BTRFS_BLOCK_GROUP_RAID1C4 | BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 | @@ -3339,6 +3341,8 @@ static u64 get_dev_extent_len(struct map_lookup *map) case 0: /* Single */ case BTRFS_BLOCK_GROUP_DUP: case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID1C3: + case BTRFS_BLOCK_GROUP_RAID1C4: div = 1; break; case BTRFS_BLOCK_GROUP_RAID5: diff --git a/fsfeatures.c b/fsfeatures.c index 7f3ef03b8452..e3b3d63b43a5 100644 --- a/fsfeatures.c +++ b/fsfeatures.c @@ -86,6 +86,12 @@ static const struct 
btrfs_fs_feature { VERSION_TO_STRING2(4,0), NULL, 0, "no explicit hole extents for files" }, + { "raid1c34", BTRFS_FEATURE_INCOMPAT_RAID1C34, + "raid1c34", + VERSION_TO_STRING2(5,3), + NULL, 0, + NULL, 0, + "RAID1 with 3 or 4 copies" }, /* Keep this one last */ { "list-all", BTRFS_FEATURE_LIST_ALL, NULL } }; diff --git a/ioctl.h b/ioctl.h index 66ee599f7a82..d3dfd6375de1 100644 --- a/ioctl.h +++ b/ioctl.h @@ -775,7 +775,9 @@ enum btrfs_err_code { BTRFS_ERROR_DEV_TGT_REPLACE, BTRFS_ERROR_DEV_MISSING_NOT_FOUND, BTRFS_ERROR_DEV_ONLY_WRITABLE, - BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS + BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS, + BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET, + BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET, }; /* An error code to error string mapping for the kernel diff --git a/mkfs/main.c b/mkfs/main.c index 6b47a9ee0e73..88801b89e827 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -316,7 +316,7 @@ static void print_usage(int ret) printf("Usage: mkfs.btrfs [options] dev [ dev ... ]\n"); printf("Options:\n"); printf(" allocation profiles:\n"); - printf("\t-d|--data PROFILE data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n"); + printf("\t-d|--data PROFILE data profile, raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, dup or single\n"); printf("\t-m|--metadata PROFILE metadata profile, values like for data profile\n"); printf("\t-M|--mixed mix metadata and data together\n"); printf(" features:\n"); @@ -347,6 +347,10 @@ static u64 parse_profile(const char *s) return BTRFS_BLOCK_GROUP_RAID0; } else if (strcasecmp(s, "raid1") == 0) { return BTRFS_BLOCK_GROUP_RAID1; + } else if (strcasecmp(s, "raid1c3") == 0) { + return BTRFS_BLOCK_GROUP_RAID1C3; + } else if (strcasecmp(s, "raid1c4") == 0) { + return BTRFS_BLOCK_GROUP_RAID1C4; } else if (strcasecmp(s, "raid5") == 0) { return BTRFS_BLOCK_GROUP_RAID5; } else if (strcasecmp(s, "raid6") == 0) { @@ -1022,6 +1026,11 @@ int main(int argc, char **argv) features |= BTRFS_FEATURE_INCOMPAT_RAID56; } + if ((data_profile | 
metadata_profile) & + (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) { + features |= BTRFS_FEATURE_INCOMPAT_RAID1C34; + } + if (btrfs_check_nodesize(nodesize, sectorsize, features)) goto error; diff --git a/print-tree.c b/print-tree.c index 0d0bb5109207..78931d2d8eb1 100644 --- a/print-tree.c +++ b/print-tree.c @@ -163,6 +163,12 @@ static void bg_flags_to_str(u64 flags, char *ret) case BTRFS_BLOCK_GROUP_RAID1: strcat(ret, "|RAID1"); break; + case BTRFS_BLOCK_GROUP_RAID1C3: + strcat(ret, "|RAID1C3"); + break; + case BTRFS_BLOCK_GROUP_RAID1C4: + strcat(ret, "|RAID1C4"); + break; case BTRFS_BLOCK_GROUP_DUP: strcat(ret, "|DUP"); break; diff --git a/utils.c b/utils.c index 0b271517551b..6320d1a496cd 100644 --- a/utils.c +++ b/utils.c @@ -1900,8 +1900,10 @@ static int group_profile_devs_min(u64 flag) case BTRFS_BLOCK_GROUP_RAID5: return 2; case BTRFS_BLOCK_GROUP_RAID6: + case BTRFS_BLOCK_GROUP_RAID1C3: return 3; case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID1C4: return 4; default: return -1; @@ -1918,9 +1920,10 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile, default: case 4: allowed |= BTRFS_BLOCK_GROUP_RAID10; + allowed |= BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID1C4; __attribute__ ((fallthrough)); case 3: - allowed |= BTRFS_BLOCK_GROUP_RAID6; + allowed |= BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID1C3; __attribute__ ((fallthrough)); case 2: allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | @@ -1975,7 +1978,10 @@ int group_profile_max_safe_loss(u64 flags) case BTRFS_BLOCK_GROUP_RAID10: return 1; case BTRFS_BLOCK_GROUP_RAID6: + case BTRFS_BLOCK_GROUP_RAID1C3: return 2; + case BTRFS_BLOCK_GROUP_RAID1C4: + return 3; default: return -1; } @@ -2199,6 +2205,10 @@ const char* btrfs_group_profile_str(u64 flag) return "RAID0"; case BTRFS_BLOCK_GROUP_RAID1: return "RAID1"; + case BTRFS_BLOCK_GROUP_RAID1C3: + return "RAID1C3"; + case BTRFS_BLOCK_GROUP_RAID1C4: + return "RAID1C4"; case BTRFS_BLOCK_GROUP_RAID5: 
return "RAID5"; case BTRFS_BLOCK_GROUP_RAID6: diff --git a/volumes.c b/volumes.c index 3a91b43b378b..66a88c7b7c17 100644 --- a/volumes.c +++ b/volumes.c @@ -49,6 +49,24 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 2, .ncopies = 2, }, + [BTRFS_RAID_RAID1C3] = { + .sub_stripes = 1, + .dev_stripes = 1, + .devs_max = 0, + .devs_min = 3, + .tolerated_failures = 2, + .devs_increment = 3, + .ncopies = 3, + }, + [BTRFS_RAID_RAID1C4] = { + .sub_stripes = 1, + .dev_stripes = 1, + .devs_max = 0, + .devs_min = 4, + .tolerated_failures = 3, + .devs_increment = 4, + .ncopies = 4, + }, [BTRFS_RAID_DUP] = { .sub_stripes = 1, .dev_stripes = 2, @@ -826,6 +844,8 @@ static u64 chunk_bytes_by_type(u64 type, u64 calc_size, int num_stripes, { if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP)) return calc_size; + else if (type & (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) + return calc_size; else if (type & BTRFS_BLOCK_GROUP_RAID10) return calc_size * (num_stripes / sub_stripes); else if (type & BTRFS_BLOCK_GROUP_RAID5) @@ -1006,6 +1026,20 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, return -ENOSPC; min_stripes = 2; } + if (type & BTRFS_BLOCK_GROUP_RAID1C3) { + num_stripes = min_t(u64, 3, + btrfs_super_num_devices(info->super_copy)); + if (num_stripes < 3) + return -ENOSPC; + min_stripes = 3; + } + if (type & BTRFS_BLOCK_GROUP_RAID1C4) { + num_stripes = min_t(u64, 4, + btrfs_super_num_devices(info->super_copy)); + if (num_stripes < 4) + return -ENOSPC; + min_stripes = 4; + } if (type & BTRFS_BLOCK_GROUP_DUP) { num_stripes = 2; min_stripes = 2; @@ -1354,7 +1388,8 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len) } map = container_of(ce, struct map_lookup, ce); - if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1)) + if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) ret = map->num_stripes; 
else if (map->type & BTRFS_BLOCK_GROUP_RAID10) ret = map->sub_stripes; @@ -1550,6 +1585,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw, if (rw == WRITE) { if (map->type & (BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | + BTRFS_BLOCK_GROUP_RAID1C4 | BTRFS_BLOCK_GROUP_DUP)) { stripes_required = map->num_stripes; } else if (map->type & BTRFS_BLOCK_GROUP_RAID10) { @@ -1592,6 +1629,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw, stripe_offset = offset - stripe_offset; if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4 | BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_DUP)) { @@ -1607,7 +1645,9 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw, multi->num_stripes = 1; stripe_index = 0; - if (map->type & BTRFS_BLOCK_GROUP_RAID1) { + if (map->type & (BTRFS_BLOCK_GROUP_RAID1 | + BTRFS_BLOCK_GROUP_RAID1C3 | + BTRFS_BLOCK_GROUP_RAID1C4)) { if (rw == WRITE) multi->num_stripes = map->num_stripes; else if (mirror_num) @@ -1877,6 +1917,8 @@ int btrfs_check_chunk_valid(struct btrfs_fs_info *fs_info, if ((type & BTRFS_BLOCK_GROUP_RAID10 && (sub_stripes != 2 || !IS_ALIGNED(num_stripes, sub_stripes))) || (type & BTRFS_BLOCK_GROUP_RAID1 && num_stripes < 1) || + (type & BTRFS_BLOCK_GROUP_RAID1C3 && num_stripes < 3) || + (type & BTRFS_BLOCK_GROUP_RAID1C4 && num_stripes < 4) || (type & BTRFS_BLOCK_GROUP_RAID5 && num_stripes < 2) || (type & BTRFS_BLOCK_GROUP_RAID6 && num_stripes < 3) || (type & BTRFS_BLOCK_GROUP_DUP && num_stripes > 2) || @@ -2436,6 +2478,8 @@ u64 btrfs_stripe_length(struct btrfs_fs_info *fs_info, switch (profile) { case 0: /* Single profile */ case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID1C3: + case BTRFS_BLOCK_GROUP_RAID1C4: case BTRFS_BLOCK_GROUP_DUP: stripe_len = chunk_len; break; diff --git a/volumes.h b/volumes.h index dbe9d3dea647..6fa39cf31b9d 100644 --- a/volumes.h +++ 
b/volumes.h @@ -130,6 +130,10 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) return BTRFS_RAID_RAID10; else if (flags & BTRFS_BLOCK_GROUP_RAID1) return BTRFS_RAID_RAID1; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C3) + return BTRFS_RAID_RAID1C3; + else if (flags & BTRFS_BLOCK_GROUP_RAID1C4) + return BTRFS_RAID_RAID1C4; else if (flags & BTRFS_BLOCK_GROUP_DUP) return BTRFS_RAID_DUP; else if (flags & BTRFS_BLOCK_GROUP_RAID0) -- 2.21.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
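Two checks in the progs patch above decide whether a profile can be used at mkfs time: the device count must meet the profile's minimum, and using either new profile must set the raid1c34 incompat bit. A minimal sketch of both checks follows, reduced to the two new profiles. The function names `incompat_for_profiles`, `devs_min_for` and `profile_allowed` are hypothetical; the bit positions and minimum device counts are the ones shown in the ctree.h and utils.c hunks.

```c
#include <stdint.h>

/* Bit positions as defined in the ctree.h hunk of the patch. */
#define BG_RAID1C3        (1ULL << 9)
#define BG_RAID1C4        (1ULL << 10)
#define INCOMPAT_RAID1C34 (1ULL << 11)

/*
 * Same condition as the mkfs/main.c hunk: if data or metadata uses one of
 * the new profiles, the filesystem needs the raid1c34 incompat bit.
 */
static uint64_t incompat_for_profiles(uint64_t data_profile,
				      uint64_t metadata_profile)
{
	uint64_t features = 0;

	if ((data_profile | metadata_profile) & (BG_RAID1C3 | BG_RAID1C4))
		features |= INCOMPAT_RAID1C34;
	return features;
}

/* Minimum device count; -1 for profiles this sketch does not model. */
static int devs_min_for(uint64_t profile)
{
	switch (profile) {
	case BG_RAID1C3:
		return 3;
	case BG_RAID1C4:
		return 4;
	default:
		return -1;
	}
}

/* Returns 1 if `ndevs` devices are enough for `profile`, 0 otherwise. */
static int profile_allowed(uint64_t profile, int ndevs)
{
	int min = devs_min_for(profile);

	return min > 0 && ndevs >= min;
}
```

Under these assumptions, raid1c3 on a 2-device filesystem is rejected, and a filesystem with raid1c3 metadata but plain data still carries the incompat bit, which is what makes older kernels refuse to mount it.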
* [PATCH 0/4] 3- and 4- copy RAID1 @ 2018-07-13 18:46 David Sterba 2018-07-13 18:46 ` [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 David Sterba 0 siblings, 1 reply; 15+ messages in thread From: David Sterba @ 2018-07-13 18:46 UTC (permalink / raw) To: linux-btrfs; +Cc: David Sterba Hi, I have some goodies that go into the RAID56 problem, although not implementing all the remaining features, it can be useful independently. This time my hackweek project https://hackweek.suse.com/17/projects/do-something-about-btrfs-and-raid56 aimed to implement the fix for the write hole problem but I spent more time with analysis and design of the solution and don't have a working prototype for that yet. This patchset brings a feature that will be used by the raid56 log, the log has to be on the same redundancy level and thus we need a 3-copy replication for raid6. As it was easy to extend to higher replication, I've added a 4-copy replication, that would allow triple copy raid (that does not have a standardized name). The number of copies is fixed, so it's not N-copy for an arbitrary N. This would complicate the implementation too much, though I'd be willing to add a 5-copy replication for a small bribe. The new raid profiles and covered by an incompatibility bit, called extended_raid, the (idealistic) plan is to stuff as many new raid-related features as possible. The patch 4/4 mentions the 3- 4- copy raid1, configurable stripe length, write hole log and triple parity. If the plan turns out to be too ambitious, the ready and implemented features will be split and merged. An interesting question is the naming of the extended profiles. I picked something that can be easily understood but it's not a final proposal. Years ago, Hugo proposed a naming scheme that described the non-standard raid varieties of the btrfs flavor: https://marc.info/?l=linux-btrfs&m=136286324417767 Switching to this naming would be a good addition to the extended raid. 
Regarding the missing raid56 features, I'll continue working on them as time
permits in the following weeks/months, as I'm not aware of anybody working on
that actively enough so to speak.

Anyway, git branches with the patches:

kernel: git://github.com/kdave/btrfs-devel dev/extended-raid-ncopies
progs:  git://github.com/kdave/btrfs-progs dev/extended-raid-ncopies

David Sterba (4):
  btrfs: refactor block group replication factor calculation to a helper
  btrfs: add support for 3-copy replication (raid1c3)
  btrfs: add support for 4-copy replication (raid1c4)
  btrfs: add incompatibility bit for extended raid features

 fs/btrfs/ctree.h                |  1 +
 fs/btrfs/extent-tree.c          | 45 +++++++-----------
 fs/btrfs/relocation.c           |  1 +
 fs/btrfs/scrub.c                |  4 +-
 fs/btrfs/super.c                | 17 +++----
 fs/btrfs/sysfs.c                |  2 +
 fs/btrfs/volumes.c              | 84 ++++++++++++++++++++++++++++++---
 fs/btrfs/volumes.h              |  6 +++
 include/uapi/linux/btrfs.h      | 12 ++++-
 include/uapi/linux/btrfs_tree.h |  6 +++
 10 files changed, 134 insertions(+), 44 deletions(-)

-- 
2.18.0

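The redundancy argument in the cover letter (the log has to be on the same
redundancy level as the profile it protects) can be sketched as a small rule:
a parity profile that survives N lost devices needs a mirrored log with N+1
copies. The snippet below is only an illustration of that arithmetic, not code
from the patchset; the helper names copies_for_parity() and
log_profile_for_parity() are hypothetical, and the raid5 to raid1 pairing is
an inference from the same rule rather than something the cover letter states.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of btrfs: number of mirrored copies a
 * replicated log needs in order to tolerate as many lost devices as a
 * parity profile with the given number of parity stripes.
 */
static int copies_for_parity(int parity_stripes)
{
	assert(parity_stripes >= 0);
	return parity_stripes + 1;
}

/*
 * Hypothetical helper: name of the mirror profile with that many copies,
 * limited to the fixed set discussed in the thread (2, 3 and 4 copies).
 * Returns NULL when no fixed-copy profile matches.
 */
static const char *log_profile_for_parity(int parity_stripes)
{
	switch (copies_for_parity(parity_stripes)) {
	case 2:
		return "raid1";		/* single parity (raid5), inferred */
	case 3:
		return "raid1c3";	/* double parity (raid6) */
	case 4:
		return "raid1c4";	/* triple parity */
	default:
		return NULL;		/* no N-copy profile for arbitrary N */
	}
}
```

Calling log_profile_for_parity(2) yields "raid1c3", which matches the
statement that raid6 needs 3-copy replication; anything beyond triple parity
returns NULL because the number of copies is fixed.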
* [PATCH] btrfs-progs: add support for raid1c3 and raid1c4
2018-07-13 18:46 [PATCH 0/4] 3- and 4- copy RAID1 David Sterba
@ 2018-07-13 18:46 ` David Sterba
0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2018-07-13 18:46 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba

$ ./mkfs.btrfs -m raid1c4 -d raid1c3 /dev/sd[abcd]

Label:              (null)
UUID:               f1f988ab-6750-4bc2-957b-98a4ebe98631
Node size:          16384
Sector size:        4096
Filesystem size:    8.00GiB
Block group profiles:
  Data:             RAID1C3         273.06MiB
  Metadata:         RAID1C4         204.75MiB
  System:           RAID1C4           8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata, extraid
Number of devices:  4
Devices:
   ID        SIZE  PATH
    1     2.00GiB  /dev/sda
    2     2.00GiB  /dev/sdb
    3     2.00GiB  /dev/sdc
    4     2.00GiB  /dev/sdd

Signed-off-by: David Sterba <dsterba@suse.com>
---
 chunk-recover.c           |  4 ++++
 cmds-balance.c            |  4 ++++
 cmds-fi-usage.c           |  8 +++++++
 cmds-inspect-dump-super.c |  3 ++-
 ctree.h                   |  8 +++++++
 extent-tree.c             |  4 ++++
 fsfeatures.c              |  6 +++++
 ioctl.h                   |  3 ++-
 mkfs/main.c               | 11 ++++++++-
 print-tree.c              |  6 +++++
 utils.c                   | 13 +++++++++--
 volumes.c                 | 48 +++++++++++++++++++++++++++++++++++++--
 volumes.h                 |  4 ++++
 13 files changed, 115 insertions(+), 7 deletions(-)

diff --git a/chunk-recover.c b/chunk-recover.c
index 1d30db51d8ed..661a3bcb4f92 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1569,6 +1569,10 @@ static int calc_num_stripes(u64 type)
 	else if (type & (BTRFS_BLOCK_GROUP_RAID1 |
 			 BTRFS_BLOCK_GROUP_DUP))
 		return 2;
+	else if (type & (BTRFS_BLOCK_GROUP_RAID1C3))
+		return 3;
+	else if (type & (BTRFS_BLOCK_GROUP_RAID1C4))
+		return 4;
 	else
 		return 1;
 }
diff --git a/cmds-balance.c b/cmds-balance.c
index 6cc26c358f95..dab8cec5d105 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -46,6 +46,10 @@ static int parse_one_profile(const char *profile, u64 *flags)
 		*flags |= BTRFS_BLOCK_GROUP_RAID0;
 	} else if (!strcmp(profile, "raid1")) {
 		*flags |= BTRFS_BLOCK_GROUP_RAID1;
+	} else if (!strcmp(profile, "raid1c3")) {
+		*flags |= BTRFS_BLOCK_GROUP_RAID1C3;
+	} else if (!strcmp(profile, "raid1c4")) {
+		*flags |= BTRFS_BLOCK_GROUP_RAID1C4;
 	} else if (!strcmp(profile, "raid10")) {
 		*flags |= BTRFS_BLOCK_GROUP_RAID10;
 	} else if (!strcmp(profile, "raid5")) {
diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index dca2e8d0365f..4e4a415f0d7c 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -373,6 +373,10 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 			ratio = 1;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 			ratio = 2;
+		else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
+			ratio = 3;
+		else if (flags & BTRFS_BLOCK_GROUP_RAID1C4)
+			ratio = 4;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID5)
 			ratio = 0;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID6)
@@ -653,6 +657,10 @@ static u64 calc_chunk_size(struct chunk_info *ci)
 		return ci->size / ci->num_stripes;
 	else if (ci->type & BTRFS_BLOCK_GROUP_RAID1)
 		return ci->size ;
+	else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C3)
+		return ci->size;
+	else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C4)
+		return ci->size;
 	else if (ci->type & BTRFS_BLOCK_GROUP_DUP)
 		return ci->size ;
 	else if (ci->type & BTRFS_BLOCK_GROUP_RAID5)
diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
index e965267c5d96..6984386dbec4 100644
--- a/cmds-inspect-dump-super.c
+++ b/cmds-inspect-dump-super.c
@@ -228,7 +228,8 @@ static struct readable_flag_entry incompat_flags_array[] = {
 	DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_IREF),
 	DEF_INCOMPAT_FLAG_ENTRY(RAID56),
 	DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA),
-	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES)
+	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
+	DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_RAID),
 };
 static const int incompat_flags_num = sizeof(incompat_flags_array) /
 				       sizeof(struct readable_flag_entry);
diff --git a/ctree.h b/ctree.h
index 04a77550c715..f49d11e3d178 100644
--- a/ctree.h
+++ b/ctree.h
@@ -489,6 +489,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_EXTENDED_RAID	(1ULL << 10)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 
@@ -509,6 +510,7 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
+	 BTRFS_FEATURE_INCOMPAT_EXTENDED_RAID |		\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
 
 /*
@@ -958,6 +960,8 @@ struct btrfs_csum_item {
 #define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
 #define BTRFS_BLOCK_GROUP_RAID5		(1ULL << 7)
 #define BTRFS_BLOCK_GROUP_RAID6		(1ULL << 8)
+#define BTRFS_BLOCK_GROUP_RAID1C3	(1ULL << 9)
+#define BTRFS_BLOCK_GROUP_RAID1C4	(1ULL << 10)
 #define BTRFS_BLOCK_GROUP_RESERVED	BTRFS_AVAIL_ALLOC_BIT_SINGLE
 
 enum btrfs_raid_types {
@@ -968,6 +972,8 @@ enum btrfs_raid_types {
 	BTRFS_RAID_SINGLE,
 	BTRFS_RAID_RAID5,
 	BTRFS_RAID_RAID6,
+	BTRFS_RAID_RAID1C3,
+	BTRFS_RAID_RAID1C4,
 	BTRFS_NR_RAID_TYPES
 };
 
@@ -979,6 +985,8 @@ enum btrfs_raid_types {
 					 BTRFS_BLOCK_GROUP_RAID1 |   \
 					 BTRFS_BLOCK_GROUP_RAID5 |   \
 					 BTRFS_BLOCK_GROUP_RAID6 |   \
+					 BTRFS_BLOCK_GROUP_RAID1C3 | \
+					 BTRFS_BLOCK_GROUP_RAID1C4 | \
 					 BTRFS_BLOCK_GROUP_DUP |     \
 					 BTRFS_BLOCK_GROUP_RAID10)
 
diff --git a/extent-tree.c b/extent-tree.c
index 0643815bd41c..836cd4e9c088 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1848,6 +1848,8 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
 	u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 |
 				   BTRFS_BLOCK_GROUP_RAID1 |
+				   BTRFS_BLOCK_GROUP_RAID1C3 |
+				   BTRFS_BLOCK_GROUP_RAID1C4 |
 				   BTRFS_BLOCK_GROUP_RAID10 |
 				   BTRFS_BLOCK_GROUP_RAID5 |
 				   BTRFS_BLOCK_GROUP_RAID6 |
@@ -3629,6 +3631,8 @@ static u64 get_dev_extent_len(struct map_lookup *map)
 	case 0: /* Single */
 	case BTRFS_BLOCK_GROUP_DUP:
 	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
 		div = 1;
 		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
diff --git a/fsfeatures.c b/fsfeatures.c
index 7d85d60f1277..50547bad8db2 100644
--- a/fsfeatures.c
+++ b/fsfeatures.c
@@ -86,6 +86,12 @@ static const struct btrfs_fs_feature {
 		VERSION_TO_STRING2(4,0),
 		NULL, 0,
 		"no explicit hole extents for files" },
+	{ "extraid", BTRFS_FEATURE_INCOMPAT_EXTENDED_RAID,
+		"extended_raid",
+		VERSION_TO_STRING2(4,17),
+		NULL, 0,
+		NULL, 0,
+		"extended raid features: raid1c3, raid1c4" },
 	/* Keep this one last */
 	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
diff --git a/ioctl.h b/ioctl.h
index 709e996f401c..ae8f60515533 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -682,7 +682,8 @@ enum btrfs_err_code {
 	BTRFS_ERROR_DEV_TGT_REPLACE,
 	BTRFS_ERROR_DEV_MISSING_NOT_FOUND,
 	BTRFS_ERROR_DEV_ONLY_WRITABLE,
-	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS
+	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
+	BTRFS_ERROR_DEV_RAID1c3_MIN_NOT_MET,
 };
 
 /* An error code to error string mapping for the kernel
diff --git a/mkfs/main.c b/mkfs/main.c
index b76462a735cf..099b38bbc80c 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -346,7 +346,7 @@ static void print_usage(int ret)
 	printf("Usage: mkfs.btrfs [options] dev [ dev ... ]\n");
 	printf("Options:\n");
 	printf(" allocation profiles:\n");
-	printf("\t-d|--data PROFILE data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
+	printf("\t-d|--data PROFILE data profile, raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, dup or single\n");
 	printf("\t-m|--metadata PROFILE metadata profile, values like for data profile\n");
 	printf("\t-M|--mixed mix metadata and data together\n");
 	printf(" features:\n");
@@ -377,6 +377,10 @@ static u64 parse_profile(const char *s)
 		return BTRFS_BLOCK_GROUP_RAID0;
 	} else if (strcasecmp(s, "raid1") == 0) {
 		return BTRFS_BLOCK_GROUP_RAID1;
+	} else if (strcasecmp(s, "raid1c3") == 0) {
+		return BTRFS_BLOCK_GROUP_RAID1C3;
+	} else if (strcasecmp(s, "raid1c4") == 0) {
+		return BTRFS_BLOCK_GROUP_RAID1C4;
 	} else if (strcasecmp(s, "raid5") == 0) {
 		return BTRFS_BLOCK_GROUP_RAID5;
 	} else if (strcasecmp(s, "raid6") == 0) {
@@ -958,6 +962,11 @@ int main(int argc, char **argv)
 		features |= BTRFS_FEATURE_INCOMPAT_RAID56;
 	}
 
+	if ((data_profile | metadata_profile) &
+	    (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) {
+		features |= BTRFS_FEATURE_INCOMPAT_EXTENDED_RAID;
+	}
+
 	if (btrfs_check_nodesize(nodesize, sectorsize,
 				 features))
 		goto error;
diff --git a/print-tree.c b/print-tree.c
index a09ecfbb28f0..f816a851ea65 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -163,6 +163,12 @@ static void bg_flags_to_str(u64 flags, char *ret)
 	case BTRFS_BLOCK_GROUP_RAID1:
 		strcat(ret, "|RAID1");
 		break;
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+		strcat(ret, "|RAID1C3");
+		break;
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		strcat(ret, "|RAID1C4");
+		break;
 	case BTRFS_BLOCK_GROUP_DUP:
 		strcat(ret, "|DUP");
 		break;
diff --git a/utils.c b/utils.c
index d4395b1f32f8..4e942cff40d0 100644
--- a/utils.c
+++ b/utils.c
@@ -1884,8 +1884,10 @@ static int group_profile_devs_min(u64 flag)
 	case BTRFS_BLOCK_GROUP_RAID5:
 		return 2;
 	case BTRFS_BLOCK_GROUP_RAID6:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
 		return 3;
 	case BTRFS_BLOCK_GROUP_RAID10:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
 		return 4;
 	default:
 		return -1;
@@ -1901,9 +1903,9 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
 	switch (dev_cnt) {
 	default:
 	case 4:
-		allowed |= BTRFS_BLOCK_GROUP_RAID10;
+		allowed |= BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID1C4;
 	case 3:
-		allowed |= BTRFS_BLOCK_GROUP_RAID6;
+		allowed |= BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID1C3;
 	case 2:
 		allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
 			BTRFS_BLOCK_GROUP_RAID5;
@@ -1955,7 +1957,10 @@ int group_profile_max_safe_loss(u64 flags)
 	case BTRFS_BLOCK_GROUP_RAID10:
 		return 1;
 	case BTRFS_BLOCK_GROUP_RAID6:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
 		return 2;
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		return 3;
 	default:
 		return -1;
 	}
@@ -2170,6 +2175,10 @@ const char* btrfs_group_profile_str(u64 flag)
 		return "RAID0";
 	case BTRFS_BLOCK_GROUP_RAID1:
 		return "RAID1";
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+		return "RAID1C3";
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		return "RAID1C4";
 	case BTRFS_BLOCK_GROUP_RAID5:
 		return "RAID5";
 	case BTRFS_BLOCK_GROUP_RAID6:
diff --git a/volumes.c b/volumes.c
index 24eb3e8b2578..ae571da95094 100644
--- a/volumes.c
+++ b/volumes.c
@@ -94,6 +94,24 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.devs_increment	= 1,
 		.ncopies	= 3,
 	},
+	[BTRFS_RAID_RAID1C3] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 3,
+		.tolerated_failures = 2,
+		.devs_increment	= 3,
+		.ncopies	= 3,
+	},
+	[BTRFS_RAID_RAID1C4] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 4,
+		.tolerated_failures = 3,
+		.devs_increment	= 4,
+		.ncopies	= 4,
+	},
 };
 
 struct stripe {
@@ -795,6 +813,8 @@ static u64 chunk_bytes_by_type(u64 type, u64 calc_size, int num_stripes,
 {
 	if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP))
 		return calc_size;
+	else if (type & (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4))
+		return calc_size;
 	else if (type & BTRFS_BLOCK_GROUP_RAID10)
 		return calc_size * (num_stripes / sub_stripes);
 	else if (type & BTRFS_BLOCK_GROUP_RAID5)
@@ -971,6 +991,20 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 				return -ENOSPC;
 			min_stripes = 2;
 		}
+		if (type & BTRFS_BLOCK_GROUP_RAID1C3) {
+			num_stripes = min_t(u64, 3,
+				  btrfs_super_num_devices(info->super_copy));
+			if (num_stripes < 3)
+				return -ENOSPC;
+			min_stripes = 3;
+		}
+		if (type & BTRFS_BLOCK_GROUP_RAID1C4) {
+			num_stripes = min_t(u64, 4,
+				  btrfs_super_num_devices(info->super_copy));
+			if (num_stripes < 4)
+				return -ENOSPC;
+			min_stripes = 4;
+		}
 		if (type & BTRFS_BLOCK_GROUP_DUP) {
 			num_stripes = 2;
 			min_stripes = 2;
@@ -1315,7 +1349,8 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	}
 	map = container_of(ce, struct map_lookup, ce);
 
-	if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1))
+	if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4))
 		ret = map->num_stripes;
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID10)
 		ret = map->sub_stripes;
@@ -1511,6 +1546,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 
 	if (rw == WRITE) {
 		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+				 BTRFS_BLOCK_GROUP_RAID1C3 |
+				 BTRFS_BLOCK_GROUP_RAID1C4 |
 				 BTRFS_BLOCK_GROUP_DUP)) {
 			stripes_required = map->num_stripes;
 		} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
@@ -1553,6 +1590,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 	stripe_offset = offset - stripe_offset;
 
 	if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4 |
 			 BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
 			 BTRFS_BLOCK_GROUP_RAID10 |
 			 BTRFS_BLOCK_GROUP_DUP)) {
@@ -1568,7 +1606,9 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 
 	multi->num_stripes = 1;
 	stripe_index = 0;
-	if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
+	if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID1C3 |
+			 BTRFS_BLOCK_GROUP_RAID1C4)) {
 		if (rw == WRITE)
 			multi->num_stripes = map->num_stripes;
 		else if (mirror_num)
@@ -1838,6 +1878,8 @@ int btrfs_check_chunk_valid(struct btrfs_fs_info *fs_info,
 	if ((type & BTRFS_BLOCK_GROUP_RAID10 && (sub_stripes != 2 ||
 		  !IS_ALIGNED(num_stripes, sub_stripes))) ||
 	    (type & BTRFS_BLOCK_GROUP_RAID1 && num_stripes < 1) ||
+	    (type & BTRFS_BLOCK_GROUP_RAID1C3 && num_stripes < 3) ||
+	    (type & BTRFS_BLOCK_GROUP_RAID1C4 && num_stripes < 4) ||
 	    (type & BTRFS_BLOCK_GROUP_RAID5 && num_stripes < 2) ||
 	    (type & BTRFS_BLOCK_GROUP_RAID6 && num_stripes < 3) ||
 	    (type & BTRFS_BLOCK_GROUP_DUP && num_stripes > 2) ||
@@ -2391,6 +2433,8 @@ u64 btrfs_stripe_length(struct btrfs_fs_info *fs_info,
 	switch (profile) {
 	case 0: /* Single profile */
 	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
 	case BTRFS_BLOCK_GROUP_DUP:
 		stripe_len = chunk_len;
 		break;
diff --git a/volumes.h b/volumes.h
index b4ea93f0bec3..6f74aee998e3 100644
--- a/volumes.h
+++ b/volumes.h
@@ -126,6 +126,10 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 		return BTRFS_RAID_RAID10;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 		return BTRFS_RAID_RAID1;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
+		return BTRFS_RAID_RAID1C3;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C4)
+		return BTRFS_RAID_RAID1C4;
 	else if (flags & BTRFS_BLOCK_GROUP_DUP)
 		return BTRFS_RAID_DUP;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
-- 
2.18.0

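The per-profile numbers the patch adds (minimum devices, tolerated failures,
number of copies) may be easier to follow in a standalone form. The following
is a minimal sketch under stated assumptions, not btrfs-progs code: struct
mirror_profile and the helpers find_profile(), profile_fits() and
usable_bytes() are hypothetical names invented for illustration. The raid1c3
and raid1c4 values are taken from the btrfs_raid_array entries in the diff
above; the raid1 row is the usual 2-copy mirror. The capacity estimate simply
divides raw space by the copy count, like the ratio in cmds-fi-usage.c, and
ignores unequal device sizes and unallocated space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative stand-in for the mirror profile attributes; this is NOT
 * struct btrfs_raid_attr, only the three fields discussed in the thread.
 */
struct mirror_profile {
	const char *name;
	int ncopies;		/* copies kept of every block */
	int devs_min;		/* minimum number of devices */
	int tolerated_failures;	/* devices that may be lost */
};

static const struct mirror_profile mirror_profiles[] = {
	{ "raid1",   2, 2, 1 },
	{ "raid1c3", 3, 3, 2 },
	{ "raid1c4", 4, 4, 3 },
};

/* Look up a profile by its lowercase name, NULL if it is not a mirror profile. */
static const struct mirror_profile *find_profile(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(mirror_profiles) / sizeof(mirror_profiles[0]); i++)
		if (strcmp(mirror_profiles[i].name, name) == 0)
			return &mirror_profiles[i];
	return NULL;
}

/* Nonzero when dev_cnt devices are enough to create the profile. */
static int profile_fits(const struct mirror_profile *p, int dev_cnt)
{
	assert(p != NULL);
	return dev_cnt >= p->devs_min;
}

/* Rough usable capacity: every byte is stored ncopies times. */
static uint64_t usable_bytes(const struct mirror_profile *p, uint64_t raw_bytes)
{
	assert(p != NULL);
	return raw_bytes / (uint64_t)p->ncopies;
}
```

With four devices, profile_fits(find_profile("raid1c4"), 4) is true, while
three devices are enough only for raid1c3 and raid1, which mirrors the
fall-through switch in test_num_disk_vs_raid().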