From: David Sterba <dsterba@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: David Sterba <dsterba@suse.com>
Subject: [PATCH v3 0/4] RAID1 with 3- and 4- copies
Date: Thu, 31 Oct 2019 16:13:41 +0100
Message-ID: <cover.1572534591.git.dsterba@suse.com>
Here it goes again: RAID1 with 3 and 4 copies. I found the bug that kept it
from inclusion last time; it was in the test itself, so the kernel code is
effectively unchanged.
So, with 1 or 2 missing devices, replace by device id works. There's one
annoying thing, though not a new one: when a missing device is replaced, some
extra single/dup block groups are created during the replace process.
Example below. This can happen on plain raid1 with a degraded read-write
mount as well.
Now, what's the merge target?
The patches almost made it into 5.3. The changes build on existing code, so
the actual addition of the new profiles is mainly in the definitions and
additional cases. So it should be safe.
I'm for adding it to the 5.5 queue, though we're at rc5 and this can be seen
as late for a feature. The user benefits are noticeable: raid1c3 can replace
raid6 for metadata, which is the most problematic part of raid6 and much more
complicated to fix (write-ahead journal or something like that). The feedback
on plain 3-copy as a replacement was positive, both on IRC and in mails.
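To make the raid6 comparison concrete, here is a minimal Python sketch of how
many lost devices each replication profile survives. It is an illustration,
not taken from the patches: it assumes a profile that keeps N copies on N
different devices survives N-1 lost devices, and that raid6 survives 2. The
names PROFILE_COPIES and tolerated_failures are made up for this example.

```python
# Illustrative model only, not kernel code. Number of copies per profile;
# each copy of a raid1* chunk is assumed to live on a different device.
PROFILE_COPIES = {
    "raid1": 2,
    "raid1c3": 3,
    "raid1c4": 4,
}

# raid6 uses two parity stripes rather than copies; it survives 2 lost devices.
RAID6_TOLERATED_FAILURES = 2


def tolerated_failures(profile: str) -> int:
    """Devices that can be lost while one full copy still remains."""
    return PROFILE_COPIES[profile] - 1


for name in PROFILE_COPIES:
    print(f"{name}: survives {tolerated_failures(name)} missing device(s)")

# raid1c3 survives as many lost devices as raid6 does.
print(tolerated_failures("raid1c3") == RAID6_TOLERATED_FAILURES)
```

In this model raid1c3 is the smallest plain-copy profile that survives as many
lost devices as raid6.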
Further information can be found in the 5.3-time submission:
https://lore.kernel.org/linux-btrfs/cover.1559917235.git.dsterba@suse.com/
--
Example of 2 devices gone missing and replaced
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- mkfs -d raid1c3 -m raid1c3 /dev/sda10 /dev/sda11 /dev/sda12
- delete devices 2 and 3 from the system
              Data      Metadata  System
Id Path       RAID1C3   RAID1C3   RAID1C3  Unallocated
-- ---------- --------- --------- -------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB  8.00MiB     8.74GiB
 2 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
 3 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
-- ---------- --------- --------- -------- -----------
   Total        1.00GiB 256.00MiB  8.00MiB     6.23GiB
   Used       200.31MiB 320.00KiB 16.00KiB
- mount -o degraded
- btrfs replace 2 /dev/sda13
              Data      Metadata  Metadata  System   System
Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
-- ---------- --------- --------- --------- -------- ------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
 2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
 3 missing      1.00GiB         - 256.00MiB        - 8.00MiB    -1.26GiB
-- ---------- --------- --------- --------- -------- ------- -----------
   Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    15.95GiB
   Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B
- btrfs replace 3 /dev/sda14
              Data      Metadata  Metadata  System   System
Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
-- ---------- --------- --------- --------- -------- ------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
 2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
 3 /dev/sda14   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
-- ---------- --------- --------- --------- -------- ------- -----------
   Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    25.95GiB
   Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B
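The Unallocated figure of device 1 can be cross-checked against the two extra
single chunks that show up after the first replace. A small Python sketch,
using only the rounded figures from the tables above (so the result is
approximate):

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Figures copied from the tables; they are rounded to two decimals there.
unallocated_before = 8.74 * GIB      # device 1, before the first replace
extra_metadata_single = 256 * MIB    # Metadata/single chunk on device 1
extra_system_single = 32 * MIB       # System/single chunk on device 1

unallocated_after = (unallocated_before
                     - extra_metadata_single
                     - extra_system_single)

print(f"{unallocated_after / GIB:.2f}GiB")  # prints 8.46GiB
```

The 288MiB taken by the two single chunks accounts for the drop from 8.74GiB
to 8.46GiB on /dev/sda10.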
There you can see the metadata/single and system/single chunks, which stay
unused as long as no other writes happen during the replace.
Running 'balance start -mconvert=raid1c3,profiles=single' should get rid of
them.
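Spelled out as a full command line, the cleanup could look like the sketch
below. The mount point /mnt is a placeholder and the helper name
balance_cleanup_cmd is invented for this example; the snippet only builds and
prints the command, it does not run it.

```python
import shlex
from typing import List


def balance_cleanup_cmd(mountpoint: str) -> List[str]:
    """Build the balance command that converts leftover single metadata
    chunks back to raid1c3 (filter string as quoted in the text above)."""
    return [
        "btrfs", "balance", "start",
        "-mconvert=raid1c3,profiles=single",
        mountpoint,
    ]


# /mnt is a hypothetical mount point of the filesystem from the example.
print(shlex.join(balance_cleanup_cmd("/mnt")))
```

To actually run it, the list could be passed to subprocess.run() with root
privileges on a mounted filesystem.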
This is an annoyance. We have a plan to avoid it, but that requires changing
the behaviour of degraded mounts with writes enabled.
Implementation details: on a degraded read-write mount, the profile of newly
allocated block groups is reduced from the expected one (raid1 -> single or
dup) to allow writes without breaking the raid constraints. Relaxing that
condition, i.e. allowing writes to "half" of the raid while a device is
missing, would skip creating those block groups.
This is similar to MD-RAID, which allows writing to just one of the RAID1
devices and then syncs to the other when it's available again.
With the btrfs-style raid1 we can do better in case there are enough other
devices to satisfy the raid1 constraint (even with a device missing).
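The current reduction can be sketched as a toy model in Python. This is not
the kernel code: the table MIN_DEVICES and the function reduce_profile are
invented for illustration, the model assumes the raid1 family needs one
writable device per copy, and it always falls back to "single" (the text
above says the fallback may also be dup). What the kernel picks when some,
but not all, of the devices are present is not modelled.

```python
# Simplified model of the behaviour described above, not the kernel code.
# Assumed minimum number of writable devices per profile.
MIN_DEVICES = {"single": 1, "dup": 1, "raid1": 2, "raid1c3": 3, "raid1c4": 4}


def reduce_profile(wanted: str, writable_devices: int) -> str:
    """Return the profile a newly allocated block group would get.

    With too few writable devices for the wanted profile, fall back to
    "single" so that writes remain possible on a degraded mount.
    """
    if writable_devices < 1:
        raise ValueError("need at least one writable device")
    if writable_devices >= MIN_DEVICES[wanted]:
        return wanted
    return "single"


# The example above: raid1c3 filesystem with only device 1 present.
print(reduce_profile("raid1c3", 1))  # single
# All three devices present: no reduction.
print(reduce_profile("raid1c3", 3))  # raid1c3
```

In the example, the first call corresponds to the Metadata/single and
System/single chunks that appear after the degraded mount.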
--
David Sterba (4):
btrfs: add support for 3-copy replication (raid1c3)
btrfs: add support for 4-copy replication (raid1c4)
btrfs: add incompat for raid1 with 3, 4 copies
btrfs: drop incompat bit for raid1c34 after last block group is gone
fs/btrfs/block-group.c | 27 ++++++++++++++--------
fs/btrfs/ctree.h | 7 +++---
fs/btrfs/super.c | 4 ++++
fs/btrfs/sysfs.c | 2 ++
fs/btrfs/volumes.c | 40 +++++++++++++++++++++++++++++++--
fs/btrfs/volumes.h | 4 ++++
include/uapi/linux/btrfs.h | 5 ++++-
include/uapi/linux/btrfs_tree.h | 10 ++++++++-
8 files changed, 83 insertions(+), 16 deletions(-)
--
2.23.0