From: David Sterba <dsterba@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: David Sterba <dsterba@suse.com>
Subject: [PATCH v2 0/6] RAID1 with 3- and 4- copies
Date: Mon, 10 Jun 2019 14:29:40 +0200
Message-ID: <cover.1559917235.git.dsterba@suse.com>
Hi,
this patchset brings RAID1 with 3 and 4 copies as a separate feature,
as outlined in V1
(https://lore.kernel.org/linux-btrfs/cover.1531503452.git.dsterba@suse.com/).
This should help a bit with the raid56 situation, where the write hole
hurts most for metadata, as there is currently no block group profile
that offers resistance to the loss of 2 devices.
I've gathered some feedback from knowledgeable people on IRC and the
following setup is considered good enough (certainly better than what
we have now):
- data: RAID6
- metadata: RAID1C3
RAID1C3 and RAID6 have different characteristics in terms of space
consumption and repair.
Space consumption
~~~~~~~~~~~~~~~~~
* RAID6 stores metadata with an overhead factor of N/(N-2), so with
  more devices the parity overhead ratio becomes small
* RAID1C3 will always consume 67% of the raw metadata chunk space for
  redundancy (3 copies, 1 of them useful)
The overall size of metadata is typically in the range of gigabytes to
hundreds of gigabytes (depending on the use case), roughly 1%-10% of
the filesystem size; with larger filesystems the percentage is usually
smaller. So for the 3-copy raid1, the cost of redundancy is better
expressed as an absolute number of gigabytes "wasted" on redundancy
than as the ratio, which does look scary compared to raid6.
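To put rough numbers on that (an illustrative calculation with made-up
sizes, not a measurement): assume 10 GiB of metadata payload on a
6-device filesystem.

  RAID6   raw space = 10 GiB * N/(N-2) = 10 * 6/4 = 15 GiB  (5 GiB redundancy)
  RAID1C3 raw space = 10 GiB * 3                  = 30 GiB (20 GiB redundancy)

The extra ~15 GiB is a small absolute cost on a filesystem that is tens
of terabytes in size, despite the much worse ratio.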
Repair
~~~~~~
RAID6 needs to access all remaining devices to recompute data from the
P and Q parity, whether 1 or 2 devices are missing.
RAID1C3 can utilize the independence of each copy and also the way
RAID1 works in btrfs. In the scenario with 1 missing device, one of the
2 remaining correct copies is read and written to the replacement
device. Given how the 2-copy RAID1 works on btrfs, the block groups can
be spread over several devices, so the load during repair is spread as
well.
Additionally, device replace works sequentially and in big chunks so on
a lightly used system the read pattern is seek-friendly.
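For illustration, a possible repair flow for the 1-missing-device case
(device names and the devid are made up for the example; this uses the
existing replace tooling, nothing new in this series):

  $ mount -o degraded /dev/sdb /mnt
  $ btrfs replace start 3 /dev/sde /mnt   # 3 = devid of the missing device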
Compatibility
~~~~~~~~~~~~~
The new block group types cost an incompatibility bit, so an old kernel
will refuse to mount a filesystem with the RAID1C3 feature, i.e. with
any chunk of the new type present on the filesystem.
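As the series also touches sysfs.c, presence of the feature on a
running kernel could presumably be checked there; the attribute name
below is an assumption, not confirmed by this cover letter:

  $ cat /sys/fs/btrfs/features/raid1c34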
To upgrade existing filesystems, use the balance filters, e.g. to
convert metadata from RAID6:
$ btrfs balance start -mconvert=raid1c3 /path
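To move to the setup recommended above (data RAID6, metadata RAID1C3)
in one pass, the data and metadata filters can be combined (this
assumes a btrfs-progs build with the patch posted in this thread):

  $ btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /path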
Merge target
~~~~~~~~~~~~
I'd like to push that to misc-next for wider testing and merge it to
5.3, unless something bad pops up. Given that the code changes are
small and only add new types with their constraints, while the rest is
done by the generic code, I'm not expecting problems that can't be
fixed before the full release.
Testing so far
~~~~~~~~~~~~~~
* mkfs with the new profiles (see the combined example below)
* fstests (no specific tests, only check that it does not break)
* profile conversions between single/raid1/raid5/raid1c3/raid6/raid1c4,
  with added devices where needed
* scrub
TODO:
* 1 missing device followed by repair
* 2 missing devices followed by repair
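For reference, a minimal round-trip exercising the points above (device
names are examples; mkfs support for the new profiles comes from the
btrfs-progs patch posted in this thread):

  $ mkfs.btrfs -f -d raid6 -m raid1c3 /dev/sd[a-d]
  $ mount /dev/sda /mnt
  $ btrfs balance start -mconvert=raid1c4 /mnt
  $ btrfs scrub start -B /mnt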
David Sterba (6):
btrfs: add mask for all RAID1 types
btrfs: use mask for RAID56 profiles
btrfs: document BTRFS_MAX_MIRRORS
btrfs: add support for 3-copy replication (raid1c3)
btrfs: add support for 4-copy replication (raid1c4)
btrfs: add incompat for raid1 with 3, 4 copies
fs/btrfs/ctree.h | 14 ++++++++--
fs/btrfs/extent-tree.c | 19 +++++++------
fs/btrfs/scrub.c | 2 +-
fs/btrfs/super.c | 6 +++++
fs/btrfs/sysfs.c | 2 ++
fs/btrfs/volumes.c | 48 ++++++++++++++++++++++++++++-----
fs/btrfs/volumes.h | 4 +++
include/uapi/linux/btrfs.h | 5 +++-
include/uapi/linux/btrfs_tree.h | 10 +++++++
9 files changed, 90 insertions(+), 20 deletions(-)
--
2.21.0