From: pg@btrfs.list.sabi.co.UK (Peter Grandi)
To: Linux fs Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Unexpected raid1 behaviour
Date: Tue, 19 Dec 2017 12:59:09 +0000
Message-ID: <23097.3357.442045.199@tree.ty.sabi.co.uk>
In-Reply-To: <CAJCQCtSeLcpyWG1SKqqfgw2MFn7ugZu1Cby6BGd=cfSbEgUmRA@mail.gmail.com>
[ ... ]
> The advantage of writing single chunks when degraded, is in
> the case where a missing device returns (is readded,
> intact). Catching up that device with the first drive, is a
> manual but simple invocation of 'btrfs balance start
> -dconvert=raid1,soft -mconvert=raid1,soft' The alternative is
> a full balance or full scrub. It's pretty tedious for big
> arrays.
That is merely an after-the-fact rationalization for a design
that is at the same time entirely logical and quite broken: that
the intended replication factor is the same as the current
number of members of the volume, so if a volume has (currently)
only one member, then only "single" chunks get created.
A design that would work better for operations would be to have
"profiles" to be a concept entirely independent of number of
members, or perhaps more precisely to have the "desired" profile
of a chunk be distinct from the "actual" profile (dependent on
the actual number of members of a volume) of that chunk, so that
if a volume has only one member chunks could be created that
have "desired" profile 'raid1' but "actual" profile 'single', or
perhaps more sensibly 'raid1-with-missing-mirror', with checks
that "actual" profile be usable else the volume is not
mountable.
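To make the "desired" versus "actual" distinction concrete, here is a
minimal sketch in Python. It is not Btrfs code: the Chunk class, the
DESIRED_COPIES table and the "unusable" result are hypothetical names
invented for illustration, and only the 'single' and 'raid1' profiles
are modelled.

```python
from dataclasses import dataclass

# Hypothetical table: how many copies each desired profile asks for.
DESIRED_COPIES = {"single": 1, "raid1": 2}


@dataclass
class Chunk:
    desired_profile: str  # static, recorded when the chunk is created
    present_copies: int   # copies on members that are currently present

    def actual_profile(self) -> str:
        """Compute the actual profile from what is present right now."""
        wanted = DESIRED_COPIES[self.desired_profile]
        if self.present_copies >= wanted:
            return self.desired_profile
        if self.present_copies >= 1:
            # Still readable, but with less redundancy than desired.
            return f"{self.desired_profile}-with-missing-mirror"
        return "unusable"


# A chunk created on a one-member volume keeps its desired profile.
chunk = Chunk(desired_profile="raid1", present_copies=1)
print(chunk.desired_profile, "->", chunk.actual_profile())
```

The point of the sketch is that the desired profile never changes when
a member goes missing; only the computed actual profile does.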
Note: ideally every chunk would have both a static desired
profile and a desired stripe width, and a computed actual
profile and an actual stripe width. Or perhaps the desired
profile and width would be properties of the volume (for each of
the three types of data).
For example in MD RAID it is perfectly legitimate to create a
RAID6 set with "desired" width of 6 and "actual" width of 4 (in
which case it can be activated as degraded) or a RAID5 set with
"desired" width of 5 and actual width of 3 (in which case it
cannot be activated at all until at least another member is
added).
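The two MD RAID cases above can be checked with a small Python
sketch. This is not how the md driver is written; activation_state
and the state names are made up for illustration, and the only facts
assumed are that RAID5 tolerates one missing member and RAID6 two.

```python
# Missing members each level can tolerate (RAID5: one, RAID6: two).
TOLERATED_MISSING = {"raid5": 1, "raid6": 2}


def activation_state(level: str, desired_width: int, actual_width: int) -> str:
    """Classify a set by comparing its desired and actual widths."""
    missing = desired_width - actual_width
    if missing < 0:
        raise ValueError("actual width exceeds desired width")
    if missing == 0:
        return "clean"
    if missing <= TOLERATED_MISSING[level]:
        return "degraded"
    return "inactive"


print(activation_state("raid6", 6, 4))  # two missing, within RAID6 tolerance
print(activation_state("raid5", 5, 3))  # two missing, beyond RAID5 tolerance
```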
The difference with MD RAID is that in MD RAID there is (except
in one case, during conversion) an exact match between
"desired" profile stripe width and number of members, while at
least in principle a Btrfs volume can have any number of chunks
of any profile of any desired stripe size (except that the current
implementation is not so flexible in most profiles).
That would require scanning all chunks to determine whether a
volume is mountable at all or mountable only as degraded, while
MD RAID can just count the members. Apparently recent versions
of the Btrfs 'raid1' profile do just that.
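As a rough sketch of what such a scan amounts to, in Python and under
simplifying assumptions: each chunk is reduced to a hypothetical pair
(desired_copies, present_copies), and volume_state is an invented
name, not a kernel function. The volume is as healthy as its worst
chunk.

```python
from typing import Iterable, Tuple


def volume_state(chunks: Iterable[Tuple[int, int]]) -> str:
    """Scan every chunk; each one is (desired_copies, present_copies)."""
    state = "ok"
    for desired, present in chunks:
        if present == 0:
            # One unreadable chunk is enough to refuse the mount.
            return "unmountable"
        if present < desired:
            state = "degraded"
    return state


# Two-member raid1 volume with one member missing: every raid1 chunk
# still has one copy, so the volume is mountable only as degraded.
print(volume_state([(2, 1), (2, 1)]))
# A 'single' chunk that lived on the missing member blocks the mount.
print(volume_state([(2, 1), (1, 0)]))
```

The cost is proportional to the number of chunks, which is the
difference from MD RAID, where counting members is enough.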