From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Data single *and* raid?
Date: Fri, 7 Aug 2015 06:25:12 +0000 (UTC)
Message-ID: <pan$3604e$7f253329$74ce0fc2$93d6f380@cox.net>
In-Reply-To: 55C43F14.7080402@friedels.name
Hendrik Friedel posted on Fri, 07 Aug 2015 07:16:04 +0200 as excerpted:
>>> But then:
>>> # btrfs fi df /mnt/__Complete_Disk/
>>> Data, RAID5: total=3.83TiB, used=3.78TiB
>>> System, RAID5: total=32.00MiB, used=576.00KiB
>>> Metadata, RAID5: total=6.46GiB, used=4.84GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> [T]his seems to be a RAID5 now, right?
> Well, that's what I want, but the command was:
> btrfs balance start -dprofiles=single -mprofiles=raid1
> /mnt/__Complete_Disk/
>
> So, we would expect raid1 here, no?
No. The behavior might be a bit counterintuitive at first glance, but
once the logic is understood, it makes sense.
1) You had tried the initial raid5 convert using an earlier kernel that
had incomplete raid5 support, as evidenced by the lack of the global-
reserve line in btrfs fi df, on a new enough userspace that it should
have had it.
2) That initial attempt ran out of space, possibly because it was keeping
the single and raid1 chunks around due to fragmentation (Hugo's guess),
or due to a now-fixed raid5 conversion bug in the old kernel[1] (my
guess), or possibly due to some other bug that's apparently fixed in
newer kernels, thus the successful completion of the conversion below.
3) But that initial attempt still did one critical thing -- set the
default new-chunk type to raid5, for both data and metadata.
4) So when the second btrfs balance came along, primarily intended to
clean up the fragmentation that Hugo expected and thus targeted at those
old single data and raid1 metadata chunks, it rewrote those chunks using
the new chunk default, turning them into raid5.
Hugo obviously didn't predict that result, as his instructions suggested
following up with another balance command to complete the conversion.
And neither Chris (apparently) nor I (definitely!) foresaw it either.
But the behavior does make sense, once you take into account the default
chunk type, and that a balance-convert does normally change it.
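
To make the logic in 1-4 concrete, here is a toy model in Python. It is
only a sketch of the explanation given in this message, not the kernel's
actual code: the class ToyFilesystem and its methods are hypothetical
names, and the assumption that a convert sets the new-chunk default
before rewriting anything is taken from point 3.

```python
from dataclasses import dataclass


@dataclass
class ToyFilesystem:
    """Toy model only: per-chunk profiles plus a default for new chunks."""
    data_chunks: list      # profile of each data chunk, e.g. "single"
    default_profile: str   # profile used when a chunk is (re)allocated

    def balance_convert(self, target, max_chunks=None):
        """Model a convert: set the new-chunk default, then rewrite chunks.

        max_chunks models a conversion that stops early (say, out of space).
        """
        self.default_profile = target
        self._rewrite(lambda profile: profile != target, max_chunks)

    def balance_profiles(self, profile):
        """Model a filtered balance that only touches chunks of `profile`.

        The rewritten chunks are allocated with the current default.
        """
        self._rewrite(lambda p: p == profile, None)

    def _rewrite(self, matches, limit):
        done = 0
        for i, profile in enumerate(self.data_chunks):
            if limit is not None and done >= limit:
                break
            if matches(profile):
                self.data_chunks[i] = self.default_profile
                done += 1


fs = ToyFilesystem(data_chunks=["single"] * 4, default_profile="single")
fs.balance_convert("raid5", max_chunks=2)  # first attempt stops partway
print(fs.data_chunks)                      # mixed single and raid5 chunks
fs.balance_profiles("single")              # like -dprofiles=single
print(fs.data_chunks)                      # leftovers rewritten as raid5
```

In this model the second balance never names raid5 at all. It only
selects the leftover single chunks, and they come out as raid5 because
that is what the default was set to by the interrupted convert.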
And FWIW, the precise behavior of this default chunk type selector, and
when it falls back to single data and raid1 or dup metadata (as it will
in some instances with a degraded filesystem), has been problematic
before and is being debated in a current thread, due to the implications
for writable mounts of degraded single-device raid1s, for instance.
Behavior in corner-cases like these is much of the reason most regulars
on this list don't consider btrfs fully stable and mature just yet.
Sometimes that corner-case behavior means the filesystem does the wrong
thing and goes read-only, with no way to correct the problem even tho
things are generally still fine: correcting the problem requires a
writable filesystem, but a writable filesystem isn't allowed until the
problem is corrected, a chicken-and-egg situation. (As of now, a user in
that situation has little choice but to copy the data on that read-only
filesystem elsewhere, do a mkfs to wipe away the problem, and restore to
the fresh filesystem. Technically, that shouldn't be required.)
---
[1] FWIW, for "online" tasks like btrfs balance, the btrfs-progs
userspace simply issues the commands to the kernel, which does the real
work. For "offline" tasks such as btrfs check or btrfs restore,
userspace is the real brains and the kernel simply relays the commands at
the device level, without much involvement by the kernel's btrfs code at
all. So while you had a current userspace, the old kernel was the
critical part, since btrfs balance is an online command and the kernel's
btrfs code does the real work.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman