Re: Data single *and* raid?

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Data single *and* raid?
Date: Fri, 7 Aug 2015 06:25:12 +0000 (UTC)
Message-ID: <pan$3604e$7f253329$74ce0fc2$93d6f380@cox.net>
In-Reply-To: 55C43F14.7080402@friedels.name

Hendrik Friedel posted on Fri, 07 Aug 2015 07:16:04 +0200 as excerpted:

>>> But then:
>>> # btrfs fi df /mnt/__Complete_Disk/
>>> Data, RAID5: total=3.83TiB, used=3.78TiB
>>> System, RAID5: total=32.00MiB, used=576.00KiB
>>> Metadata, RAID5: total=6.46GiB, used=4.84GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> [T]his seems to be a RAID5 now, right?
> Well, that's what I want, but the command was:
> btrfs balance start -dprofiles=single -mprofiles=raid1
> /mnt/__Complete_Disk/
> 
> So, we would expect raid1 here, no?

No.  The behavior might be a bit counterintuitive on first glance, but 
once the logic is understood, it makes sense.

1) You had tried the initial raid5 convert using an earlier kernel that 
had incomplete raid5 support, as evidenced by the lack of the global-
reserve line in btrfs fi df, on a new enough userspace that it should 
have had it.

2) That initial attempt ran out of space, possibly because it was keeping 
the single and raid1 chunks around due to fragmentation (Hugo's guess), 
or due to now-fixed raid5 conversion bugs in the old kernel[1] (my 
guess), or possibly due to some other bug that's apparently fixed in 
newer kernels, thus the successful completion of the conversion below.

3) But that initial attempt still did one critical thing -- set the 
default new-chunk type to raid5, for both data and metadata.

4) The second btrfs balance attempt was primarily intended to clean up 
the fragmentation that Hugo expected, and was thus targeted at those old 
single data and raid1 metadata chunks.  But when it rewrote those chunks, 
it used the new chunk default, rewriting them into raid5.
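
As a rough aid for point 1, here is a minimal Python sketch that reads 
btrfs fi df output and reports which profiles exist per chunk type, and 
whether the GlobalReserve line is present.  The helper names 
(parse_fi_df, mixed_profiles) are made up for illustration, and the line 
format is assumed to be exactly what the quoted output above shows; 
other btrfs-progs versions may print it differently.

```python
import re

# Assumed line shape, taken from the quoted output in this message:
#   "Data, RAID5: total=3.83TiB, used=3.78TiB"
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): total=(?P<total>\S+), used=(?P<used>\S+)$"
)


def parse_fi_df(text):
    """Return {chunk type: [profiles]} in the order the lines appear."""
    profiles = {}
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            profiles.setdefault(match.group("kind"), []).append(
                match.group("profile")
            )
    return profiles


def mixed_profiles(profiles):
    """Keep only the chunk types that show more than one profile."""
    return {kind: found for kind, found in profiles.items() if len(set(found)) > 1}


# The output quoted earlier in this message.
SAMPLE = """\
Data, RAID5: total=3.83TiB, used=3.78TiB
System, RAID5: total=32.00MiB, used=576.00KiB
Metadata, RAID5: total=6.46GiB, used=4.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

parsed = parse_fi_df(SAMPLE)
has_global_reserve = "GlobalReserve" in parsed
still_mixed = mixed_profiles(parsed)
```

On the quoted output, has_global_reserve comes out true and still_mixed 
is empty, which matches a completed conversion on a kernel new enough to 
report the global reserve.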

That was a result that Hugo obviously didn't predict as his instructions 
suggested following up with another balance command to complete the 
conversion.  And neither Chris (apparently) nor I (definitely!) foresaw 
it either.  But the behavior does make sense, once you take into account 
the default chunk type, and that a balance-convert does normally change 
it.
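
To make that reasoning concrete, here is a toy Python model of it.  It 
is only a sketch of the explanation above, not the kernel's allocator: 
the ToyFilesystem class, its method names, the stop_after parameter 
(standing in for the first balance aborting early) and the chunk counts 
are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str      # "data" or "metadata"
    profile: str   # "single", "raid1", "raid5", ...


class ToyFilesystem:
    """Toy model only: chunks plus a per-kind profile for newly written chunks."""

    def __init__(self, chunks, new_chunk_profile):
        self.chunks = list(chunks)
        self.new_chunk_profile = dict(new_chunk_profile)

    def _rewrite(self, chunk):
        # A rewritten chunk is allocated with the current new-chunk profile.
        chunk.profile = self.new_chunk_profile[chunk.kind]

    def balance_convert(self, kind, target, stop_after=None):
        """Model of a convert: set the new-chunk profile, then rewrite chunks.

        stop_after (hypothetical) models the balance aborting early,
        for example because it ran out of space.
        """
        self.new_chunk_profile[kind] = target
        done = 0
        for chunk in self.chunks:
            if chunk.kind != kind or chunk.profile == target:
                continue
            if stop_after is not None and done >= stop_after:
                break
            self._rewrite(chunk)
            done += 1

    def balance_profiles(self, kind, profile):
        """Model of a profiles= filter: rewrite matching chunks, no convert."""
        for chunk in self.chunks:
            if chunk.kind == kind and chunk.profile == profile:
                self._rewrite(chunk)

    def profiles(self, kind):
        return sorted({c.profile for c in self.chunks if c.kind == kind})


# Arbitrary starting point: single data, raid1 metadata.
fs = ToyFilesystem(
    [Chunk("data", "single") for _ in range(4)]
    + [Chunk("metadata", "raid1") for _ in range(2)],
    {"data": "single", "metadata": "raid1"},
)

# First attempt: the convert sets the default but stops partway through.
fs.balance_convert("data", "raid5", stop_after=2)
fs.balance_convert("metadata", "raid5", stop_after=1)
mixed = (fs.profiles("data"), fs.profiles("metadata"))

# Second attempt: only a profiles filter, no convert target given.
fs.balance_profiles("data", "single")
fs.balance_profiles("metadata", "raid1")
final = (fs.profiles("data"), fs.profiles("metadata"))
```

In this model, mixed shows both old and raid5 profiles after the 
interrupted convert, and final shows raid5 only, even tho the second 
balance never mentioned raid5.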

And FWIW, the precise behavior of this default chunk type selector, and 
when it falls back to single data and raid1 or dup metadata (as it will 
in some instances with a degraded filesystem), has been problematic 
before and is being debated in a current thread, due to the implications 
for writable mounts of degraded single-device raid1s, for instance.

It's behavior in corner-cases like these that is much of the reason most 
regulars on this list don't consider btrfs fully stable and mature just 
yet.  Sometimes that corner-case behavior means the filesystem does the 
wrong thing and goes read-only even tho things are generally still fine, 
creating a chicken and egg situation: correcting the problem requires a 
writable filesystem, but a writable filesystem isn't allowed until the 
problem is corrected.  (As of now, in that situation a user has little 
choice but to copy the data on that read-only filesystem elsewhere, do a 
mkfs to wipe away the problem, and restore to the fresh filesystem.  
Technically, that shouldn't be required.)

---
[1] FWIW, for "online" tasks like btrfs balance, the btrfs-progs 
userspace simply issues the commands to the kernel, which does the real 
work.  For "offline" tasks such as btrfs check or btrfs restore, 
userspace is the real brains and the kernel simply relays the commands at 
the device level, without much involvement by the kernel's btrfs code at 
all.  So while you had a current userspace, the old kernel was the 
critical part since btrfs balance is an online command in which it's the 
kernel's btrfs code that does the real work.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman