From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Hans Deragon <hans@deragon.biz>, linux-btrfs@vger.kernel.org
Subject: Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
Date: Wed, 8 Feb 2017 07:50:22 -0500
Message-ID: <6e4b9b4f-d4df-5679-5645-a52dbc15b424@gmail.com>
In-Reply-To: <6e26ee22-3d67-d2aa-22b7-c03f9ffc18f2@deragon.biz>
On 2017-02-07 22:21, Hans Deragon wrote:
> Greetings,
>
> On 2017-02-02 10:06, Austin S. Hemmelgarn wrote:
>> On 2017-02-02 09:25, Adam Borowski wrote:
>>> On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:
>>>> This is a severe bug that makes a not all that uncommon (albeit bad) use
>>>> case fail completely. The fix had no dependencies itself and
>>>
>>> I don't see what's bad in mounting a RAID degraded. Yeah, it provides no
>>> redundancy but that's no worse than using a single disk from the start.
>>> And most people not doing storage/server farm don't have a stack of spare
>>> disks at hand, so getting a replacement might take a while.
>> Running degraded is bad. Period. If you don't have a disk on hand to
>> replace the failed one (and if you care about redundancy, you should
>> have at least one spare on hand), you should be converting to a single
>> disk, not continuing to run in degraded mode until you get a new disk.
>> The moment you start talking about running degraded long enough that you
>> will be _booting_ the system with the array degraded, you need to be
>> converting to a single disk. This is of course impractical for
>> something like a hardware array or an LVM volume, but it's _trivial_
>> with BTRFS, and protects you from all kinds of bad situations that can't
>> happen with a single disk but can completely destroy the filesystem if
>> it's a degraded array. Running a single disk is not exactly the same as
>> running a degraded array; it's actually marginally safer (even if you
>> aren't using dup profile for metadata) because there are fewer moving
>> parts to go wrong. It's also considerably more efficient.
>>>
>>> Being able to continue to run when a disk fails is the whole point of
>>> RAID -- despite what some folks think, RAIDs are not for backups but
>>> for uptime. And if your uptime goes to hell because the moment a disk
>>> fails you need to drop everything and replace the disk immediately,
>>> why would you use RAID?
>> Because just replacing a disk and rebuilding the array is almost always
>> much cheaper in terms of time than rebuilding the system from a backup.
>> IOW, even if you have to drop everything and replace the disk
>> immediately, it's still less time consuming than restoring from a
>> backup. It also has the advantage that you don't lose any data.
>
> We disagree on letting people run degraded: I support it, you do not. I
> respect your opinion. However, I have to ask: who decides these rules?
> Obviously not me, since I am a simple btrfs home user.
This is a pretty typical stance among seasoned system administrators.
It's worth pointing out that I'm not saying you shouldn't run with a
single disk for an extended period of time. I'm saying you should
_convert_ to single disk profiles until you can get a replacement, and
then convert back to raid profiles once you have the replacement. It is
considerably safer in BTRFS to run single data and single metadata than
half of a raid1 data and half of a raid1 metadata profile. This is one
of the big reasons that I've avoided MD over the years: it's
functionally impossible to do this with MD arrays.
>
> Since Oracle is funding btrfs development, is that Oracle's official
> stance on how to handle a failed disk? Who decides btrfs's roadmap?
> I have no clue who is who on this mailing list and who influences the
> features of btrfs.
>
> Oracle is obviously using raid systems internally. How do the operators
> of these raid systems feel about this "not let the system run in
> degraded mode"?
They replace the disks immediately, so it's irrelevant to them. Oracle
isn't the sole source of funding (I'm actually not even sure they still
are one; CLM works for Facebook now, last I knew), but you have to
understand that BTRFS has been developed primarily as an _enterprise_
filesystem. This means that certain perfectly reasonable assumptions
are made about the conditions under which it will be used.
>
> As a home user, I do not want to have a disk always available. That
> means paying a premium for a disk when the raid system can easily run
> for two years without a disk failure. I want to buy the new disk (asap,
> of course) once one dies. By then, the cost of a drive will have
> fallen drastically. Yes, I can live with running my home system (which
> has backups) for a day or two in degraded rw mode until I purchase and
> can install a new disk. Chances are low that both disks will quit at
> around the same time.
You're missing my point. I have zero issue with running with one disk
when the other fails. I have an issue with not telling the FS that it
won't have another disk for a while. IOW, in that situation, I would run:

    btrfs balance start -dconvert=single -mconvert=dup /whatever

to convert to profiles _designed_ for a single device, and then convert
back to raid1 when I got another disk. The issue you've stumbled across
is only partial motivation for this; the bigger motivation is that
running half a 2-disk array is more risky than running a single disk by
itself.
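
For reference, the whole round trip could look roughly like the sketch
below. Only the first balance command comes from the text above; the
`btrfs device remove missing` step, the `btrfs device add` step, and the
final balance back to raid1 are assumptions about how the rest would go,
and /mnt and /dev/sdX are placeholder names. The script only prints the
commands unless RUN=1 is set, and it has not been tested on a real
degraded array.

```shell
#!/bin/sh
# Sketch only: dry-run by default, prints each command instead of running it.
# Set RUN=1 to execute for real (needs root and a mounted btrfs filesystem).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# After a disk has died and the filesystem is mounted read-write (degraded):
# convert to profiles designed for a single device.
to_single() {
    mnt=$1
    run btrfs balance start -dconvert=single -mconvert=dup "$mnt"
    # Assumption (not in the original mail): once nothing lives on the
    # raid1 profile any more, drop the record of the dead device.
    run btrfs device remove missing "$mnt"
}

# Once the replacement disk is installed: add it and convert back to raid1.
# Both steps are assumptions; the mail only says "convert back to raid1".
to_raid1() {
    mnt=$1
    newdev=$2
    run btrfs device add "$newdev" "$mnt"
    run btrfs balance start -dconvert=raid1 -mconvert=raid1 "$mnt"
}
```

With the functions sourced, `to_single /mnt` covers the time without a
spare, and `to_raid1 /mnt /dev/sdX` restores redundancy afterwards.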
>
> Simply because I cannot run in degraded mode and cannot add a disk to my
> current degraded raid1, despite having my replacement disk in my hands,
> I must resort to switching to mdadm or zfs.
Both MDADM and ZFS still have the issue that it is more dangerous to run
half a 2 disk RAID1 array than a single disk. That doesn't change just
because the software handles things a bit differently.
>
> Having a policy that limits users' options on the grounds that they are
> too stupid to understand the implications is wrong. It's OK for
> applications, but not at the operating system level; there should be a
> way to force this. A
> --yes-i-know-what-i-am-doing-now-please-mount-rw-degraded-so-i-can-install-the-new-disk
> parameter must be implemented. Currently, it is like disallowing root
> to run mkfs over an existing filesystem because people could erase data
> by mistake. Let people do what they want and let them live with the
> consequences.
A patch exists to fix the particular issue you encountered. Trust me, I
wish it had been merged like it should have been, too; then we could just
tell people in this situation to upgrade their kernel instead of telling
them to rebuild it with a patch.
>
> hdparm has a --yes-i-know-what-i-am-doing flag. btrfs needs one.
>
> Whoever decides about btrfs features to add, please consider this one.