* Raid 5/6 Stability
From: jwalmer @ 2015-12-23 22:52 UTC (permalink / raw)
To: linux-btrfs
Hello dev crew,
Just an avid follower of the project checking in. It has been about nine months since the initial Raid 5/6 features were released in 3.19 and they are still listed as incomplete/experimental on the Wiki.
Admittedly, I don't understand how such a large and distributed project prioritizes features for development, but I haven't been able to find a clear roadmap anywhere.
I'm wondering if anyone here is able to give me some insight about when the Raid 5/6 features will next be updated, or even when they are scheduled to lose their incomplete/experimental designation.
Thanks!
* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-24 0:38 UTC (permalink / raw)
To: linux-btrfs
jwalmer posted on Wed, 23 Dec 2015 17:52:10 -0500 as excerpted:
> Just an avid follower of the project checking in. It has been about nine
> months since the initial Raid 5/6 features were released in 3.19 and
> they are still listed as incomplete/experimental on the Wiki.
>
> Admittedly, I don't understand how such a large and distributed project
> prioritizes features for development, but I haven't been able to find a
> clear roadmap anywhere.
>
> I'm wondering if anyone here is able to give me some insight about when
> the Raid 5/6 features will next be updated, or even when they are
> scheduled to lose their incomplete/experimental designation.
Addressing the wiki side first, then the question you're probably more
interested in. =:^)
FWIW, the wiki gets updated... when a volunteer (which could be you =:^)
updates it. It often has quite current information... somewhere on the
wiki, but often not all mentions of a feature get updated at the same
time, and some may lag behind.
That said, while btrfs raid56 is no longer experimental, I'd not yet call
it as stable as the rest of btrfs (which is itself stabilizing, but not
fully stable or mature yet).
I've personally long stated that raid56 feature stability, to the point
of the rest of btrfs anyway, can be expected roughly a year after nominal
feature completion, with an additional requirement of at least two kernel
cycles without major bugs in the feature. At five kernel releases a year
that would put it more or less at 4.4, which is soon to be released and
quite good timing, as 4.4 is an LTS release, and indeed, the last major
raid56 bug was fixed early in the 4.2 cycle (well before 4.2 release), so
4.4 meets the requirement in that regard as well. =:^)
Now I'm just an active list regular and btrfs user, not a dev. But I
began making that recommendation/prediction before 3.19's release, when
it was clear 3.19 would bring nominal raid56 code completion, and kept
making it over the immediately following releases, when people were (I
thought) jumping the gun and indeed getting their data eaten by remaining
critical bugs. Nobody has argued otherwise in the intervening time, so
I'd suggest it's a reasonably solid recommendation.
So 4.4 is what I'd consider the magical raid56-stability release, and I'd
actually expect the wiki to be updated shortly thereafter, tho 4.4 is
close enough now, and there have been no major raid56 bugs reported in
the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status could be
updated now to reflect that.
(Personally, I'm more a newsgroups and mailing lists guy, and while I
read web/wiki resources and will in fact often quote them, I tend to
treat them as read-only and very seldom personally edit them, leaving
that to others, who occasionally even quote my list posts more or less
verbatim when they update the wiki. So again, you're invited to do so if
that's your thing, but it's nothing I'm likely to do personally. And
FWIW, there are a few folks that watch wiki updates and revert spam and
anything crazy, so as long as the edits are honestly trying to make
things better, any help you can be in editing the wiki is highly
appreciated, and you don't have to worry too much about any mistakes you
inadvertently make, as others will be along to fix them. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Raid 5/6 Stability
From: Chris Murphy @ 2015-12-24 2:38 UTC (permalink / raw)
To: Btrfs BTRFS
There's a worthwhile distinction between the stability of raid56 vs. all
other profiles, and btrfs multiple-device failure behavior. Right now
there's no monitoring or notification of failures to user space. In
fact, Btrfs itself doesn't really understand device failures: a device
can spit out many read or write errors and Btrfs keeps trying to read
and write. So there's no equivalent to md/mdadm marking a device faulty.
Therefore you'll have to figure out a way to monitor kernel messages,
maybe via a script that parses for btrfs messages and emails any such
messages every 10m or whatever.
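A minimal sketch of such a script, in Python, might look like the
following. Everything specific in it is an assumption for illustration
and not something btrfs itself provides: the keyword patterns, the
function names, the placeholder addresses, reading the log with `dmesg`,
and a local mail server listening on localhost.

```python
#!/usr/bin/env python3
"""Sketch: mail btrfs-related kernel messages; intended to be run from cron."""
import re
import smtplib
import subprocess
import sys
from email.message import EmailMessage

# Assumption: btrfs kernel messages contain the word "btrfs" (any case).
BTRFS_RE = re.compile(r"btrfs", re.IGNORECASE)
# Assumption: these keywords are a rough first cut at "something is wrong".
PROBLEM_RE = re.compile(r"error|warning|errs:|failed|corrupt", re.IGNORECASE)


def filter_btrfs_lines(lines):
    """Return the lines that mention btrfs and look like a problem report."""
    return [ln.rstrip("\n") for ln in lines
            if BTRFS_RE.search(ln) and PROBLEM_RE.search(ln)]


def build_report(lines, sender, recipient):
    """Build an email for the given lines, or return None if there are none."""
    if not lines:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"btrfs: {len(lines)} kernel message(s) need attention"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines) + "\n")
    return msg


def main():
    # Read the kernel ring buffer; dmesg may need root on some systems.
    out = subprocess.run(["dmesg"], capture_output=True, text=True,
                         check=True).stdout
    # Placeholder addresses; replace with real ones.
    msg = build_report(filter_btrfs_lines(out.splitlines()),
                       "root@localhost", "admin@example.com")
    if msg is not None:
        # Assumption: a local MTA accepts mail on localhost port 25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)


# Only talk to dmesg and the mail server when explicitly asked to.
if __name__ == "__main__" and sys.argv[1:] == ["--send"]:
    main()
```

Run from cron every 10 minutes with the `--send` argument. Note that
this sketch re-sends every matching line still in the ring buffer on
each run; a real version would want to remember the last timestamp it
reported.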
Chris Murphy.
* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-24 3:56 UTC (permalink / raw)
To: linux-btrfs
Chris Murphy posted on Wed, 23 Dec 2015 19:38:23 -0700 as excerpted:
> There's a worthwhile distinction between the stability of raid56 vs. all
> other profiles, and btrfs multiple-device failure behavior. Right now
> there's no monitoring or notification of failures to user space. In
> fact, Btrfs itself doesn't really understand device failures: a device
> can spit out many read or write errors and Btrfs keeps trying to read
> and write. So there's no equivalent to md/mdadm marking a device faulty.
> Therefore you'll have to figure out a way to monitor kernel messages,
> maybe via a script that parses for btrfs messages and emails any such
> messages every 10m or whatever.
Absolutely. Raid56 mode may be stabilizing, but there's still no
user-side multi-device filesystem health monitoring application, either
for raid56 or in general. That includes the raid1/10 modes, which are in
fact reasonably stable and mature on btrfs and have been considered as
stable as btrfs itself for quite a while (several years) now.
Thanks for that addendum, Chris. It could be quite helpful to someone
just setting up a new installation, particularly on a server. On a
desktop/workstation, the user would likely notice the change in behavior
when things go wrong, formal monitoring or not. On a server, the user
and/or admin is unlikely to be observing things directly.
* Re: Raid 5/6 Stability
From: Gerald Hopf @ 2015-12-24 10:29 UTC (permalink / raw)
To: Duncan, linux-btrfs
Duncan wrote:
> So 4.4 is what I'd consider the magical raid56-stability release, and
> I'd actually expect the wiki to be updated shortly thereafter, tho 4.4
> is close enough now, and there have been no major raid56 bugs reported
> in the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status
> could be updated now to reflect that.
I don't think the wiki should be updated to show raid5/6 as production
ready. The state of raid5/6 is still bad:
1) you STILL can't even properly check for free space

    btrfs fi usage /my/device
    WARNING: RAID56 detected, not implemented
    WARNING: RAID56 detected, not implemented
    WARNING: RAID56 detected, not implemented

   (btrfs-progs v4.3.1-31-g0ab3d31)
2) Scrub is STILL horribly slow. Basically takes forever, unusable for
anything large (and who uses raid5/6 for something small?)
3) the already mentioned problem that unlike mdadm there is no email
notification and no proper fault handling if problems occur
All three of those problems are unlikely to be fixed in the 4.4 kernel
cycle, at least as far as I have been able to observe.
However: I'm using btrfs-raid5 and I'm mostly HAPPY with it. But I
consider my use experimental, I rsync my btrfs-raid5 contents to an
external off-site backup storage every two months, and I can live with a
worst case of 2 months of data loss for what I'm storing on it. Would
love to see 1+2+3 fixed though.
Gerald
* Re: Raid 5/6 Stability
From: jwalmer @ 2015-12-24 13:56 UTC (permalink / raw)
To: Gerald Hopf; +Cc: Duncan, linux-btrfs
Thanks for the speedy replies! Earlier Duncan said, "there's still no user-side multi-device filesystem health monitoring application." I'm mostly worried about device errors/failures, not my filesystem health. Since my implementation of btrfs will be on a storage array, I'm not going to be doing anything unusual that would lend itself to creating filesystem errors.
How serious a concern should it be that filesystem health is not easily monitored? That is, since this is not a RAID-level-specific issue, should the lack of filesystem monitoring be enough to stop me from playing with btrfs deployments for now?
On Thu, 24 Dec 2015 11:29:37 +0100, Gerald Hopf <gerald.hopf@nv-systems.net> wrote:
> Duncan wrote:
> > So 4.4 is what I'd consider the magical raid56-stability release, and
> > I'd actually expect the wiki to be updated shortly thereafter, tho 4.4
> > is close enough now, and there have been no major raid56 bugs reported
> > in the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status
> > could be updated now to reflect that.
>
> I don't think the wiki should be updated to show raid5/6 as production
> ready. The state of raid5/6 is still bad:
>
> 1) you STILL can't even properly check for free space
>
>     btrfs fi usage /my/device
>     WARNING: RAID56 detected, not implemented
>     WARNING: RAID56 detected, not implemented
>     WARNING: RAID56 detected, not implemented
>
>    (btrfs-progs v4.3.1-31-g0ab3d31)
>
> 2) Scrub is STILL horribly slow. Basically takes forever, unusable for
> anything large (and who uses raid5/6 for something small?)
>
> 3) the already mentioned problem that unlike mdadm there is no email
> notification and no proper fault handling if problems occur
>
> All three of those problems are unlikely to be fixed in the 4.4 kernel
> cycle, at least as far as I have been able to observe.
>
> However: I'm using btrfs-raid5 and I'm mostly HAPPY with it. But I
> consider my use experimental, I rsync my btrfs-raid5 contents to an
> external off-site backup storage every two months, and I can live with a
> worst case of 2 months of data loss for what I'm storing on it. Would
> love to see 1+2+3 fixed though.
>
> Gerald
* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-25 0:48 UTC (permalink / raw)
To: linux-btrfs
jwalmer posted on Thu, 24 Dec 2015 08:56:15 -0500 as excerpted:
> Thanks for the speedy replies! Earlier Duncan said, "there's still no
> user-side multi-device filesystem health monitoring application." I'm
> mostly worried about device errors/failures, not my filesystem health.
EUNFORESEEN_AMBIGUITY. Unfortunately, I seem to run into this error in
my posts more than I'd like. =:^(
The ambiguity here is that btrfs is more than a filesystem, it's a multi-
device raid (which would traditionally be at the block layer, not the
filesystem layer) as well.
> Since my implementation of btrfs will be on a storage array, I'm not
> going to be doing anything unusual that would lend itself to creating
> filesystem errors.
>
> How serious a concern should it be that filesystem health is not
> easily monitored? That is, since this is not a RAID-level-specific
> issue, should the lack of filesystem monitoring be enough to stop me
> from playing with btrfs deployments for now?
What I /meant/ was the previously discussed lack of raid-level device
failure notification. That is arguably filesystem health notification
when the filesystem incorporates multi-device raid, as btrfs does. But
traditional filesystems don't do raid themselves and leave it to other
layers, so they wouldn't deal with it at all. It's therefore not
filesystem health in the traditional sense, but something beyond that,
because btrfs is itself untraditional in that sense.
Since your concern continues to separate out the traditional filesystem
health from the raid health, and I was talking about the latter while you
are more concerned with the former, it wouldn't appear to be a concern in
your case. =:^)