* Raid 5/6 Stability
From: jwalmer @ 2015-12-23 22:52 UTC (permalink / raw)
  To: linux-btrfs

Hello dev crew,

Just an avid follower of the project checking in. It has been about nine months since the initial Raid 5/6 features were released in 3.19 and they are still listed as incomplete/experimental on the Wiki.

Admittedly, I don't understand how such a large and distributed project prioritizes features for development, but I haven't been able to find a clear roadmap anywhere. 

I'm wondering if anyone here is able to give me some insight about when the Raid 5/6 feature will next be updated, or even when they are scheduled to lose their incomplete/experimental designation.

Thanks!


* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-24  0:38 UTC (permalink / raw)
  To: linux-btrfs

jwalmer posted on Wed, 23 Dec 2015 17:52:10 -0500 as excerpted:

> Just an avid follower of the project checking in. It has been about nine
> months since the initial Raid 5/6 features were released in 3.19 and
> they are still listed as incomplete/experimental on the Wiki.
> 
> Admittedly, I don't understand how such a large and distributed project
> prioritizes features for development, but I haven't been able to find a
> clear roadmap anywhere.
> 
> I'm wondering if anyone here is able to give me some insight about when
> the Raid 5/6 feature will next be updated, or even when they are
> scheduled to lose their incomplete/experimental designation.

Addressing the wiki side first, then the question you're probably more 
interested in. =:^)

FWIW, the wiki gets updated... when a volunteer (which could be you =:^) 
updates it.  It often has quite current information... somewhere on the 
wiki, but often not all mentions of a feature get updated at the same 
time, and some may lag behind.

That said, while btrfs raid56 is no longer experimental, I'd not call it 
entirely stable, even to the point of the rest of btrfs (which is 
stabilizing but not fully stable or mature yet), just yet.

I've personally long stated that raid56 feature stability, to the point 
of the rest of btrfs anyway, can be expected roughly a year after nominal 
feature completion, with an additional requirement of at least two kernel 
cycles without major bugs in the feature.  At five kernel releases a year 
that would put it more or less at 4.4, which is soon to be released and 
quite good timing, as 4.4 is an LTS release, and indeed, the last major 
raid56 bug was fixed early in the 4.2 cycle (well before 4.2 release), so 
4.4 meets the requirement in that regard as well. =:^)
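
The release arithmetic behind that estimate can be checked with a small Python sketch. The release list is the mainline sequence from 3.19 onward; the helper names `release_after` and `clean_cycles` are made up for illustration, and the "five cycles" and "two clean cycles" figures are simply the rule of thumb stated above, not any official policy.

```python
# Mainline releases from 3.19 onward, in order (3.19 was followed by 4.0).
RELEASES = ["3.19", "4.0", "4.1", "4.2", "4.3", "4.4"]


def release_after(start, cycles):
    """Return the release that lands `cycles` kernel cycles after `start`."""
    return RELEASES[RELEASES.index(start) + cycles]


def clean_cycles(last_bug_cycle, target):
    """Count the cycles after `last_bug_cycle`, up to and including `target`."""
    return RELEASES.index(target) - RELEASES.index(last_bug_cycle)


# Roughly a year at five releases a year, counted from 3.19:
print(release_after("3.19", 5))    # 4.4
# Cycles without a major raid56 bug since the fix in the 4.2 cycle (4.3, 4.4):
print(clean_cycles("4.2", "4.4"))  # 2
```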

Now I'm just an active list regular and btrfs user, not a dev, but I 
began making that recommendation/prediction before 3.19's release, when 
it was clear 3.19 would bring nominal raid56 code completion, and in the 
immediately following releases as well, when people were (I thought) 
jumping the gun, and indeed, getting their data eaten by remaining 
critical bugs, and nobody has argued it otherwise in the intervening 
time, so I'd suggest it's a reasonably solid recommendation. 

So 4.4 is what I'd consider the magical raid56-stability release, and I'd 
actually expect the wiki to be updated shortly thereafter, tho 4.4 is 
close enough now, and there have been no major raid56 bugs reported in 
the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status could be 
updated now to reflect that.

(Personally, I'm more a newsgroups and mailing lists guy, and while I 
read web/wiki resources and will in fact often quote them, I tend to 
treat them as read-only and very seldom personally edit them, leaving 
that to others, who occasionally even quote my list posts more or less 
verbatim when they update the wiki.  So again, you're invited to do so if 
that's your thing, but it's nothing I'm likely to do personally.  And 
FWIW, there are a few folks that watch wiki updates and revert spam and 
anything crazy, so as long as the edits are honestly trying to make 
things better, any help you can be in editing the wiki is highly 
appreciated, and you don't have to worry too much about any mistakes you 
inadvertently make, as others will be along to fix them. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Raid 5/6 Stability
From: Chris Murphy @ 2015-12-24  2:38 UTC (permalink / raw)
  To: Btrfs BTRFS

There's a worthwhile distinction between stability of raid56 vs all
other profiles, and btrfs multiple device failure behavior. Right now
there's no monitoring or notification of failures to user space. In
fact Btrfs itself doesn't really understand device failures, a device
can spit out many read or write errors and Btrfs keeps trying to read
and write. So there's no equivalent to md/mdadm marking a device as faulty.
Therefore you'll have to figure out a way to monitor kernel messages,
maybe via a script that parses for btrfs messages and emails any such
messages every 10m or whatever.
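
A minimal sketch of the filtering half of such a script, in Python. Everything here is an assumption for illustration rather than an established tool: the function names, the `host` parameter, and especially the keyword list, which is a guess at what a worrying btrfs kernel message looks like and would need tuning against real logs.

```python
import re

# Assumption: btrfs kernel messages contain "BTRFS" (matched case-insensitively)
# and a worrying one also contains one of the hint words below.
BTRFS_LINE = re.compile(r"btrfs", re.IGNORECASE)
PROBLEM_HINT = re.compile(r"error|warning|errs|failed|corrupt", re.IGNORECASE)


def find_btrfs_problems(log_lines):
    """Return the log lines that mention btrfs and look like a problem report."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if BTRFS_LINE.search(line) and PROBLEM_HINT.search(line)
    ]


def build_report(problem_lines, host="storage-host"):
    """Format matching lines as a plain-text mail body, or None if none matched."""
    if not problem_lines:
        return None
    header = f"{len(problem_lines)} btrfs message(s) on {host}:"
    return "\n".join([header, *problem_lines])
```

The log lines could come from `dmesg` or `journalctl -k`, with a cron job running the script every 10 minutes and handing any non-empty report to the local mailer. Reading the log and sending the mail are left out on purpose, since both depend on the local setup. A real script would also need to remember which messages it has already reported.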

Chris Murphy.


* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-24  3:56 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Wed, 23 Dec 2015 19:38:23 -0700 as excerpted:

> There's a worthwhile distinction between stability of raid56 vs all
> other profiles, and btrfs multiple device failure behavior. Right now
> there's no monitoring or notification of failures to user space. In
> fact Btrfs itself doesn't really understand device failures, a device
> can spit out many read or write errors and Btrfs keeps trying to read
> and write. So there's no equivalent to md/mdadm marking a device as faulty.
> Therefore you'll have to figure out a way to monitor kernel messages,
> maybe via a script that parses for btrfs messages and emails any such
> messages every 10m or whatever.

Absolutely.  Raid56 mode may be stabilizing, but there's still no 
user-side multi-device filesystem health monitoring application, either 
for raid56 or in general, including for the raid1/10 modes, which are in 
fact reasonably stable and mature on btrfs and have been considered as 
stable as btrfs itself for quite a while (several years) now.

Thanks for that addendum, Chris.  It could be quite helpful to someone 
just setting up a new installation, particularly on a server where the 
user and/or admin is unlikely to be directly observing things and thus 
know when things go wrong due to the observed change in behavior, 
regardless of formal monitoring or the lack thereof, as would likely be 
the case on a desktop/workstation.




* Re: Raid 5/6 Stability
From: Gerald Hopf @ 2015-12-24 10:29 UTC (permalink / raw)
  To: Duncan, linux-btrfs

Duncan wrote:
> So 4.4 is what I'd consider the magical raid56-stability release, and 
> I'd actually expect the wiki to be updated shortly thereafter, tho 4.4 
> is close enough now, and there have been no major raid56 bugs reported 
> in the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status 
> could be updated now to reflect that.

I don't think the wiki should be updated to show raid5/6 as production 
ready. The state of raid5/6 is still bad:

1) you STILL can't even properly check for free space
btrfs fi usage /my/device
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
(btrfs-progs v4.3.1-31-g0ab3d31)

2) Scrub is STILL horribly slow. Basically takes forever, unusable for 
anything large (and who uses raid5/6 for something small?)

3) the already mentioned problem that unlike mdadm there is no email 
notification and no proper fault handling if problems occur

And all three of those problems are unlikely to be fixed in the kernel 4.4 
cycle, at least as far as I was able to observe.
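
Until problem 3 is fixed, one stopgap is to poll the per-device error counters that `btrfs device stats <mountpoint>` prints and raise an alarm when any of them is non-zero. A minimal Python sketch of the parsing step follows. It assumes the output consists of lines of the form `[/dev/sdb].read_io_errs   3`; the function name `nonzero_counters` and the sample data in the usage note are made up for illustration.

```python
import re

# Assumed line format of `btrfs device stats <mountpoint>`, for example:
#   [/dev/sdb].read_io_errs   3
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_output):
    """Map each device to its non-zero error counters."""
    problems = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a counter line
        value = int(match.group("value"))
        if value > 0:
            problems.setdefault(match.group("dev"), {})[match.group("counter")] = value
    return problems
```

Fed the two lines `[/dev/sdb].write_io_errs 0` and `[/dev/sdb].read_io_errs 3`, this returns `{"/dev/sdb": {"read_io_errs": 3}}`. Running the command and mailing the result are left to a cron wrapper. This only reports counters after the fact, so it does nothing for the missing fault handling.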

However: I'm using btrfs-raid5 and I'm mostly HAPPY with it. But I 
consider my use experimental, I rsync my btrfs-raid5 contents to an 
external off-site backup storage every two months, and I can live with a 
worst case of 2 months of data loss for what I'm storing on it. Would love 
to see 1+2+3 fixed though.

Gerald


* Re: Raid 5/6 Stability
From: jwalmer @ 2015-12-24 13:56 UTC (permalink / raw)
  To: Gerald Hopf; +Cc: Duncan, linux-btrfs

Thanks for the speedy replies! Earlier Duncan said, "there's still no user-side multi-device filesystem health monitoring application." I'm mostly worried about device errors/failures, not my filesystem health. Since my implementation of btrfs will be on a storage array, I'm not going to be doing anything unusual that should lend itself to creating filesystem errors.

How serious of a concern should it be that the filesystem health is not easily monitored? I.e., since this is not a RAID-level-specific issue, should the lack of filesystem monitoring be enough to stop me from playing with btrfs deployments for now?

On Thu, 24 Dec 2015 11:29:37 +0100, Gerald Hopf <gerald.hopf@nv-systems.net> wrote:

> Duncan wrote:
> > So 4.4 is what I'd consider the magical raid56-stability release, and 
> > I'd actually expect the wiki to be updated shortly thereafter, tho 4.4 
> > is close enough now, and there have been no major raid56 bugs reported 
> > in the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status 
> > could be updated now to reflect that.
> 
> I don't think the wiki should be updated to show raid5/6 as production 
> ready. The state of raid5/6 is still bad:
> 
> 1) you STILL can't even properly check for free space
> btrfs fi usage /my/device
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
> (btrfs-progs v4.3.1-31-g0ab3d31)
> 
> 2) Scrub is STILL horribly slow. Basically takes forever, unusable for 
> anything large (and who uses raid5/6 for something small?)
> 
> 3) the already mentioned problem that unlike mdadm there is no email 
> notification and no proper fault handling if problems occur
> 
> And all three of those problems are unlikely to be fixed in the kernel 4.4 
> cycle, at least as far as I was able to observe.
> 
> However: I'm using btrfs-raid5 and I'm mostly HAPPY with it. But I 
> consider my use experimental, I rsync my btrfs-raid5 contents to an 
> external off-site backup storage every two months, and I can live with a 
> worst case of 2 months of data loss for what I'm storing on it. Would love 
> to see 1+2+3 fixed though.
> 
> Gerald




* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-25  0:48 UTC (permalink / raw)
  To: linux-btrfs

jwalmer posted on Thu, 24 Dec 2015 08:56:15 -0500 as excerpted:

> Thanks for the speedy replies! Earlier Duncan said, "there's still no
> user-side multi-device filesystem health monitoring application." I'm
> mostly worried about device errors/failures, not my filesystem health.

EUNFORESEEN_AMBIGUITY.  Unfortunately, I seem to run into this error in 
my posts more than I'd like. =:^(

The ambiguity here is that btrfs is more than a filesystem, it's a multi-
device raid (which would traditionally be at the block layer, not the 
filesystem layer) as well.

> Since my implementation of btrfs will be on a storage array, I'm not
> going to be doing anything unusual that should lend itself to creating
> filesystem errors.
> 
> How serious of a concern should it be that the filesystem health is not
> easily monitored? I.e., since this is not a RAID-level-specific issue,
> should the lack of filesystem monitoring be enough to stop me from
> playing with btrfs deployments for now?

What I /meant/ was the previously discussed lack of raid-level device 
failure notification.  That's arguably filesystem health notification 
when the filesystem incorporates multi-device raid, as btrfs does.  
Traditional filesystems don't do raid themselves and leave it to other 
layers, so they wouldn't deal with it at all.  It's thus not filesystem 
health in the traditional sense but something beyond that, because btrfs 
is itself untraditional in that sense.

Since your concern continues to separate out the traditional filesystem 
health from the raid health, and I was talking about the latter while you 
are more concerned with the former, it wouldn't appear to be a concern in 
your case. =:^)


