* Raid 5/6 Stability
From: jwalmer @ 2015-12-23 22:52 UTC
To: linux-btrfs

Hello dev crew,

Just an avid follower of the project checking in. It has been about nine months since the initial Raid 5/6 features were released in 3.19 and they are still listed as incomplete/experimental on the Wiki.

Admittedly, I don't understand how such a large and distributed project prioritizes features for development, but I haven't been able to find a clear roadmap anywhere.

I'm wondering if anyone here is able to give me some insight about when the Raid 5/6 feature will next be updated, or even when they are scheduled to lose their incomplete/experimental designation.

Thanks!
* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-24 0:38 UTC
To: linux-btrfs

jwalmer posted on Wed, 23 Dec 2015 17:52:10 -0500 as excerpted:

> Just an avid follower of the project checking in. It has been about nine
> months since the initial Raid 5/6 features were released in 3.19 and
> they are still listed as incomplete/experimental on the Wiki.
>
> Admittedly, I don't understand how such a large and distributed project
> prioritizes features for development, but I haven't been able to find a
> clear roadmap anywhere.
>
> I'm wondering if anyone here is able to give me some insight about when
> the Raid 5/6 feature will next be updated, or even when they are
> scheduled to lose their incomplete/experimental designation.

Addressing the wiki side first, then the question you're probably more interested in. =:^)

FWIW, the wiki gets updated... when a volunteer (which could be you =:^) updates it. It often has quite current information... somewhere on the wiki, but often not all mentions of a feature get updated at the same time, and some may lag behind.

That said, while btrfs raid56 is no longer experimental, I'd not call it entirely stable, even to the point of the rest of btrfs (which is stabilizing but not fully stable or mature yet), just yet.

I've personally long stated that raid56 feature stability, to the point of the rest of btrfs anyway, can be expected roughly a year after nominal feature completion, with an additional requirement of at least two kernel cycles without major bugs in the feature. At five kernel releases a year that would put it more or less at 4.4, which is soon to be released and quite good timing, as 4.4 is an LTS release, and indeed, the last major raid56 bug was fixed early in the 4.2 cycle (well before 4.2 release), so 4.4 meets the requirement in that regard as well. =:^)

Now I'm just an active list regular and btrfs user, not a dev, but I began making that recommendation/prediction before 3.19's release, when it was clear 3.19 would bring nominal raid56 code completion, and in the immediately following releases as well, when people were (I thought) jumping the gun, and indeed, getting their data eaten by remaining critical bugs, and nobody has argued it otherwise in the intervening time, so I'd suggest it's a reasonably solid recommendation.

So 4.4 is what I'd consider the magical raid56-stability release, and I'd actually expect the wiki to be updated shortly thereafter, tho 4.4 is close enough now, and there have been no major raid56 bugs reported in the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status could be updated now to reflect that.

(Personally, I'm more a newsgroups and mailing lists guy, and while I read web/wiki resources and will in fact often quote them, I tend to treat them as read-only and very seldom personally edit them, leaving that to others, who occasionally even quote my list posts more or less verbatim when they update the wiki. So again, you're invited to do so if that's your thing, but it's nothing I'm likely to do personally. And FWIW, there are a few folks that watch wiki updates and revert spam and anything crazy, so as long as the edits are honestly trying to make things better, any help you can be in editing the wiki is highly appreciated, and you don't have to worry too much about any mistakes you inadvertently make, as others will be along to fix them. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Raid 5/6 Stability
From: Chris Murphy @ 2015-12-24 2:38 UTC
To: Btrfs BTRFS

There's a worthwhile distinction between stability of raid56 vs all other profiles, and btrfs multiple device failure behavior. Right now there's no monitoring or notification of failures to user space. In fact Btrfs itself doesn't really understand device failures, a device can spit out many read or write errors and Btrfs keeps trying to read and write. So there's no equivalent to faultiness like with md/mdadm. Therefore you'll have to figure out a way to monitor kernel messages, maybe via a script that parses for btrfs messages and emails any such messages every 10m or whatever.

Chris Murphy
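[Editorial sketch, not part of the original message.] The kind of script Chris describes could look roughly like the following. This is only a minimal sketch under stated assumptions: the function names, the keyword list, and the idea of reading the log through `dmesg` are illustrative choices, not anything the thread prescribes, and the emailing step is left out entirely.

```python
import re
import subprocess

# Case-insensitive match for kernel log lines that mention btrfs.
BTRFS_PATTERN = re.compile(r"btrfs", re.IGNORECASE)

# Keywords treated as worth reporting. This list is an assumption made for
# the sketch, not an exhaustive catalogue of btrfs kernel messages.
ALERT_WORDS = ("error", "warning", "failed", "corrupt")


def filter_btrfs_alerts(lines):
    """Return the log lines that mention btrfs and contain an alert keyword."""
    alerts = []
    for line in lines:
        lowered = line.lower()
        if BTRFS_PATTERN.search(line) and any(w in lowered for w in ALERT_WORDS):
            alerts.append(line.rstrip("\n"))
    return alerts


def read_kernel_log():
    """Read the kernel ring buffer via dmesg (may require root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def main():
    """Print matching lines; a cron job could mail this output instead."""
    for alert in filter_btrfs_alerts(read_kernel_log()):
        print(alert)
```

Calling `main()` from a job scheduled every ten minutes would approximate the suggestion above. A real deployment would also need to remember which lines were already reported, since the ring buffer is re-read in full on every run.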
* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-24 3:56 UTC
To: linux-btrfs

Chris Murphy posted on Wed, 23 Dec 2015 19:38:23 -0700 as excerpted:

> There's a worthwhile distinction between stability of raid56 vs all
> other profiles, and btrfs multiple device failure behavior. Right now
> there's no monitoring or notification of failures to user space. In
> fact Btrfs itself doesn't really understand device failures, a device
> can spit out many read or write errors and Btrfs keeps trying to read
> and write. So there's no equivalent to faultiness like with md/mdadm.
> Therefore you'll have to figure out a way to monitor kernel messages,
> maybe via a script that parses for btrfs messages and emails any such
> messages every 10m or whatever.

Absolutely. Raid56 mode may be stabilizing, but there's still no user-side multi-device filesystem health monitoring application, either for raid56 or in general, for the raid1/10 modes which are in fact reasonably stable and mature on btrfs and have been considered at the level of btrfs itself for quite awhile (several years), now.

Thanks for that addendum, Chris. It could be quite helpful to someone just setting up a new installation, particularly on a server where the user and/or admin is unlikely to be directly observing things and thus know when things go wrong due to the observed change in behavior, regardless of formal monitoring or the lack thereof, as would likely be the case on a desktop/workstation.

-- 
Duncan
* Re: Raid 5/6 Stability
From: Gerald Hopf @ 2015-12-24 10:29 UTC
To: Duncan, linux-btrfs

Duncan wrote:
> So 4.4 is what I'd consider the magical raid56-stability release, and
> I'd actually expect the wiki to be updated shortly thereafter, tho 4.4
> is close enough now, and there have been no major raid56 bugs reported
> in the 4.3 and 4.4 cycles, that arguably the wiki's raid56 status
> could be updated now to reflect that.

I don't think the wiki should be updated to show raid5/6 as production ready. The state of raid5/6 is still bad:

1) you STILL can't even properly check for free space

    btrfs fi usage /my/device
    WARNING: RAID56 detected, not implemented
    WARNING: RAID56 detected, not implemented
    WARNING: RAID56 detected, not implemented
    (btrfs-progs v4.3.1-31-g0ab3d31)

2) Scrub is STILL horribly slow. Basically takes forever, unusable for anything large (and who uses raid5/6 for something small?)

3) the already mentioned problem that unlike mdadm there is no email notification and no proper fault handling if problems occur

And all those 3 problems are unlikely to be fixed in kernel 4.4 cycle at least as far as I was able to observe.

However: I'm using btrfs-raid5 and I'm mostly HAPPY with it. But I consider my use experimental and I rsync my btrfs-raid5 contents to an external off-site backup storage bimonthly and I can live with a worst case of 2 months of data loss for what I'm storing on it. Would love to see 1+2+3 fixed though.

Gerald
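[Editorial sketch, not part of the original message.] Until something like point 3 exists, one possible stopgap is to poll the per-device error counters and flag anything nonzero. The sketch below assumes the counters come from `btrfs device stats <mountpoint>`, a command the thread itself does not mention, and that its output consists of lines shaped like `[/dev/sdb].read_io_errs   3`. Both the function names and that line format are assumptions made for illustration; check the output of your own btrfs-progs version before relying on it.

```python
def parse_device_stats(text):
    """Parse lines like '[/dev/sdb].read_io_errs   3'.

    Returns a dict mapping (device, counter_name) to an integer count.
    Lines that do not fit the assumed shape are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].startswith("["):
            continue
        name, value = parts
        # Split '/dev/sdb].read_io_errs' into device and counter name.
        device, _, counter = name[1:].partition("].")
        if not counter or not value.isdigit():
            continue
        stats[(device, counter)] = int(value)
    return stats


def nonzero_counters(stats):
    """Keep only the counters that have recorded at least one error."""
    return {key: count for key, count in stats.items() if count > 0}
```

Feeding the captured command output to `parse_device_stats` and mailing the result of `nonzero_counters` whenever it is non-empty would give a crude substitute for the missing notification. It does nothing about the missing fault handling.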
* Re: Raid 5/6 Stability
From: jwalmer @ 2015-12-24 13:56 UTC
To: Gerald Hopf
Cc: Duncan, linux-btrfs

Thanks for the speedy replies! Earlier Duncan said, "there's still no user-side multi-device filesystem health monitoring application." I'm mostly worried about device errors/failures, not my filesystem health.

Since my implementation of btrfs will be on a storage array, I'm not going to be doing anything unusual that should lend itself to creating filesystem errors.

How serious of a concern should it be that the filesystem health is not easily monitored? i.e., Since this is not a RAID-level-specific-issue, should the lack of filesystem monitoring be enough to stop me from playing with btrfs deployments for now?
* Re: Raid 5/6 Stability
From: Duncan @ 2015-12-25 0:48 UTC
To: linux-btrfs

jwalmer posted on Thu, 24 Dec 2015 08:56:15 -0500 as excerpted:

> Thanks for the speedy replies! Earlier Duncan said, "there's still no
> user-side multi-device filesystem health monitoring application." I'm
> mostly worried about device errors/failures, not my filesystem health.

EUNFORESEEN_AMBIGUITY. Unfortunately, I seem to run into this error in my posts more than I'd like. =:^(

The ambiguity here is that btrfs is more than a filesystem, it's a multi-device raid (which would traditionally be at the block layer, not the filesystem layer) as well.

> Since my implementation of btrfs will be on a storage array, I'm not
> going to be doing anything unusual that should lend itself to creating
> filesystem errors.
>
> How serious of a concern should it be that the filesystem health is not
> easily monitored? i.e., Since this is not a RAID-level-specific-issue,
> should the lack of filesystem monitoring be enough to stop me from
> playing with btrfs deployments for now?

What I /meant/ was the previously discussed lack of raid-level device failure notification, which is arguably filesystem health notification when that filesystem incorporates multi-device raid as well, as btrfs does, but would in traditional filesystems be nothing they'd deal with at all as they don't do raid themselves, leaving that to other layers, which means it's not filesystem health in the traditional sense, but something beyond that, because btrfs is itself untraditional in that sense.

Since your concern continues to separate out the traditional filesystem health from the raid health, and I was talking about the latter while you are more concerned with the former, it wouldn't appear to be a concern in your case. =:^)

-- 
Duncan