Subject: Re: Still not production ready
To: Duncan <1i5t5.duncan@cox.net>,
References: <8336788.myI8ELqtIK@merkaba> <566E2490.8080905@cn.fujitsu.com>
From: Qu Wenruo
Message-ID: <566E7072.8020108@cn.fujitsu.com>
Date: Mon, 14 Dec 2015 15:32:02 +0800

Duncan wrote on 2015/12/14 06:21 +0000:
> Qu Wenruo posted on Mon, 14 Dec 2015 10:08:16 +0800 as excerpted:
>
>> Martin Steigerwald wrote on 2015/12/13 23:35 +0100:
>>> Hi!
>>>
>>> For me it is still not production ready.
>>
>> Yes, this is the *FACT* and not everyone has a good reason to deny it.
>
> In the above sentence, I /think/ you (Qu) agree with Martin (and I) that btrfs shouldn't be considered production ready... yet, and the first part of the sentence makes it very clear that you feel strongly about the *FACT*, but the second half of the sentence (after *FACT*) doesn't parse well in English, thus leaving the entire sentence open to interpretation, tho it's obvious either way that you feel strongly about it.  =:^\

Oh, my poor English... :(

The latter half was just in case someone considers btrfs stable in some respects.

> At the risk of getting it completely wrong, what I /think/ you meant to say is (as expanded in typically Duncan fashion =:^)...
>
> Yes, this is the *FACT*, though some people have reasons to deny it.

Right! That's what I wanted to say!!

> Presumably, said reasons would include the fact that various distros are trying to sell enterprise support contracts to customers very eager to have the features that btrfs provides, and said customers are willing to pay for assurances that the solutions they're buying are "production ready", whether that's actually the case or not, presumably because said payment is (in practice) simply ensuring there's someone else to pin the blame on if things go bad.
>
> And the demonstration of that would be the continued fact that people otherwise unnecessarily continue to pay rather large sums of money for that very assurance, when in practice they'd get equal or better support not worrying about that payment, but instead actually making use of free-of-cost resources such as this list.
>
>
> [Linguistic analysis; see frequent discussion of this topic at Language Log, which I happen to subscribe to as I find this sort of thing interesting, for more commentary and examples of the same general issue: http://languagelog.net ]
>
> The problem with the sentence as originally written is that English doesn't deal well with multi-negation, sometimes considering each negation an inversion of the previous (as do most programming languages and thus programmers), while other times, or as read/heard/interpreted by others, repeated negation may be considered a strengthening of the original negation.
>
> Regardless, mis-negation due to speaker/writer confusion is quite common even among native English speakers/writers.
>
> The negating words in question here are "not" and "deny".
> If you will note, my rewrite kept "deny", but rewrote the "not" out of the sentence, so there's only one negative to worry about, making the meaning much clearer as the reader's mind isn't left trying to figure out what the speaker meant with the double-negative (mistake? deliberate canceling out of the first negative with the second? deliberate intensifier?) and thus unable to be sure one way or the other what was meant.
>
> And just in case there would have been doubt, the explanation then makes doubly obvious what I think your intent was by expanding on it. Of course that's easy to do as I entirely agree.
>
> OTOH if I'm mistaken as to your intent and you meant it the other way... well then you'll need to do the explaining, as then the implication is that some people have good reasons to deny it and you agree with them, but without further expansion, I wouldn't know where you're trying to go with that claim.
>
>
> Just in case there's any doubt left of my own opinion on the original claim of not production ready in the above discussion, let me be explicit: I (too) agree with Martin (and I think with Qu) that btrfs isn't yet production ready. But I don't believe you'll find many on the list taking issue with that, as I think everybody on-list agrees, btrfs /isn't/ production ready. Certainly pretty much just that has been repeatedly stated in individualized style by many posters including myself, and I've yet to see anyone take serious issue with it.
>
>>> No matter whether SLES 12 uses it as default for root, no matter whether Fujitsu and Facebook use it: I will not let this onto any customer machine without lots and lots of underprovisioning and rigorous free space monitoring.
>>> Actually I will renew my recommendations in my trainings to be careful with BTRFS.
>
> ... And were I to put money on it, my money would be on every regular on-list poster 100% agreeing with that. =:^)
>
>>> From my experience the monitoring would check for:
>>>
>>> merkaba:~> btrfs fi show /home
>>> Label: 'home'  uuid: […]
>>>         Total devices 2 FS bytes used 156.31GiB
>>>         devid    1 size 170.00GiB used 164.13GiB path /dev/[path1]
>>>         devid    2 size 170.00GiB used 164.13GiB path /dev/[path2]
>>>
>>> If "used" is the same as "size", then raise a big fat alarm. That condition alone is not sufficient for the problem to happen; the filesystem can run just fine for quite some time without any issues. But I have never seen a kworker thread using 100% of one core for an extended period of time, blocking everything else on the fs, without this condition being met.
>
> Astutely observed. =:^)
>
>> And specifically, some advice on device size from myself:
>> Don't use devices that are over 100G but less than 500G.
>> Over 100G leads btrfs to use big chunks, where data chunks can be at most 10G and metadata chunks 1G.
>
> Thanks, Qu. This is the first time I've seen such specifics, both in terms of the big-chunks trigger (minimum 100 GiB effective usable filesystem size) and in terms of how big those big chunks are (10 GiB data, 1 GiB metadata).
>
> Filed away for further reference. =:^)
>
>> I have seen a lot of users with about 100~200G devices hit unbalanced chunk allocation (a 10G data chunk easily takes the last available space, leaving later metadata nowhere to be stored).
>
> That does indeed seem to be a reoccurring theme. Now I know why, and where the big-chunks trigger is. =:^)
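By the way, to make the "big fat alarm" check above concrete, here is a rough and completely untested sketch. It assumes the sizes are reported in GiB as in Martin's output above, and the 2GiB threshold is only an example, so adjust both to your own setup:

    # warn when any device of /home has less than ~2GiB of unallocated space
    btrfs fi show /home | awk '
        /devid/ {
            size = $4; used = $6
            gsub(/GiB/, "", size); gsub(/GiB/, "", used)
            if (size - used < 2)
                print "WARNING: devid " $2 " almost fully allocated: " used "/" size " GiB"
        }'

Something like that from cron, in addition to the normal df-style monitoring, should at least complain before the last data chunk eats the remaining unallocated space.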
> And to add, while the kernel now does empty-chunk reaping, returning them to the unallocated pool, the chances of a 10 GiB chunk being mostly empty but still having at least one small extent locking it in place as not entirely empty, and thus not reapable, are obviously going to be at least an order of magnitude higher (and in practice likely more, due to a likely unlinearly greater share of files being under 10 GiB in size than under 1 GiB) than the chances at the 1 GiB chunk size.
>
>> And unfortunately, your fs is already in the dangerous zone.
>> (And you are using RAID1, which means it's the same as one 170G btrfs with SINGLE data/meta.)
>
> That raid1 parenthetical is why I chose the "effective usable filesystem size" wording above, to try to word it broadly enough to include all the different replication/parity variants.
>
>>> Reported in another thread here that got completely ignored so far. I think I could go back to 4.2 kernel to make this work.
>>
>> Unfortunately, this happens a lot, even when you post it to the mailing list.
>> Devs here are always busy locating bugs, adding new features or enhancing current behavior.
>>
>> So *PLEASE* be patient about such slow responses.
>
> Yes indeed.
>
> Generally speaking, one post/thread alone isn't likely to get the eye of a dev unless they happen to be between bug-hunting projects at that moment. But several posts/threads, particularly over a couple kernel cycles or from multiple posters, a trend makes, and then it's much more likely to catch attention.
>
>> BTW, you may not want to revert to 4.2 until some bug fixes are backported to it.
>> The qgroup rework in 4.2 broke delayed refs and caused some scrub bugs. (My fault.)
>
> Good point. (Tho I never happened to trigger those scrub bugs here, but I strongly suspect that's because I both use quite small filesystems, well under that 100 GiB effective size barrier mentioned above, and relatively fast ssds, so my scrubs are done in under a minute and don't tend to be subject to the same sort of IO bottlenecking and races that scrubs on spinning rust at 100 GiB plus filesystem sizes tend to be.)
>
>>> I think it got somewhat better. It took much longer to come into that state again than last time, but still, blocking like this is *no* option for a *production ready* filesystem.
>
> Agreed on both counts. The problem should be markedly better since the empty-chunk-reaping went into (IIRC) 3.17, to the point that we're only now beginning to see reports of it being triggered again, while previously people were seeing it repeatedly, often monthly or more frequently.
>
> But it's still not hitting the expectations for a production-ready filesystem. Then again, I've yet to see a list regular actually claim that btrfs is in fact production ready; rather the opposite, in fact, and repeatedly.
>
> What distros might be claiming is another matter, but arguably, people relying on their claims should be following up by demanding support from the distros making them, based on the claims they made. Meanwhile, on this list we're /not/ making those claims and thus cannot reasonably be held to them as if we were.
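And to follow up on the chunk reaping point above: when a mostly-empty data chunk is still pinned by a few extents and can't be reaped automatically, a filtered balance is the usual manual workaround. Roughly something like the following, where the percentages are only examples and /mnt stands for your mount point:

    # rewrite only chunks that are at most 10% full, packing their
    # extents into other chunks and returning the freed chunks to
    # the unallocated pool
    btrfs balance start -dusage=10 -musage=10 /mnt

Because it only touches mostly-empty chunks it is much cheaper than a full balance, and it usually frees enough unallocated space for new metadata chunks again.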
>>> I am seriously considering switching to XFS for my production laptop again, cause I never saw any of these free space issues with any of the XFS or Ext4 filesystems I used in the last 10 years.
>>
>> Yes, xfs and ext4 are very stable for normal use cases.
>>
>> But at least, I won't recommend xfs yet, and considering the nature of journal-based filesystems, I'd recommend a backup power supply to protect crash recovery for both of them.
>>
>> Xfs has already messed up several test environments of mine, and an unfortunate double power loss destroyed my whole /home ext4 partition years ago.
>>
>> [xfs story]
>> After several crashes, xfs truncated several corrupted files to 0 size, including my kernel .git directory. I haven't trusted it since.
>> Not to mention that grub2 support for xfs v5 is not here yet.
>>
>> [ext4 story]
>> For ext4, while recovering my /home partition after a power loss, a new power loss happened, and my home partition was doomed.
>> Only a few nonsense files were salvaged.
>
> As they say, YMMV, but FWIW, despite the stories from the pre-data=ordered-by-default era, and with the acknowledgment that a single anecdote or even a small but unrandomized sampling of anecdotes doesn't a scientific study make,

Yes, that's right, all I had was just a few unfortunate samples.

But for people, that still leaves a bad impression.

Thanks,
Qu

> I've actually had surprisingly good luck with reiserfs here, even on hardware that I had little reason to expect a filesystem to actually work reliably on (bad memory incidents, an overheated and head-crashed drive incident where after cooldown I took the mounted-at-the-time partitions out of use and successfully and reliably continued to use other partitions on the drive, an old and burst-capacitor and thus power-unstable mobo incident,... etc, tho not all at once, fortunately!).
>
> ATM I use btrfs on my SSDs but continue to use reiserfs on my spinning rust, and FWIW, reiserfs has continued to be as reliable as I'd expect a deeply mature and stable filesystem to be, while btrfs... has been as occasionally but arguably dependably buggy as I'd expect a still-under-heavy-development tho past "experimental", still stabilizing and not yet mature filesystem to be.
>
>
> Tho pre-ordered-by-default era, I remember a few of those 0-size-truncated files on reiserfs, too. But the ordered-by-default introduction was long in the past even when the 3.0 kernel was new, so is pretty well pre-history by now (which I guess qualifies me as a Linux old fogey by now, even if I didn't really get into it to speak of until the turn of the century or so, after MS gave me the push by very specifically and deliberately shipping malware in eXPrivacy, thus crossing a line I was never to cross with them).