linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Hugo Mills <hugo@carfax.org.uk>, Waxhead <waxhead@online.no>,
	Martin Steigerwald <martin@lichtvoll.de>,
	linux-btrfs@vger.kernel.org
Subject: Re: Is stability a joke?
Date: Mon, 12 Sep 2016 08:20:20 -0400	[thread overview]
Message-ID: <be04c51d-c35d-39fe-c5f7-a7ab13d72cc5@gmail.com> (raw)
In-Reply-To: <20160911130221.GE7138@carfax.org.uk>

On 2016-09-11 09:02, Hugo Mills wrote:
> On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:
>> Martin Steigerwald wrote:
>>> On Sunday, 11 September 2016 at 13:43:59 CEST, Martin Steigerwald wrote:
>>>>>> Thing is: This just seems to be a when-was-a-feature-implemented
>>>>>> matrix.
>>>>>> Not when it is considered to be stable. I think this could be done with
>>>>>> colors or so. Like red for not supported, yellow for implemented and
>>>>>> green for production ready.
>>>>> Exactly, just like the Nouveau matrix. It clearly shows what you can
>>>>> expect from it.
>>> I mentioned this matrix as a good *starting* point. And I think it would be
>>> easy to extend it:
>>>
>>> Just add another column called "Production ready". Then research / ask about
>>> production stability of each feature. The only challenge is: Who is
>>> authoritative on that? I'd certainly ask the developer of a feature, but I'd
>>> also consider user reports to some extent.
>>>
>>> Maybe that's the real challenge.
>>>
>>> If you wish, I'd go through each feature there and give my own estimation. But
>>> I think there are others who are deeper into this.
>> That is exactly the same reason I don't edit the wiki myself. I
>> could of course get it started and hopefully someone will correct
>> what I write, but I feel that if I start this off I don't have deep
>> enough knowledge to do a proper start. Perhaps I will change my mind
>> about this.
>
>    Given that nobody else has done it yet, what are the odds that
> someone else will step up to do it now? I would say that you should at
> least try. Yes, you don't have as much knowledge as some others, but
> if you keep working at it, you'll gain that knowledge. Yes, you'll
> probably get it wrong to start with, but you probably won't get it
> *very* wrong. You'll probably get it horribly wrong at some point, but
> even the more knowledgeable people you're deferring to didn't identify
> the problems with parity RAID until Zygo and Austin and Chris (and
> others) put in the work to pin down the exact issues.
FWIW, here's a list of what I personally consider stable (as in, I'm 
willing to risk reduced uptime by using this stuff on production 
systems at work and personal systems at home):
1. Single-device mode, including DUP data profiles on a single device 
without mixed-bg.
2. Multi-device raid0, raid1, and raid10 profiles with symmetrical 
devices (all devices are the same size).
3. Multi-device single profiles with asymmetrical devices.
4. Small numbers (double digits at most) of snapshots, taken at infrequent 
intervals (no more than once an hour).  I use single snapshots regularly 
to get stable images of the filesystem for backups, and I keep hourly 
ones of my home directory for about 48 hours.
5. Subvolumes used to isolate parts of a filesystem from snapshots.  I 
use this regularly to isolate areas of my filesystems from backups.
6. Non-incremental send/receive (no clone source, no parent, no 
deduplication).  I use this regularly for cloning virtual machines.
7. Checksumming and scrubs using any of the profiles I've listed above.
8. Defragmentation, including autodefrag.
9. All of the compat_features, including no-holes and skinny-metadata.
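The snapshot and send/receive usage in items 4-6 above boils down to a 
couple of commands. Here's a dry-run sketch (all paths are made up for 
illustration; the commands are echoed rather than executed, so drop the 
`echo`s to actually run them as root on a btrfs filesystem):

```shell
#!/bin/sh
# Dry-run sketch of the snapshot + non-incremental send/receive
# workflow described in items 4-6.  Paths are hypothetical.
src=/home
snapdir=/home/.snapshots
name="home-$(date +%Y-%m-%d-%H)"

# Read-only snapshot, so the backup sees a stable image of the data:
snap_cmd="btrfs subvolume snapshot -r $src $snapdir/$name"
echo "$snap_cmd"

# Non-incremental send: no -p parent, no -c clone source:
send_cmd="btrfs send $snapdir/$name | btrfs receive /mnt/backup"
echo "$send_cmd"
```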

Things I consider stable enough that I'm willing to use them on my 
personal systems but not systems at work:
1. In-line data compression with compress=lzo.  I use this on my laptop 
and home server system.  I've never had any issues with it myself, but I 
know that other people have, and it does seem to make other things more 
likely to have issues.
2. Batch deduplication.  I only use this on the back-end filesystems for 
my personal storage cluster, and only because I have multiple copies as 
a result of GlusterFS on top of BTRFS.  I've not had any significant 
issues with it, and I don't remember any reports of data loss resulting 
from it, but it's something that people should not be using if they 
don't understand all the implications.
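For context, in-line compression is just a mount option; a dry-run 
sketch with made-up device and mount-point names (echoed rather than 
executed) looks like:

```shell
#!/bin/sh
# Dry-run sketch: mounting a btrfs volume with LZO compression.
# /dev/sdb1 and /data are hypothetical names.
mount_cmd="mount -o compress=lzo /dev/sdb1 /data"
echo "$mount_cmd"
```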

Things that I don't consider stable but some people do:
1. Quotas and qgroups.  Some people (such as SUSE) consider these to be 
stable.  There are still a couple of known issues with them, however, 
such as returning the wrong errno when a quota is hit (-ENOSPC instead 
of the correct -EDQUOT).
2. RAID5/6.  There are a few people who use this, but it's generally 
agreed to be unstable.  There are still at least 3 known bugs which can 
cause complete loss of a filesystem, and there's also a known issue with 
rebuilds taking insanely long, which puts data at risk as well.
3. Multi-device filesystems with asymmetrical devices running raid0, 
raid1, or raid10.  The issue I have here is that it's much easier to hit 
free-space errors than it should be on a reliable system.  It's 
possible to avoid this with careful planning (for example, a 3-disk raid1 
profile with 1 disk exactly twice the size of the other two will work 
fine, albeit with more load on the larger disk).
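As a back-of-the-envelope check on that 3-disk example, here's a rough 
raid1 capacity estimate. This is my own simplification, not the actual 
btrfs chunk allocator:

```shell
#!/bin/sh
# Rough raid1 usable-capacity estimate for the 3-disk example above
# (sizes in TB).  Simplified model: if the largest disk is no bigger
# than the rest combined, roughly half the raw total is usable;
# otherwise only as much as the smaller disks can mirror.
largest=2                    # one 2TB disk
others=$((1 + 1))            # two 1TB disks
total=$((largest + others))
if [ "$largest" -le "$others" ]; then
    usable=$((total / 2))
else
    usable=$others
fi
echo "usable: ${usable}TB"   # the 2+1+1 layout can mirror everything
```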

There's probably some stuff I've missed, but that should cover most of 
the widely known features.  The problem ends up being that what counts 
as 'stable' depends a lot on who you ask.  SUSE obviously considers 
qgroups stable (they're enabled by default in all current SUSE 
distributions), but I wouldn't be willing to use them, and I'd be 
willing to bet most of the developers wouldn't either.
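For anyone who wants to form their own opinion on qgroups, enabling and 
inspecting them is only two commands. A dry-run sketch (hypothetical 
mount point, echoed rather than executed):

```shell
#!/bin/sh
# Dry-run sketch: enabling quotas and listing qgroups on a
# hypothetical btrfs mount point /mnt.
enable_cmd="btrfs quota enable /mnt"
show_cmd="btrfs qgroup show /mnt"
echo "$enable_cmd"
echo "$show_cmd"
```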

As far as what I consider stable, I've been using just about everything 
I listed above in the first two lists for the past year or so with no 
issues that were due to BTRFS itself (I've had some hardware issues, but 
BTRFS actually saved my data in those cases).  I'm also not a typical 
user though, both in terms of use cases (I use LVM for storing VM images 
and then set ACLs on the device nodes so I can use them as a regular 
user, and I do regular maintenance on all the databases on my systems) 
and in terms of my knowledge of the filesystem (I've fixed BTRFS 
filesystems by hand with a hex editor before; not something I ever want 
to do again, but I know I can do it if I need to).  Both of those impact 
my confidence in using some features.
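The ACL trick I mentioned is a one-liner per device node. A dry-run 
sketch with made-up volume-group, LV, and user names (echoed rather 
than executed):

```shell
#!/bin/sh
# Dry-run sketch: granting a regular user read/write access to an
# LVM device node so VMs can run unprivileged.  "alice", "vg0", and
# "vm-disk0" are hypothetical names.
acl_cmd="setfacl -m u:alice:rw /dev/vg0/vm-disk0"
check_cmd="getfacl /dev/vg0/vm-disk0"
echo "$acl_cmd"
echo "$check_cmd"
```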
>
>    So I'd strongly encourage you to set up and maintain the stability
> matrix yourself -- you have the motivation at least, and the knowledge
> will come with time and effort. Just keep reading the mailing list and
> IRC and bugzilla, and try to identify where you see lots of repeated
> problems, and where bugfixes in those areas happen.
Exactly this.  Most people don't start working on something for the 
first time with huge amounts of preexisting knowledge about it.  Heaven 
knows I didn't, either when I first started using Linux or when I 
started using BTRFS.  One of the big advantages of open source in this 
respect, though, is that you can generally find people willing to help 
you without much effort, and the support is generally quite good.

As far as documentation goes though, we [BTRFS] really do need to get 
our act together.  It really doesn't look good to have most of the best 
documentation be in the distros' wikis instead of ours.  I'm not trying 
to say the distros shouldn't be documenting BTRFS, but when Debian (for 
example) has better documentation of the upstream version of BTRFS than 
the upstream project itself does, that starts to look bad.
>
>    So, go for it. You have a lot to offer the community.
>
>    Hugo.
>
>>> I do think for example that scrubbing and auto raid repair are stable, except
>>> for RAID 5/6. Also device statistics and RAID 0 and 1 I consider to be stable.
>>> I think RAID 10 is also stable, but as I do not run it, I don't know. For me
>>> also skinny-metadata is stable. For me so far even compress=lzo seems to be
>>> stable, but well for others it may not.
>>>
>>> Since what kernel version? Now, there you go. I have no idea. All I know is
>>> that I started BTRFS with kernel 2.6.38 or 2.6.39 on my laptop, but not as
>>> RAID 1 at that time.
>>>
>>> See, the implementation time of a feature is much easier to assess. Maybe
>>> that's part of the reason why there is no stability matrix: Maybe no one
>>> *exactly* knows *for sure*. How could you? So I would even put a footnote on
>>> that "production ready" column explaining "Considered to be stable by
>>> developer and user opinions".
>>>
>>> Of course additionally it would be good to read about experiences of corporate
>>> usage of BTRFS. I know at least Fujitsu, SUSE, Facebook, Oracle are using it.
>>> But I don't know in what configurations and with what experiences. One Oracle
>>> developer invests a lot of time to bring BTRFS like features to XFS and RedHat
>>> still favors XFS over BTRFS, even SLES defaults to XFS for /home and other non
>>> /-filesystems. That also tells a story.
>>>
>>> Some ideas you can get from the SUSE release notes. Even if you do not want
>>> to use it, they tell you something, and I bet they are one of the better
>>> sources of information regarding your question that you can get at this time.
>>> Because I believe SUSE developers invested some time to assess the stability
>>> of features, since they would carefully assess what they can support in
>>> enterprise environments. There is also someone from Fujitsu who shared
>>> experiences in a talk; I can search for the URL to the slides again.
>> By all means, SUSE's wiki is very valuable. I just said that I
>> *prefer* to have that stuff on the BTRFS wiki and feel that is the
>> right place for it.
>>>
>>> I bet Chris Mason and other BTRFS developers at Facebook have some idea on
>>> what they use within Facebook as well. To what extent they are allowed to talk
>>> about it… I don't know. My personal impression is that as soon as Chris went
>>> to Facebook he became quite quiet. Maybe just due to being busy. Maybe due to
>>> Facebook being concerned much more about the privacy of itself than of its
>>> users.
>>>
>>> Thanks,
>>
>

