From: "Austin S. Hemmelgarn"
To: Chris Murphy, Mackenzie Meyer
Cc: Btrfs BTRFS
Subject: Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
Date: Wed, 10 Feb 2016 08:57:55 -0500
Message-ID: <56BB41E3.8050906@gmail.com>

On 2016-02-09 15:39, Chris Murphy wrote:
> On Fri, Feb 5, 2016 at 12:36 PM, Mackenzie Meyer wrote:
>
>> RAID 6 write holes?
>
> I don't even understand the nature of the write hole on Btrfs. If
> modification is still always COW, then either an fs block, a strip, or
> whole stripe write happens, I'm not sure where the hole comes from. It
> suggests some raid56 writes are not atomic.
It's an issue of torn writes in this case, not of atomicity in BTRFS.
Disks can't atomically write anything larger than their sector size,
which means that almost all BTRFS filesystems are issuing writes that
the disks can't complete atomically. Add to that the fact that we
serialize the writes to the different devices, and it becomes trivial
to lose some data if the system crashes while BTRFS is writing out a
stripe (it shouldn't screw up existing data though, you'll just lose
whatever you were trying to write). One way to minimize this, which
would also boost performance on slow storage, would be to avoid writing
the parts of the stripe that haven't changed (so, for example, if only
one disk in the stripe actually has changed data, write only that disk
and the parities); there's a rough sketch of that read-modify-write
idea further down.
>
> If you're worried about raid56 write holes, then a.) you need a server
> running this raid where power failures or crashes don't happen b.)
> don't use raid56 c.) use ZFS.
It's not just BTRFS that has this issue though; ZFS does too, it just
recovers more gracefully than BTRFS does. Even the journaled RAID{5,6}
support that's being added to MD RAID (and by extension DM-RAID, and
therefore LVM) still has the same issue, it just moves it elsewhere (in
that case, it has problems if there's a torn write to the journal).
>
>> RAID 6 stability?
>> Any articles I've tried looking for online seem to be from early 2014,
>> I can't find anything recent discussing the stability of RAID 5 or 6.
>> Are there or have there recently been any data corruption bugs which
>> impact RAID 6? Would you consider RAID 6 safe/stable enough for
>> production use?
>
> It's not stable for your use case, if you have to ask others if it's
> stable enough for your use case. Simple as that. Right now some raid6
> users are experiencing remarkably slow balances, on the order of
> weeks. If device replacement rebuild times are that long, I'd say it's
> disqualifying for most any use case, just because there are
> alternatives that have better fail over behavior than this. So far
> there's no word from any developers what the problem might be, or
> where to gather more information. So chances are they're already aware
> of it but haven't reproduced it, or isolated it, or have a fix for it
> yet.
Seconded, and we should probably put something similar on the wiki;
this really applies to any feature, not just raid56.
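To make the partial-stripe idea above a bit more concrete, here's a
minimal sketch in Python of the read-modify-write parity update for a
single changed block. It's purely illustrative and not how the kernel's
raid56 code is actually written: the block size is made up, only the
XOR (P) parity is shown, and the RAID6 Q parity (which needs
Galois-field arithmetic) is left out.

# Illustrative only: updating the XOR (P) parity of a RAID5/6-style stripe
# when a single data block changes, assuming the old data and old parity
# have already been read back.  Real raid56 code also has to maintain the
# Q parity (GF(2^8) arithmetic), handle device ordering, and make the
# writes crash-safe -- which is exactly where the write hole lives.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """P_new = P_old XOR D_old XOR D_new.

    Because XOR is its own inverse, removing the old block's contribution
    and adding the new one only needs the changed data block plus the
    parity; the unchanged members of the stripe are never read or written.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

if __name__ == "__main__":
    BLOCK = 16                            # toy block size
    d0 = bytes(BLOCK)                     # unchanged stripe member
    d1_old = bytes(range(BLOCK))          # the block being rewritten
    d1_new = bytes(reversed(range(BLOCK)))
    p_old = xor_blocks(d0, d1_old)        # parity as it sits on disk

    p_new = updated_parity(p_old, d1_old, d1_new)

    # The shortcut matches what a full-stripe recompute would produce.
    assert p_new == xor_blocks(d0, d1_new)
    print("partial-stripe parity update OK")

The assert at the end is just there to show that the shortcut produces
the same parity a full-stripe recompute would, while only ever touching
the changed block and the parity.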
>
>> Do you still strongly recommend backups, or has stability reached a
>> point where backups aren't as critical? I'm thinking from a data
>> consistency standpoint, not a hardware failure standpoint.
>
> You can't separate them. On completely stable hardware, stem to stern,
> you'd have no backups, no Btrfs or ZFS, you'd just run linear/concat
> arrays with XFS, for example. So you can't just hand wave the hardware
> part away. There are bugs in the entire storage stack, there are
> connectors that can become intermittent, the system could crash. All
> of these affect data consistency.
I may be wrong, but I believe the intent of this question was to figure
out how likely BTRFS itself is to cause crashes or data corruption,
independent of the hardware. In other words, 'Do I need to worry
significantly about BTRFS in planning for disaster recovery, or can I
focus primarily on the hardware itself?', or 'Is the most likely
failure mode going to be hardware failure, or software?'. In general,
right now I'd say that with BTRFS in a traditional multi-device setup
(nothing more exotic than raid1 or possibly raid10), you've got roughly
a 50% chance that an arbitrary crash is a software issue rather than a
hardware one. With a single disk I'd say it's probably closer to 25%,
and with raid56 probably closer to 75%. By comparison, I'd say that
with ZFS it's maybe a 5% chance (ZFS is developed as enterprise-level
software; it has to work, period), and with XFS on LVM RAID probably
about 15% (like ZFS, XFS is supposed to be enterprise-level software;
the difference here comes from LVM, which has had some interesting
issues recently due to incomplete testing of certain things before they
got pushed upstream).
>
> Stability has not reached a point where backups aren't as critical. I
> don't really even know what that means though. No matter Btrfs or not,
> you need to be doing backups such that if the primary stack is a 100%
> loss without notice, it is not a disaster. Plan on having to use it.
> If you don't like the sound of that, look elsewhere.
What you're using has an impact on how you need to do backups. For
someone who can afford long periods of downtime, for example, it may be
perfectly fine to use something like Amazon S3 Glacier storage (which
has about a 4 hour lead time before restored data is readable) for
backups. OTOH, if you can't afford more than a few minutes of downtime
and want to use BTRFS, you should probably have full on-line, on-site
backups which you can switch in at a moment's notice while you fix
things.
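Since the "switch in at a moment's notice" part only works if the
backup is already a mountable filesystem, here's a minimal sketch of
that kind of on-line, on-site copy using a read-only snapshot plus
btrfs send/receive. The paths and naming below are hypothetical, and
real tooling would add incremental sends (btrfs send -p <parent>),
pruning of old snapshots, and proper error handling; this only
illustrates the mechanism.

#!/usr/bin/env python3
# Sketch of an on-line, on-site BTRFS backup: take a read-only snapshot of
# a subvolume and replicate it to a second, independent BTRFS filesystem
# with btrfs send/receive.  All paths and names below are hypothetical.

import subprocess
from datetime import datetime

SOURCE_SUBVOL = "/srv/data"        # hypothetical subvolume to protect
SNAPSHOT_DIR = "/srv/.snapshots"   # hypothetical snapshot directory (same fs)
BACKUP_MOUNT = "/mnt/backup"       # hypothetical second BTRFS filesystem

def run(cmd, **kwargs):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True, **kwargs)

def backup_once():
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    snap = f"{SNAPSHOT_DIR}/data-{stamp}"

    # 1. Read-only snapshot: atomic and cheap, the source stays on-line.
    run(["btrfs", "subvolume", "snapshot", "-r", SOURCE_SUBVOL, snap])

    # 2. Stream the snapshot to the backup filesystem.  What arrives on the
    #    other side is a normal subvolume, not an opaque archive.
    send = subprocess.Popen(["btrfs", "send", snap], stdout=subprocess.PIPE)
    run(["btrfs", "receive", BACKUP_MOUNT], stdin=send.stdout)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("btrfs send failed")

if __name__ == "__main__":
    backup_once()

The nice property of send/receive for this use case is that the copy on
the backup side is a browsable subvolume rather than an opaque archive,
so failing over is essentially just mounting the backup filesystem and
taking a writable snapshot of the latest received copy.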