From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Purposely using btrfs RAID1 in degraded mode ?
Date: Sat, 9 Jan 2016 10:08:30 +0000 (UTC)

Chris Murphy posted on Mon, 04 Jan 2016 10:41:09 -0700 as excerpted:

> On Mon, Jan 4, 2016 at 10:00 AM, Alphazo wrote:
>
>> I have tested the above use case with a couple of USB flash drive and
>> even used btrfs over dm-crypt partitions and it seemed to work fine
>> but I wanted to get some advices from the community if this is really
>> a bad practice that should not be used on the long run. Is there any
>> limitation/risk to read/write to/from a degraded filesystem knowing
>> it will be re-synced later?
>
> As long as you realize you're testing a sort of edge case, but an
> important one (it should work, that's the point of rw degraded mounts
> being possible), then I think it's fine.
>
> The warning though is, you need to designate a specific drive for the
> rw,degraded mounts. If you were to separately rw,degraded mount the
> two drives, the fs will become irreparably corrupt if they are
> rejoined. And you'll probably lose everything on the volume. The other
> thing is that to "resync" you have to manually initiate a scrub, it's
> not going to resync automatically, and it has to read everything on
> both drives to compare and fix what's missing. There is no equivalent
> to a write intent bitmap on Btrfs like with mdadm (the information
> ostensibly could be inferred from btrfs generation metadata similar to
> how incremental snapshot send/receive works) but that work isn't done.

In addition to what CMurphy says above (which I see you/Alphazo acked),
be aware that btrfs' chunk-writing behavior isn't particularly well
suited to this sort of split-raid1 application.

In general, btrfs allocates space in two steps. First it allocates
rather large "chunks" of space, data chunks separately from metadata
(unless you chose --mixed mode when you first set up the filesystem
with mkfs.btrfs, in which case data and metadata share the same
chunks). Data chunks are typically 1 GiB in size, except on filesystems
over 100 GiB where they're larger, while metadata chunks are typically
256 MiB (as are mixed-mode chunks). Btrfs then uses space from those
chunks until they fill up, at which point it attempts to allocate more.
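
Purely as illustration, that two-step layout is visible from the
command line. The device names and mountpoint below are placeholders,
not anything from your actual setup:

  # two-device raid1 for both data and metadata, the layout discussed
  # in this thread
  mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc

  # or mixed data+metadata chunks, chosen only at mkfs time
  mkfs.btrfs --mixed /dev/sdb

  # once mounted, this lists allocation and usage per chunk type
  # (data/metadata/system) and per profile (raid1/single/...)
  btrfs filesystem df /mnt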
Older btrfs (before kernel 3.17, IIRC) could allocate chunks, but
didn't know how to deallocate them once they emptied, so a common
problem back then was that over time all unallocated space would end up
claimed by empty data chunks, and people would hit ENOSPC when the
metadata chunks filled up, because no unallocated space remained from
which to create new ones. Newer btrfs automatically reclaims empty
chunks, so this doesn't happen so often.

But here comes the problem for the use-case you've described. Btrfs
can't allocate raid1 chunks if only a single device is available,
because raid1 requires two devices. So what's likely to happen is that
at some point you'll be away from home, the existing raid1 chunks,
either data or metadata, will fill up, and btrfs will try to allocate
more. But you'll be running degraded with only a single device, so it
won't be able to allocate new raid1 chunks. Oops! Big problem!

Until very recently (I believe thru current 4.3), what would happen in
this case is that btrfs would find that it couldn't create a new chunk
in raid1 mode and, if operating degraded, would fall back to creating
it in single mode. That lets you continue writing, so all is well.
Except... once you unmounted and attempted to mount the device again,
still degraded, btrfs would see single-mode chunks on a filesystem that
was supposed to have two devices, and would refuse to mount degraded,rw
again. You could only mount degraded,ro.

Of course in your use-case you could still wait until you got home and
mount undegraded, which would let you mount writable again. But a scrub
won't sync the single chunks. For that, after the scrub, you'd need to
run a filtered balance-convert, to convert the single chunks back to
raid1. Something like this (all one command, run against the
mountpoint):

btrfs balance start -dprofiles=single,convert=raid1 \
      -mprofiles=single,convert=raid1 /mnt

There are very new patches that should solve the problem of not being
able to mount degraded,rw after single-mode chunks are found, provided
all those single-mode chunks actually exist on the device(s) that are
present. I think, but I'm not sure, that they're in 4.4. That would
give you more flexibility in mounting degraded,rw after single chunks
have been created on the device you have with you, but once you had
both devices connected again you'd still need to run both a scrub, to
sync the raid1 chunks, and a balance, to convert the single chunks to
raid1 and sync them. (A rough command recap of that whole cycle is
appended below my sig.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
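
As promised, a rough sketch of the whole away-and-back cycle in command
form. Again, the device name and mountpoint are only placeholders, so
adjust for your own setup:

  # away from home: mount the single device you carry, degraded
  mount -o degraded /dev/sdb /mnt

  # back home with both devices connected: mount normally, then scrub
  # to resync the raid1 chunks (-B waits for the scrub to finish)
  btrfs scrub start -B /mnt

  # finally, convert any single-mode chunks written while degraded
  # back to raid1
  btrfs balance start -dprofiles=single,convert=raid1 \
        -mprofiles=single,convert=raid1 /mnt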