linux-btrfs.vger.kernel.org archive mirror
From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Purposely using btrfs RAID1 in degraded mode ?
Date: Sat, 9 Jan 2016 10:08:30 +0000 (UTC)	[thread overview]
Message-ID: <pan$c865f$a56b47b9$50596374$de688f5d@cox.net> (raw)
In-Reply-To: CAJCQCtS2ePxA44YCUS=Q7xS1sWszPmzDe13idDVvxwpeqHD05A@mail.gmail.com

Chris Murphy posted on Mon, 04 Jan 2016 10:41:09 -0700 as excerpted:

> On Mon, Jan 4, 2016 at 10:00 AM, Alphazo <alphazo@gmail.com> wrote:
> 
>> I have tested the above use case with a couple of USB flash drive and
>> even used btrfs over dm-crypt partitions and it seemed to work fine but
>> I wanted to get some advices from the community if this is really a bad
>> practice that should not be used on the long run. Is there any
>> limitation/risk to read/write to/from a degraded filesystem knowing it
>> will be re-synced later?
> 
> As long as you realize you're testing a sort of edge case, but an
> important one (it should work, that's the point of rw degraded mounts
> being possible), then I think it's fine.
> 
> The warning though is, you need to designate a specific drive for the
> rw,degraded mounts. If you were to separately rw,degraded mount the two
> drives, the fs will become irreparably corrupt if they are rejoined. And
> you'll probably lose everything on the volume. The other thing is that
> to "resync" you have to manually initiate a scrub, it's not going to
> resync automatically, and it has to read everything on both drives to
> compare and fix what's missing. There is no equivalent to a write intent
> bitmap on Btrfs like with mdadm (the information ostensibly could be
> inferred from btrfs generation metadata similar to how incremental
> snapshot send/receive works) but that work isn't done.

In addition to what CMurphy says above (which I see you/Alphazo acked), 
be aware that btrfs' chunk-writing behavior isn't particularly well 
suited to this sort of split-raid1 application.

In general, btrfs allocates space in two steps.  First, it allocates 
rather large "chunks" of space, with data chunks separate from metadata 
chunks (unless you set up the filesystem with mkfs.btrfs --mixed, in 
which case data and metadata share the same chunks).  Data chunks are 
typically 1 GiB in size, except on filesystems over 100 GiB where they 
can be larger, while metadata chunks are typically 256 MiB (as are 
mixed-mode chunks).

Then btrfs uses space from these chunks until they get full, at which 
point it will attempt to allocate more chunks.
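You can watch both levels with the filesystem df/usage commands, which 
report chunk-allocated space as "total" and space actually consumed 
within chunks as "used" (/mnt below is a placeholder mountpoint):

```shell
# Per-profile view of allocated chunks vs. space used inside them:
btrfs filesystem df /mnt

# Newer btrfs-progs also have a more detailed per-device view:
btrfs filesystem usage /mnt
```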

Older btrfs (before kernel 3.17, IIRC) could allocate chunks, but didn't 
know how to deallocate them when they emptied.  A common problem back 
then was that over time, all free space would end up allocated to empty 
data chunks, and people would hit ENOSPC errors when the metadata chunks 
ran out of space -- more couldn't be created, because all the empty 
space was tied up in data chunks.

Newer btrfs automatically reclaims empty chunks, so this doesn't happen 
as often.
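On those older kernels, the workaround was a filtered balance; the 
usage=0 filter selects only completely empty chunks, so it's cheap to 
run (/mnt is a placeholder mountpoint):

```shell
# Deallocate completely empty data and metadata chunks
# (only needed on kernels too old to reclaim them automatically):
btrfs balance start -dusage=0 -musage=0 /mnt
```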

But here comes the problem for the use-case you've described.  Btrfs 
can't allocate raid1 chunks when only a single device is present, 
because a raid1 chunk requires space on two devices.

So what's likely to happen is that at some point, you'll be away from 
home and the existing raid1 chunks, either data or metadata, will fill 
up, and btrfs will try to allocate more.  But you'll be running degraded 
with only a single device, and btrfs won't be able to allocate raid1 
chunks with just that single device.

Oops!  Big problem!

Now until very recently (I believe thru current 4.3), what would happen 
in this case is that btrfs, finding it couldn't create a new chunk in 
raid1 mode while operating degraded, would fall back to creating it in 
single mode.  That lets you continue writing, so all is well.  Except... 
once you unmounted and attempted to mount the filesystem again, still 
degraded, btrfs would see single-mode chunks on a filesystem that was 
supposed to have two devices, and would refuse to mount degraded,rw 
again.  You could only mount degraded,ro.  Of course in your use-case, 
you could still wait until you got home and mount undegraded, which 
would again allow mounting writable.
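To make CMurphy's "designate one device" warning concrete, the degraded 
mount itself looks something like this (device node and mountpoint are 
placeholders for your setup; always pick the same device for the 
degraded,rw mounts, never each device separately):

```shell
# Mount only the designated travel device, read-write, degraded:
mount -o degraded,rw /dev/sdb1 /mnt
```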

But a scrub won't sync the single chunks.  For that, after the scrub, 
you'd need to run a filtered balance-convert, to convert the single 
chunks back to raid1.  Something like this (one command; note the filter 
is spelled "profiles", and substitute your own mountpoint at the end):

btrfs balance start -dprofiles=single,convert=raid1 \
-mprofiles=single,convert=raid1 /mountpoint

There are very new patches that should solve the problem of not being 
able to mount degraded,rw after single-mode chunks are found, provided 
all those single-mode chunks actually exist on the found device(s).  I 
think, but I'm not sure, that they're in 4.4.  That would give you more 
flexibility in terms of mounting degraded,rw after single chunks have 
been created on the device you have with you, but you'd still need to 
run both a scrub, to sync the raid1 chunks, and a balance, to convert 
the single chunks to raid1 and sync them, once you had both devices 
connected.
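Putting those recovery steps together, the back-home resync would look 
roughly like this (device and mountpoint names are placeholders):

```shell
# With both devices connected again, a normal undegraded mount:
mount /dev/sdb1 /mnt

# Sync the raid1 chunks (-B waits in the foreground until done):
btrfs scrub start -B /mnt

# Convert any single-mode chunks created while degraded back to raid1:
btrfs balance start -dprofiles=single,convert=raid1 \
                    -mprofiles=single,convert=raid1 /mnt
```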

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

