From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Recover btrfs volume which can only be mounted in read-only mode
Date: Mon, 26 Oct 2015 07:09:10 +0000 (UTC)
Message-ID: <pan$64d73$867e5ae0$e0eaca78$1ba6fac6@cox.net>
In-Reply-To: 562369E8.60709@gmail.com
Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
[Regarding the btrfs raid1 "device-with-the-most-space" chunk-allocation
strategy.]
> I think the mentioned strategy (fill in the device with most free space)
> is not most effective. If the data is spread equally, the read
> performance would be higher (reading from 3 disks instead of 2). In my
> case this is even crucial, because the smallest drive is SSD (and it is
> not loaded at all).
>
> Maybe I don't see the benefit from the strategy which is currently
> implemented (besides that it is robust and well-tested)?
Two comments:
1) As Hugo alluded to, in striped mode (raid0/5/6 and I believe 10), the
chunk allocator goes wide, allocating a chunk from each device with free
space and then striping within those chunks at something smaller (64 KiB
maybe?). When the smallest device fills up, it reduces the width by one
and keeps allocating, down to the minimum stripe width for the raid type.
Raid1 and single, however, allocate on the device with the most free
space first, which, particularly for raid1, ensures maximum usage of the
available space.
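To make the striped case concrete, here's a rough toy model in Python of
that go-wide-then-shrink behaviour. The device names and sizes are made
up, the minimum width of 2 is just the raid0 assumption, and this is only
a sketch of the idea, not how the kernel allocator is actually written:

    MIN_WIDTH = 2   # assume raid0: at least two devices per stripe

    def allocate_striped(free, chunk=1):
        # Allocate one stripe across every device that still has free
        # space; give up once fewer than MIN_WIDTH devices remain.
        wide = [name for name, space in free.items() if space >= chunk]
        if len(wide) < MIN_WIDTH:
            return None
        for name in wide:
            free[name] -= chunk   # one chunk from each device in the stripe
        return wide

    free = {"ssd": 3, "hdd1": 5, "hdd2": 8}   # free space, in chunks
    while (stripe := allocate_striped(free)) is not None:
        print("allocated a stripe across", stripe)
    # The stripe starts three devices wide, drops to two once the ssd
    # fills, and allocation stops when only one device still has space.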
Were raid1 to allocate width-first, usable capacity would be far lower
and much more of the largest device would remain unusable. Some chunk
pairs would land entirely on the smaller devices, so less of the largest
device would be used by the time the smaller devices filled up, and at
that point no more raid1 chunks could be allocated at all, since only the
single largest device would have free space left and raid1 requires
allocation on two separate devices.
In the three-device raid1 case, the difference in usable capacity would
be 1/3 the capacity of the smallest device, since until it filled, 1/3 of
all pair allocations would go to the two smaller devices alone, leaving
that much more space unusable on the largest device.
So there is a reason for most-space-first: it forces one chunk of every
pair-allocation onto the largest device, distributing space as
efficiently as possible so that as little as possible is left unusable
once only one device still has room and pair allocation can no longer
continue.
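Here's a small toy comparison of the two strategies, again in Python with
made-up sizes, and with a purely hypothetical "width-first" raid1
allocator that simply rotates through every device pair that still has
free space. The exact shortfall depends on the device sizes and on how
such an allocator would actually rotate; the point is only that
width-first strands space on the largest device while most-space-first
does not:

    from itertools import combinations

    def most_space_first(free):
        # Current behaviour: each raid1 chunk pair goes on the two
        # devices with the most unallocated space.
        pairs = 0
        while True:
            a, b = sorted(free, key=free.get, reverse=True)[:2]
            if free[b] < 1:
                return pairs, dict(free)   # need space on two devices
            free[a] -= 1
            free[b] -= 1
            pairs += 1

    def width_first(free):
        # Hypothetical alternative: rotate through every pair of
        # devices that both still have free space.
        pairs = 0
        while True:
            did_any = False
            for a, b in combinations(free, 2):
                if free[a] >= 1 and free[b] >= 1:
                    free[a] -= 1
                    free[b] -= 1
                    pairs += 1
                    did_any = True
            if not did_any:
                return pairs, dict(free)

    sizes = {"ssd": 500, "hdd1": 1000, "hdd2": 1500}  # free chunks
    print("most-space-first:", most_space_first(dict(sizes)))
    print("width-first:     ", width_first(dict(sizes)))

With those sizes most-space-first uses every chunk of every device, while
the width-first rotation finishes with a pile of chunks stranded on the
largest device that can never be paired.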
2) There has been talk of a more flexible chunk allocator with an admin-
specified strategy, allowing smart use of hybrid ssd/disk filesystems.
One could, for instance, put the metadata on the ssds, since btrfs
metadata is relatively hot: in addition to the traditional metadata it
contains the checksums, which btrfs of course verifies on every read.
However, this sort of thing is likely to be some time off, as it's
relatively low priority compared to various other possible features.
Unfortunately, given the rate of btrfs development, "some time off" is in
practice likely to be at least five years out.
In the meantime, there are technologies such as bcache that allow hybrid
caching of "hot" data and are designed to present themselves as virtual
block devices, so that btrfs, like other filesystems, can layer on top.
And in fact, some regular users here have btrfs deployed on top of
bcache, and from their reports it now works quite well. (There were some
problems a while back, but those are several years in the past now, well
before the last couple of LTS kernel series, the oldest that's
recommended for btrfs deployment.)
If you're interested, start a new thread with btrfs on bcache in the
subject line, and you'll likely get some very useful replies. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman