From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Scrub on btrfs single device only to detect errors, not correct them?
Date: Tue, 8 Dec 2015 13:38:46 +0000 (UTC)
Message-ID: <pan$55430$b692802c$678dae98$d07f0732@cox.net>
In-Reply-To: CA+pSGYcFAxsk=8EWPz4wM0Gjk4Nyra0HEcTjbJtRQU8-zfSVjA@mail.gmail.com
Jon Panozzo posted on Mon, 07 Dec 2015 08:43:14 -0600 as excerpted:
[On single-device dup data]
> Thanks for the additional feedback. Two follow-up questions to this is:
>
> Can the --mixed option only be applied when first creating the fs, or
> can you simply add this to the balance command to take an existing
> filesystem and add this to it?
Mixed-bg mode has to be done at btrfs creation.
It changes the way btrfs handles chunks, and doing that _live_, with a
non-zero time during which both modes are active, would be... complex and
an invitation to all sorts of race bugs, to put it mildly.
> So it sounds like there are really three ways to enable scrub to repair
> errors on a btrfs single device (please confirm):
Yes.
> 1) mkfs.btrfs with the --mixed option
This would be my current preference for filesystem sizes up to a quarter
or perhaps a half terabyte on spinning rust, and some people are known to
use mixed for exactly this reason, tho it's not particularly well tested
at the terabyte scale filesystem level, where as a result you might
uncover some unusual bugs.
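
As a minimal sketch of option 1, assuming a btrfs-progs that accepts the dup profile together with --mixed: /dev/sdX is a placeholder device, and the run helper is a hypothetical wrapper that only prints the command unless RUN=1 is set, since mkfs destroys existing data.

```shell
# Placeholder device (assumption); mkfs.btrfs destroys whatever is on it.
DEV="${DEV:-/dev/sdX}"

# Hypothetical helper: print commands by default, execute only with RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Mixed block groups hold data and metadata together,
# so the data and metadata profiles must be the same.
run mkfs.btrfs --mixed --data dup --metadata dup "$DEV"
```

Because data and metadata share chunks in mixed mode, a scrub on the resulting filesystem would have a second copy of everything to repair from.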
> 2) create two partitions on a single phys device,
> then present them as logical devices (maybe a loopback or something)
> and create a btrfs raid1 for both data/metadata
No special loopback, etc, required. Btrfs deploys just fine on pretty
much any block device as presented by the kernel, including both
partitions and LVM volumes, the two ways single physical devices are
likely to be presented as multiple logical devices.
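
A rough sketch of option 2 on two partitions of the same disk might look like the following. The partition names and mount point are placeholders, and the run helper is a hypothetical wrapper that only prints each command unless RUN=1 is set.

```shell
# Placeholder partitions on the same physical disk, and a placeholder mount point.
PART1="${PART1:-/dev/sdX1}"
PART2="${PART2:-/dev/sdX2}"
MNT="${MNT:-/mnt/test}"

# Hypothetical helper: print commands by default, execute only with RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# raid1 for both data and metadata across the two partitions.
run mkfs.btrfs --data raid1 --metadata raid1 "$PART1" "$PART2"

# Let the kernel discover both members, then mount via either one.
run btrfs device scan
run mount "$PART1" "$MNT"

# -B keeps scrub in the foreground and prints statistics when it finishes.
run btrfs scrub start -B "$MNT"
```

With two copies available, a scrub that finds a checksum mismatch on one partition can rewrite the bad block from the copy on the other.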
In fact I use btrfs on partitions here, tho in my case it's two devices
partitioned up identically, with raid1 across the parallel partitions on
each device, instead of using multiple partitions on the same physical
device, which is what we're talking about here.
This option will be rather inefficient on spinning rust as the write head
will have to write one copy to the one partition, then reposition itself
to write the second copy to the other partition, and that repositioning
is non-zero time on spinning rust, but there's no such repositioning
latency on SSDs, where it might actually be faster than mixed-mode, tho
I'm unaware of any benchmarking to find out.
Despite the inefficiency, both partitions and btrfs raid1 are separately
well tested and their combined use on a single device should introduce no
race conditions that wouldn't have been found by previous separate usage,
so this would be my current preference at filesystem sizes over a half
terabyte on spinning rust, or on SSDs with their zero seek times.
But writing /will/ be slow on spinning rust, particularly with partition
sizes of a half-TiB or larger each, as that write-mode seek-time will be
/nasty/.
That said, again, there are people known to be using this mode, and it's
a viable choice in deployments such as laptops where physical multi-
device isn't an option, but the additional reliability of pair-copy data
is highly desirable.
> 3) wait for the patch in process to allow for btrfs single devices to
> support dup mode for data
This should be the preferred mode in the future, tho as with any new
btrfs feature, it'll probably take a couple kernel versions after initial
introduction for the most critical bugs in the new feature to be found
and duly exterminated, so I'd consider anyone using it the first kernel
cycle or two after introduction to be volunteering as guinea pigs. That
said, the individual components of this feature have been in btrfs for
some time and are well tested by now, so I'd expect the introduction of
this feature to be rather smoother than many. For the much more
disruptive raid56 mode, I suggested a guinea-pig time of a year, five
kernel cycles, for instance, and that turned out to be about right.
(Interestingly enough, that put raid56 mode feature stability at the soon
to be released kernel 4.4, which is scheduled to be a long-term-support
release, so the raid56 mode stability timing worked out rather well, tho
I had no idea 4.4 would be an LTS when I originally predicted the year's
settle-time.)
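
Once a kernel and btrfs-progs with that patch are available, converting an existing single-device filesystem would presumably go through the usual balance convert filter. The sketch below assumes the dup profile is accepted for data, which is exactly what the pending patch is about; the mount point is a placeholder and the run helper is a hypothetical wrapper that only prints each command unless RUN=1 is set.

```shell
# Placeholder mount point of an existing single-device btrfs.
MNT="${MNT:-/mnt/test}"

# Hypothetical helper: print commands by default, execute only with RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Assumption: dup is accepted as a data profile (needs the patch discussed above).
run btrfs balance start -dconvert=dup "$MNT"

# Check which profiles the data and metadata chunks ended up with.
run btrfs filesystem df "$MNT"
```

Unlike mixed mode, this is a balance on a live filesystem rather than a choice made at mkfs time.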
> Is that about right?
=:^)
One further caveat regarding SSDs.
On SSDs, many commonly deployed FTLs do dedup. Sandforce firmware, where
dedup is sold as a feature, is known for this. If the firmware is doing
dedup, then duplicated data /or/ metadata at the filesystem level is
simply being deduped at the physical device firmware level, so you end up
with only one physical copy in any case, and filesystem efforts to
provide redundancy only end up costing CPU cycles at both the filesystem
and device-firmware levels, all for naught. This is a big reason why
mkfs.btrfs on a single device defaults to single metadata if it detects
an SSD, despite the normally preferred dup metadata default.
So if you're deploying on SSDs using sandforce firmware or otherwise
known to do dedup at the FTL, don't bother with any of the above as the
firmware will be simply defeating your efforts at deliberate redundancy.
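
To see what the SSD detection is likely to conclude for a given disk, one can look at the kernel's rotational flag in sysfs. This is a sketch under the assumption that the detection relies on that flag; the disk name sda is a placeholder and SYSFS_ROOT is a hypothetical override that exists only so the function can be tried against a fake tree. Note that the flag says nothing about whether the FTL dedups.

```shell
# Succeed if the kernel flags the disk as non-rotational (SSD-like).
# SYSFS_ROOT is a hypothetical override, defaulting to the real /sys.
ssd_flagged() {
    flag_file="${SYSFS_ROOT:-/sys}/block/$1/queue/rotational"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "0" ]
}

# "sda" is a placeholder disk name.
if ssd_flagged sda; then
    echo "sda: non-rotational, mkfs.btrfs may default to single metadata"
else
    echo "sda: rotational or unknown"
fi
```

If the drive is known not to dedup, passing --metadata dup to mkfs.btrfs explicitly overrides the default.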
(FWIW, I happened to get lucky with my own SSDs as I knew way less about
them at the time I purchased mine, and happened to get SSDs designed for
server deployment that sell the /lack/ of dedup and compression as a
feature, because it makes latency and capacity much more stable and
predictable. So I can use dup mode in whatever form without fear of the
FTL second-guessing me, tho I actually use btrfs raid1 on two actual
physical device SSDs, on most of the partitions. But /boot is an
exception where I do actually use dup mode as opposed to raid1, on both
the working /boot on one device, and the backup /boot on the other
device. This is because while with grub2 I could actually use grub
rescue mode to load /boot from either device, rescue mode isn't the
easiest thing to use, and it's still easier to simply let grub point at
just one /boot, and use the BIOS to choose which device and thus grub and
associated /boot I'm going to actually boot from, the same way I did back
in the grub1 era, before grub had a rescue mode.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman