From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs check inconsistency with raid1, part 1
Date: Tue, 22 Dec 2015 10:23:17 +0000 (UTC)
Message-ID: <pan$30f15$f6f8b13b$d72dd248$aaee075f@cox.net>
In-Reply-To: 20151222024804.5c4f7da2@jupiter.sol.kaishome.de
Kai Krakow posted on Tue, 22 Dec 2015 02:48:04 +0100 as excerpted:
> I just wondered if btrfs allows for the case where both stripes could
> have valid checksums despite of btrfs-RAID - just because a failure
> occurred right on the spot.
>
> Is this possible? What happens then? If yes, it would mean not to
> blindly trust the RAID without doing the homeworks.
The one case I know of where btrfs could get things wrong is one I
discovered in my initial testing, before deploying btrfs raid1:
1) Create a two-device btrfs raid1 (data and metadata) and put some
data on it, including a test file with some content to be modified later.
Sync and unmount normally.
2) Remove one of the two devices.
3) Mount the remaining device degraded-writable (it shouldn't allow
mounting without the degraded option) and modify that test file. Sync
and unmount.
4) Switch devices and repeat, modifying that test file in some other
incompatible way. Sync and unmount.
Up to this point everything should be fine, except that you now have
two incompatible versions of the test file, potentially carrying the same
generation number after the two separate degraded-writable
mount/modify/unmount cycles.
5) Plug both devices in and mount normally. Unless this has changed
since my tests, btrfs will neither complain in dmesg nor otherwise
provide any hint that anything is wrong. If you read the file, it'll
give you one of the versions, still without complaining or providing any
hint that something's wrong. Again unmount, without writing anything to
the test file this time.
6) Mount each device individually again (degraded, since the other one
is absent; writable or read-only is fine this time) and check the file.
Each incompatible copy should remain in place on its respective device.
Reading one copy in step 5 (effectively random, or more precisely chosen
by whether the reading process's PID is even or odd, as that's what the
btrfs raid1 read scheduler uses to decide which copy to read) didn't
change the other one; btrfs remained oblivious to the incompatible
versions. Again unmount.
7) Plug both devices in and mount the combined filesystem writable once
again. Scrub.
Back when I did my testing, I stopped at step 6, as I didn't understand
that scrub was what I should use to resolve the problem. However, based
on quite a bit of later experience, this should *NORMALLY* not be a
problem. That experience came from keeping a failing device around in
raid1 mode for a while, with more and more sectors being replaced with
spares. (It turns out at least the SSD I was working with had way more
spares than I would have expected: even after several months, when I
finally gave up and replaced it, I was only down to about 85% of spares
left, 15% used.) As long as the generations differ, btrfs scrub can sort
things out and catch up the "behind" device, resolving all differences to
the latest-generation copy.
8) But suppose both devices were mounted separately and written so they
diverged, yet happen to end up at the same generation when recombined...
From all I know, and from everything others told me when I asked at the
time, which copy you get then is entirely unpredictable. Worse yet,
btrfs may end up acting on divergent metadata when writing to the other
device.
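A toy model may help make steps 1-6 concrete. It is not btrfs code: `Copy`, `degraded_write` and `read_copy` are hypothetical names, each commit is assumed to bump the generation by exactly one, and a read is assumed to pick the copy at index `pid % 2`, following the PID even/odd read selection described in step 6.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Copy:
    """One device's copy of the test file in a hypothetical two-device model."""
    generation: int
    content: str


def degraded_write(copy: Copy, new_content: str, commits: int = 1) -> None:
    """Model a degraded-writable mount: only this copy changes, and each
    commit bumps its generation by one (a simplifying assumption)."""
    copy.content = new_content
    copy.generation += commits


def read_copy(copies: list[Copy], pid: int) -> str:
    """Model the PID-parity read choice: the reader's PID selects the copy.
    The other copy is never consulted, so a divergence goes unreported."""
    return copies[pid % len(copies)].content


# Steps 1-4: both devices start in sync, then each is written alone once.
dev_a = Copy(generation=10, content="original")
dev_b = Copy(generation=10, content="original")
degraded_write(dev_a, "version A")
degraded_write(dev_b, "version B")

# Step 5: recombined, both copies sit at the same generation, and the
# answer a reader gets depends only on its PID.
print(dev_a.generation == dev_b.generation)  # True
print(read_copy([dev_a, dev_b], pid=1000))   # version A
print(read_copy([dev_a, dev_b], pid=1001))   # version B
```

Nothing in the model compares the two copies on read, which mirrors the "no complaint, no hint" behavior observed in step 5.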
The caution, therefore, is to do your best never to let both copies be
mounted degraded-writable separately. If only one copy is written to,
its generation will be higher than the other one's, and scrub should have
no problem resolving things. Even if both copies are separately written
to incompatibly, in most real-world cases one is going to have more
generations written than the other, and scrub should reliably and
predictably resolve differences in favor of that one. The problem only
appears if they actually happen to end up at the same generation number.
That is relatively unlikely except under controlled test conditions, but
it has the potential to be a *BIG* problem should it actually occur.
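The resolution rule described above (the higher generation wins; a tie with differing contents has no safe answer) can be sketched the same way. This is an illustrative model under stated assumptions, not how scrub is actually implemented; `reconcile` is a hypothetical helper that returns `None` for the dangerous tie rather than silently picking one copy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Copy:
    """One device's copy in a hypothetical two-device model."""
    generation: int
    content: str


def reconcile(a: Copy, b: Copy) -> Optional[Copy]:
    """Return the copy a generation-based resolver would keep.

    The higher generation wins. If the generations tie but the contents
    differ, there is no principled winner, so return None instead of guessing.
    """
    if a.generation != b.generation:
        return a if a.generation > b.generation else b
    if a.content == b.content:
        return a  # identical copies: nothing to resolve
    return None


print(reconcile(Copy(14, "long-lived"), Copy(11, "brief")))
# -> Copy(generation=14, content='long-lived')
print(reconcile(Copy(11, "version A"), Copy(11, "version B")))
# -> None
```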
So if for some reason you MUST mount both copies degraded-writable
separately, the following are your options:
a) never recombine them; instead, do a device replace of the missing
device with a third device (or convert to single/dup). If you do need to
recombine, use one of the options below, or...
b) manually verify (using btrfs-show-super or the like) that the supers
on each don't have the same generation before attempting a recombine,
or...
c) wipe the one device and treat it as a new device add, so btrfs can't
get mixed up with differing versions at the same generation number, or...
d) simply take your chances and hope that the generation numbers don't
match.
(D should in practice be "good enough" if one was only mounted writable a
very short time, while the other was written to over a rather longer
period, such that it almost certainly had far more intervening commits
and thus generations than the other.)
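Option b) can be scripted. The sketch below assumes you have captured the text output of btrfs-show-super (or a similar tool) from each device separately, and that this output contains a line whose first field is exactly `generation` followed by the number. `super_generation` and `safe_to_recombine` are hypothetical helper names, and the sample strings are made-up excerpts, not real tool output.

```python
from __future__ import annotations

from typing import Optional


def super_generation(dump: str) -> Optional[int]:
    """Extract the superblock generation from btrfs-show-super style text.

    Assumes a line whose first field is exactly "generation" followed by the
    number. Fields such as "chunk_root_generation" or "cache_generation" are
    skipped because their first field differs.
    """
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "generation":
            return int(fields[1])
    return None


def safe_to_recombine(dump_a: str, dump_b: str) -> bool:
    """Option b): only recombine if both generations are known and differ."""
    gen_a = super_generation(dump_a)
    gen_b = super_generation(dump_b)
    if gen_a is None or gen_b is None:
        return False  # can't verify, so don't risk it
    return gen_a != gen_b


# Hypothetical excerpts of the tool's output for each device.
dev1 = "chunk_root_generation\t9\ngeneration\t\t42\n"
dev2 = "generation\t\t42\ncache_generation\t42\n"
print(safe_to_recombine(dev1, dev2))  # False: same generation, don't recombine
```

Treating an unreadable generation as "not safe" is deliberate: given how bad the equal-generation case can be, refusing to recombine is the cheaper mistake.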
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman