Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Kai Krakow <hurikhan77@gmail.com>, <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs check inconsistency with raid1, part 1
Date: Tue, 22 Dec 2015 10:15:58 +0800
Message-ID: <5678B25E.5030104@cn.fujitsu.com>
In-Reply-To: <20151222024804.5c4f7da2@jupiter.sol.kaishome.de>



Kai Krakow wrote on 2015/12/22 02:48 +0100:
> On Tue, 22 Dec 2015 09:22:20 +0800,
> Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>>
>>
>> Kai Krakow wrote on 2015/12/22 02:05 +0100:
>>> On Mon, 21 Dec 2015 10:23:31 +0800,
>>> Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>
>>>>
>>>>
>>>> Chris Murphy wrote on 2015/12/20 19:12 -0700:
>>>>> On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo
>>>>> <quwenruo@cn.fujitsu.com> wrote:
>>>>>>
>>>>>>
>>>>>> Chris Murphy wrote on 2015/12/20 15:31 -0700:
>>>>>
>>>>>>> I think the cause is related to bus power with buggy USB 3 LPM
>>>>>>> firmware (these enclosures are cheap maybe $6). I've found some
>>>>>>> threads about this being a problem, but it's not expected to
>>>>>>> cause any corruption. So the fact that Btrfs picks up on some
>>>>>>> problems might prove that (somewhat) incorrect.
>>>>>>
>>>>>>
>>>>>> Seems possible. Maybe some metadata just failed to reach disk.
>>>>>> BTW, did I ask for a btrfs-show-super output?
>>>>>
>>>>> Nope. I will attach to this email below for both devices.
>>>>>
>>>>>> If that's the case, the superblock on device 2 may be older than
>>>>>> the superblock on device 1.
>>>>>
>>>>> Yes, looks like devid 1 transid 4924, and devid 2 transid 4923.
>>>>> And it's devid 2 that had device reset and write errors when it
>>>>> vanished and reappeared as a different block device.
>>>>>
>>>>
>>>> Now the whole problem is explained.
>>>>
>>>> You should be good to mount it rw, as RAID1 will handle all the
>>>> problems.
>>>
>>> How should RAID1 handle this if both copies have valid checksums
>>> (as I would assume here unless shown otherwise)? This is an even
>>> bigger problem with block based RAID1 which does not have checksums
>>> at all. Luckily, btrfs works differently here.
>>
>> No, these two devices don't have the same generation, which means
>> they point to *different* bytenr.
>>
>> Like the following:
>>
>> Super of Dev1:
>> gen: X + 1
>> root bytenr: A (Btrfs logical)
>> logical A is mapped to A1 on dev1 and A2 on dev2.
>>
>> Super of Dev2:
>> gen: X
>> root bytenr: B
>> We don't need to bother with bytenr B here, though.
>>
>> Due to the power bug, A2 and the super of dev2 were not written to dev2.
>>
>> So you should see the problem now.
>> A1 on dev1 contains a *valid* tree block, but A2 on dev2 doesn't (empty
>> data only).
>>
>> And your assumption that "both have valid copies" is wrong.
>>
>> Check all 4 attachments in the previous mail.
>
> I only saw those attachments at second glance. Sorry.
>
> Primarily I just wanted to note that RAID1 per-se doesn't mean anything
> more than: we have two readable copies but we don't know which one is
> correct. As in: let the admin think twice about it before blindly
> following a guide.
>
> This is why I pointed out btrfs csums which make this a little better
> which in turn has further consequences as you describe (for the
> treeblock).
>
> In contrast to block-level RAID, btrfs usually knows which
> block is correct and which is not.
>
> I just wondered whether btrfs allows for the case where both stripes could
> have valid checksums despite btrfs RAID, just because a failure
> occurred right on the spot.
>
> Is this possible? What happens then? If yes, it would mean one should not
> blindly trust the RAID without doing one's homework.

Very interesting question.
Btrfs goes a little beyond what you would expect from block-based RAID1, though.

1) Yes, it is possible.

2) Btrfs still detects it as a transid error and won't trust the
    metadata (kernel behavior).
    And since it's RAID1, it will try the next copy and carry on.

    The trick here is that btrfs metadata records not only the bytenr of
    each child tree block, but also the transid (generation) of that
    tree block.

    So even if such a case happens, the transid won't match, and btrfs
    detects the error.
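
    To make the idea concrete, here is a minimal Python sketch of that
    check. It is a simplified model, not the kernel code: the names
    TreeBlock, ParentPointer and read_child are hypothetical, the
    csum_ok flag stands in for real checksum verification, and bytenr
    4096 is an arbitrary example value. The generations 4924 and 4923
    are borrowed from the superblock transids reported earlier in this
    thread, purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TreeBlock:
    bytenr: int           # logical address the block claims to live at
    generation: int       # transid in which the block was written
    csum_ok: bool = True  # stand-in for a real checksum verification


@dataclass
class ParentPointer:
    bytenr: int      # logical address of the child tree block
    generation: int  # transid the parent expects the child to carry


def read_child(ptr: ParentPointer,
               mirrors: List[Optional[TreeBlock]]) -> Tuple[int, TreeBlock]:
    """Return (mirror index, block) for the first copy passing all checks."""
    for idx, blk in enumerate(mirrors):
        if blk is None or not blk.csum_ok:
            continue  # nothing usable here, or checksum failure
        if blk.bytenr != ptr.bytenr:
            continue  # block belongs to a different logical address
        if blk.generation != ptr.generation:
            continue  # self-consistent but stale copy: transid mismatch
        return idx, blk
    raise IOError("no copy matches the parent's bytenr and transid")


ptr = ParentPointer(bytenr=4096, generation=4924)
stale = TreeBlock(bytenr=4096, generation=4923)  # checksum fine, but old
good = TreeBlock(bytenr=4096, generation=4924)

idx, blk = read_child(ptr, [stale, good])
print(idx)  # prints 1: the stale copy is skipped, the next mirror is used
```

    The point of the model is that a valid checksum alone is not
    enough: the stale copy passes the checksum test but is still
    rejected, because the parent pins the generation it expects.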

Thanks,
Qu
>
>>>> Then you can use scrub on dev2 to fix all the
>>>> generation mismatches.
>>>
>>> I better understand why this could fix a problem...
>>
>> Why not?
>>
>> The tree block/data copy on dev1 is valid, but the tree block/data copy
>> on dev2 is empty (not written), so btrfs detects the csum error, and
>> scrub will try to rewrite it.
>>
>> After the rewrite, the copies on dev1 and dev2 will match, which fixes
>> the problem.
>
> Exactly. ;-) Didn't say anything against it.
>
>

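The scrub repair quoted above can be sketched in the same spirit. This is a toy model under stated assumptions, not how the kernel implements scrub: devices are plain dicts mapping a logical address to bytes, the expected checksums are kept in a separate dict, and zlib.crc32 is only a stand-in for the checksum btrfs really uses. The names csum and scrub are hypothetical.

```python
import zlib
from typing import Dict, List


def csum(data: bytes) -> int:
    # Stand-in checksum for this sketch only.
    return zlib.crc32(data)


def scrub(expected: Dict[int, int], devices: List[Dict[int, bytes]]) -> int:
    """Rewrite bad or missing copies from a good mirror; return repair count."""
    repaired = 0
    for addr, want in expected.items():
        # Find any copy whose checksum matches the expected value.
        good = next((dev[addr] for dev in devices
                     if addr in dev and csum(dev[addr]) == want), None)
        if good is None:
            continue  # no valid copy anywhere, nothing to repair from
        for dev in devices:
            if addr not in dev or csum(dev[addr]) != want:
                dev[addr] = good  # overwrite the bad copy with the good one
                repaired += 1
    return repaired


block = b"tree block A"
dev1 = {0: block}              # copy that reached the disk
dev2 = {0: bytes(len(block))}  # copy that was never written (zeros)

print(scrub({0: csum(block)}, [dev1, dev2]))  # prints 1
print(dev2[0] == dev1[0])                     # prints True
```

A second pass over the same devices repairs nothing, since both copies now match.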

