Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Wang Yugui <wangyugui@e16-tech.com>, Qu Wenruo <wqu@suse.com>
Cc: Forza <forza@tnonline.net>, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: What mechanisms protect against split brain?
Date: Wed, 8 Jun 2022 18:32:31 +0800
Message-ID: <a97ff3a3-7b14-e6a4-32e9-b9da8cec422e@gmx.com>
In-Reply-To: <20220608181502.4AB1.409509F4@e16-tech.com>



On 2022/6/8 18:15, Wang Yugui wrote:
> Hi, Forza, Qu Wenruo
>
> I wrote a script to test RAID1 split brain, based on Qu's work on raid5 (*1)
> *1: https://lore.kernel.org/linux-btrfs/53f7bace2ac75d88ace42dd811d48b7912647301.1654672140.git.wqu@suse.com/T/#u

No, that patch is not meant to address split brain. It mostly drops the
cache in the recovery path to maximize the chance of recovery.

It's not designed to solve the split brain problem at all; it only
covers one case of that kind of problem.

In fact, the full split brain case (both devices have the same
generation, but each has gone through its own degraded mount) cannot be
solved by btrfs itself at all.

Btrfs can only solve the partial split brain case (one device has a
higher generation, so btrfs can still determine which copy is the
correct one).
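
As a rough illustration of that distinction (a sketch, not something
from the patch or from btrfs itself): the helper names
super_generation and compare_generations below are hypothetical, and
the sketch assumes the "generation" line printed by
"btrfs inspect-internal dump-super" is the value being compared. Note
that equal generations alone cannot tell an in-sync pair from a full
split brain, so the last branch only reports that the generation gives
no answer.

```bash
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Print the superblock generation of one device.
# Needs root and btrfs-progs; matches only the exact "generation" field.
super_generation() {
    btrfs inspect-internal dump-super "$1" |
        awk '$1 == "generation" { print $2; exit }'
}

# Compare two generation numbers and say which side, if any, is newer.
compare_generations() {
    local gen1=$1 gen2=$2
    if (( gen1 > gen2 )); then
        echo "first is newer"
    elif (( gen2 > gen1 )); then
        echo "second is newer"
    else
        # Either both copies are in sync, or each side went through its
        # own degraded mount; the generation cannot distinguish the two.
        echo "same generation"
    fi
}

# Example (run as root, device paths as in the quoted script):
#   compare_generations "$(super_generation /dev/vdb1)" \
#                       "$(super_generation /dev/vdb2)"
```

If the two numbers differ, that corresponds to the partial case, where
the higher generation can be picked. If they match, nothing more can be
concluded from the superblocks alone.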

>
> #!/bin/bash
> set -uxe -o pipefail
>
> mnt=/mnt/test
> dev1=/dev/vdb1
> dev2=/dev/vdb2
>
>    dmesg -C
>    mkdir -p $mnt
>
>    mkfs.btrfs -f -m raid1 -d raid1 $dev1 $dev2
>    mount $dev1 $mnt
>    xfs_io -f -c "pwrite -S 0xee 0 1M" $mnt/file1
>    sync
>    umount $mnt
>
>    btrfs dev scan -u $dev2
>    mount -o degraded $dev1 $mnt
>    #xfs_io -f -c "pwrite -S 0xff 0 128M" $mnt/file2
>    mkdir -p $mnt/branch1; /bin/cp -R /usr/bin $mnt/branch1 # more complex than xfs_io
>    umount $mnt
>
>    btrfs dev scan
>    btrfs dev scan -u $dev1
>    mount -o degraded $dev2 $mnt

Your case is the full split brain case.

It is not possible to solve.

In fact, if you don't do the degraded mount on dev2, btrfs is completely
fine to resilver the fs without any problem.

Thanks,
Qu
>    #xfs_io -f -c "pwrite -S 0xff 0 128M" $mnt/file2
>    mkdir -p $mnt/branch2; /bin/cp -R /usr/lib64 $mnt/branch2 # more complex than xfs_io
>    umount $mnt
>
>    btrfs dev scan
>    mount $dev1 $mnt # *1
>    ls $mnt
>
>    btrfs balance start --full-balance $mnt # *2
>    #btrfs scrub start -B $mnt  # *3
>    #btrfs scrub start $mnt; sleep 2; btrfs scrub status $mnt; btrfs scrub start -B $mnt; # *4
>
>    umount $mnt
>
> Test result:
> The script may fail at # *1, # *2, # *3, or # *4, with different frequencies.
>
> dmesg output:
> 1)
> [ 1379.124079] BTRFS error (device vdb1): tree level mismatch detected, bytenr=31866880 level expected=1 has=0
> [ 1379.127928] BTRFS error (device vdb1): tree level mismatch detected, bytenr=31866880 level expected=1 has=0
> [ 1379.132109] BTRFS error (device vdb1: state C): failed to load root csum
> [ 1379.137281] BTRFS error (device vdb1: state C): open_ctree failed
>
> 2)
> [ 2950.467178] BTRFS error (device vdb1): tree first key mismatch detected, bytenr=32342016 parent_transid=9 key expected=(301555712,168,106496) has=(2552,96,5)
> [ 2950.471283] BTRFS error (device vdb1): tree first key mismatch detected, bytenr=32342016 parent_transid=9 key expected=(301555712,168,106496) has=(2552,96,5)
> [ 2950.479960] BTRFS info (device vdb1): balance: ended with status: -117
>
> So the RAID1 split brain case is not yet supported by btrfs.
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/08
>
>> Hi,
>>
>> I ran some tests for this case.
>>
>> After the missing RAID1 device is re-introduced,
>> 1, mount/read seem to work.
>>     Checksum-based error detection helps.
>>     The current pid-based I/O path selection policy may help too.
>>         preferred_mirror = first + (current->pid % num_stripes);
>>
>> 2, 'btrfs scrub' failed to finish.
>>      Any advice on how to return to a clean state?
>>
>> Best Regards
>> Wang Yugui (wangyugui@e16-tech.com)
>> 2022/06/08
>>
>>> Hi,
>>>
>>> Recently there have been some discussions, both here on the mailing list and on #btrfs IRC, about the consequences of mounting one RAID1 mirror as degraded and then later re-introducing the missing device, and also about having the degraded mount option in fstab and on the kernel command line.
>>>
>>> So I wonder if Btrfs has some protective mechanisms against data loss/corruption if a drive is missing for a bit but later re-introduced. There is also the case of split brain, where each mirror might be independently updated and then recombined.
>>>
>>> Is there an official recommendation regarding degraded mounts from the kernel command line? I understand the use case, as it allows the system to boot even if a device goes missing or dead after a reboot.
>>>
>>> Thanks,
>>> Forza
>>
>
>
