From: Forza <forza@tnonline.net>
To: Andrei Borzenkov <arvidjaar@gmail.com>,
Qu Wenruo <quwenruo.btrfs@gmx.com>,
Wang Yugui <wangyugui@e16-tech.com>, Qu Wenruo <wqu@suse.com>,
Nicholas D Steeves <nsteeves@gmail.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: What mechanisms protect against split brain?
Date: Wed, 8 Jun 2022 22:22:33 +0200 (GMT+02:00)
Message-ID: <5532eba.f5bd40a9.18144fbad50@tnonline.net>
In-Reply-To: <cf220242-25dd-9241-01d5-38555c262d9e@gmail.com>
---- From: Andrei Borzenkov <arvidjaar@gmail.com> -- Sent: 2022-06-08 - 16:11 ----
> On 08.06.2022 13:32, Qu Wenruo wrote:
>>
>>
>> On 2022/6/8 18:15, Wang Yugui wrote:
>>> Hi, Forza, Qu Wenruo
>>>
>>> I write a script to test RAID1 split brain base on Qu's work of raid5(*1)
>>> *1: https://lore.kernel.org/linux-btrfs/53f7bace2ac75d88ace42dd811d48b7912647301.1654672140.git.wqu@suse.com/T/#u
>>
>> No no no, that is not to address split brain, but mostly to drop cache
>> for recovery path to maximize the chance of recovery.
>>
>> It's not designed to solve split brain problem at all, it's just one
>> case of such problem.
>>
>> In fact, fully split brain (both have the same generation, but
>> experienced their own degraded mount) case can not be solved by btrfs
>> itself at all.
>>
>> Btrfs can only solve partial split brain case (one device has higher
>> generation, thus btrfs can still determine which copy is the correct one).
>>
>
> Start with both devices having the same generation number N.
>
> Mount device1 separately, do some writes, device has generation N+1.
>
> Mount device2 separately, do some writes, device has generation N+2.
>
> Applying changes between N+1 and N+2 to device1 is wrong because content
> of N+1 is different on both devices.
>
> So there is absolutely no difference between "same generation" and
> "higher generation".
>
> The only thing btrfs could do is to try to detect this and refuse to
> integrate another device. One suggested rather radical approach was to
> change UUID on degraded mount, but this is probably unfeasible in real life.
>
> Removing missing device from device list in superblock (or at least
> marking it as permanently missing until replaced) is probably another
> option.
>
> And write intent log as discussed further in this thread could be used
> as well - if btrfs detects write intent log on device it should refuse
> to add it to existing filesystem.
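To make Andrei's scenario concrete, here is a toy Python model. It is not btrfs code: the Device class, the history list and both helper functions are made up for illustration, and a real filesystem tracks far more than a counter. It only shows that comparing generation numbers picks a "winner" even though the two devices no longer share a common history.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy model of one RAID1 member: a generation counter plus a write history."""
    name: str
    generation: int
    history: list = field(default_factory=list)

    def degraded_write(self, payload: str) -> None:
        # Each committed transaction bumps the generation by one.
        self.generation += 1
        self.history.append((self.generation, payload))


def newest_by_generation(a: Device, b: Device) -> Device:
    # Naive rule: trust whichever device reports the higher generation.
    return a if a.generation >= b.generation else b


def has_diverged(a: Device, b: Device) -> bool:
    # The devices diverged if neither history is a prefix of the other.
    shorter, longer = sorted((a.history, b.history), key=len)
    return longer[:len(shorter)] != shorter


N = 100                      # common starting generation (arbitrary value)
dev1 = Device("device1", N)
dev2 = Device("device2", N)

dev1.degraded_write("A")     # device1 mounted alone: N -> N+1
dev2.degraded_write("B")     # device2 mounted alone: N -> N+1, different content
dev2.degraded_write("C")     # device2 again:         N+1 -> N+2

winner = newest_by_generation(dev1, dev2)
print(winner.name, has_diverged(dev1, dev2))
```

The generation rule selects device2, yet device1's N+1 is not the N+1 that device2's N+2 was built on, so replaying "N+1 to N+2" onto device1 would be wrong.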
Thank you all for the feedback on this topic.
My takeaway from this is that we need to:
* clearly document that Btrfs cannot handle full split brain
* document best practices on how to recover from a temporary device loss
* add features that help btrfs handle both types of split brain in a safe way.
One such mechanism could be to write a simple hash to all devices that are still present during a degraded mount. If the old device is later re-attached, its hash will not match and btrfs should reject it.
Then a special 'btrfs device add --reattach' command could be developed that would do what Qu said: run a full scrub and correct any differences on the reattached device. This avoids rewriting all data, which is faster and saves TBW (terabytes written, i.e. drive wear).
Nicholas suggested a similar approach in another thread: https://lore.kernel.org/linux-btrfs/87sfogkwbd.fsf@DigitalMercury.freeddns.org/T/#t
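A rough Python sketch of the hash idea, under loud assumptions: nothing here exists in btrfs today. The Member class, the token field, and the functions degraded_mount, may_rejoin and reattach are all hypothetical names, and the scrub step is only a comment. 'btrfs device add --reattach' is the proposed command, not a real one.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    token: str  # hypothetical membership token stored on each device


def degraded_mount(present: list) -> None:
    # Stamp a fresh random token on every device that is present.
    # A device that is absent keeps its old token.
    new_token = secrets.token_hex(16)
    for dev in present:
        dev.token = new_token


def may_rejoin(candidate: Member, present: list) -> bool:
    # A returning device is accepted only if its token matches all present devices.
    return all(candidate.token == dev.token for dev in present)


def reattach(candidate: Member, present: list) -> None:
    # Stand-in for the proposed '--reattach': a full scrub would repair the
    # candidate from the present devices here, then it adopts their token.
    candidate.token = present[0].token


initial = secrets.token_hex(16)
dev1 = Member("device1", initial)
dev2 = Member("device2", initial)

degraded_mount([dev2])                    # device1 is missing during this mount
rejected = not may_rejoin(dev1, [dev2])   # stale token, so device1 is refused
reattach(dev1, [dev2])                    # explicit, scrub-backed re-admission
accepted = may_rejoin(dev1, [dev2])
print(rejected, accepted)
```

The same check also catches the full split brain case: if each device had its own degraded mount, each would carry a different fresh token and neither would be silently merged with the other.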
Thanks,
Forza