Linux Btrfs filesystem development
From: Forza <forza@tnonline.net>
To: Andrei Borzenkov <arvidjaar@gmail.com>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Wang Yugui <wangyugui@e16-tech.com>, Qu Wenruo <wqu@suse.com>,
	Nicholas D Steeves <nsteeves@gmail.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: What mechanisms protect against split brain?
Date: Wed, 8 Jun 2022 22:22:33 +0200 (GMT+02:00)
Message-ID: <5532eba.f5bd40a9.18144fbad50@tnonline.net>
In-Reply-To: <cf220242-25dd-9241-01d5-38555c262d9e@gmail.com>



---- From: Andrei Borzenkov <arvidjaar@gmail.com> -- Sent: 2022-06-08 - 16:11 ----

> On 08.06.2022 13:32, Qu Wenruo wrote:
>> 
>> 
>> On 2022/6/8 18:15, Wang Yugui wrote:
>>> Hi, Forza, Qu Wenruo
>>>
>>> I write a script to test RAID1 split brain base on Qu's work of raid5(*1)
>>> *1: https://lore.kernel.org/linux-btrfs/53f7bace2ac75d88ace42dd811d48b7912647301.1654672140.git.wqu@suse.com/T/#u
>> 
>> No no no, that is not to address split brain, but mostly to drop cache
>> for recovery path to maximize the chance of recovery.
>> 
>> It's not designed to solve split brain problem at all, it's just one
>> case of such problem.
>> 
>> In fact, fully split brain (both have the same generation, but
>> experienced their own degraded mount) case can not be solved by btrfs
>> itself at all.
>> 
>> Btrfs can only solve partial split brain case (one device has higher
>> generation, thus btrfs can still determine which copy is the correct one).
>> 
> 
> Start with both devices having the same generation number N.
> 
> Mount device1 separately, do some writes, device has generation N+1.
> 
> Mount device2 separately, do some writes, device has generation N+2.
> 
> Applying changes between N+1 and N+2 to device1 is wrong because content
> of N+1 is different on both devices.
> 
> So there is absolutely no difference between "same generation" and
> "higher generation".
> 
> The only thing btrfs could do is to try to detect this and refuse to
> integrate another device. One suggested rather radical approach was to
> change UUID on degraded mount, but this is probably unfeasible in real life.
> 
> Removing missing device from device list in superblock (or at least
> marking it as permanently missing until replaced) is probably another
> option.
> 
> And write intent log as discussed further in this thread could be used
> as well - if btrfs detects write intent log on device it should refuse
> to add it to existing filesystem.


Thank you all for the feedback on this topic. 

My takeaway from this is that we need to:
* clearly document that Btrfs cannot handle full split brain
* document best practices on how to recover from a temporary device loss
* add features that help btrfs handle both types of split brain in a safe way

One such mechanism could be to write a simple hash to every device that is still present during a degraded mount. If the old device is later re-attached, its hash will not match and btrfs should reject it.
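
To make the idea concrete, here is a toy model in Python. It is only a sketch, not btrfs code: the `Device` class, the `degraded_token` field and the `mount_degraded` / `can_reattach` helpers are all made up for illustration, and I use a random token in place of a hash. It shows that the token catches both the stale-device case and the full split brain case where both devices end up at the same generation.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    generation: int
    # Hypothetical superblock field: token stamped at degraded mount time.
    degraded_token: Optional[str] = None


def mount_degraded(present: List[Device]) -> None:
    """Stamp a fresh random token on every present device, then write."""
    token = secrets.token_hex(16)
    for dev in present:
        dev.degraded_token = token
        dev.generation += 1  # the degraded writes bump the generation


def can_reattach(fs_member: Device, candidate: Device) -> bool:
    """Accept the candidate only if its token matches the filesystem's."""
    return fs_member.degraded_token == candidate.degraded_token


# Healthy pair that never saw a degraded mount: tokens agree (both unset).
healthy_ok = can_reattach(Device("a", 100), Device("b", 100))

dev1 = Device("dev1", generation=100)
dev2 = Device("dev2", generation=100)

mount_degraded([dev1])                 # dev2 is missing
stale_ok = can_reattach(dev1, dev2)    # dev2 never received the token

mount_degraded([dev2])                 # later, dev1 is missing instead
split_ok = can_reattach(dev1, dev2)    # each side has its own token
same_generation = dev1.generation == dev2.generation

print(healthy_ok, stale_ok, split_ok, same_generation)
```

Note that after both degraded mounts the two devices have the same generation, so comparing generations alone would not flag the problem, while the token comparison does.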

Then a special 'btrfs device add --reattach' command could be developed that does what Qu said: run a full scrub and correct any differences on the reattached device. This avoids rewriting all the data, which is faster and saves TBW (terabytes written, i.e. drive wear).
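
A rough sketch of why this saves writes, again as a toy Python model and not how btrfs scrub is implemented: devices are modelled as lists of blocks, the `resync` helper is hypothetical, and `zlib.crc32` is just a stand-in checksum. Only blocks whose checksum differs from the good copy are rewritten.

```python
import zlib
from typing import List


def resync(good: List[bytes], stale: List[bytes]) -> int:
    """Copy only the differing blocks from good to stale.

    Returns the number of blocks that had to be rewritten.
    """
    if len(good) != len(stale):
        raise ValueError("devices must have the same number of blocks")
    written = 0
    for i, block in enumerate(good):
        # Compare checksums; rewrite the block only on mismatch.
        if zlib.crc32(block) != zlib.crc32(stale[i]):
            stale[i] = block
            written += 1
    return written


# The device that stayed online has newer data in blocks 1 and 3.
good = [b"A", b"B2", b"C", b"D2"]
stale = [b"A", b"B1", b"C", b"D1"]

written = resync(good, stale)
print(f"rewrote {written} of {len(good)} blocks")
```

In this example only the two changed blocks are written, whereas a full device replace would write all four.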

Nicholas suggested a similar approach in another thread: https://lore.kernel.org/linux-btrfs/87sfogkwbd.fsf@DigitalMercury.freeddns.org/T/#t

Thanks,
Forza


