From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit
Date: Thu, 21 Sep 2023 07:01:52 +0800 [thread overview]
Message-ID: <8c6110f9-b338-492c-01a8-d831507b9f16@oracle.com> (raw)
In-Reply-To: <20230918223856.GR2747@twin.jikos.cz>
On 19/09/2023 06:38, David Sterba wrote:
> On Sat, Aug 19, 2023 at 07:14:40PM +0800, Anand Jain wrote:
>>
>>
>> On 18/08/2023 17:27, Qu Wenruo wrote:
>>>
>>>
>>> On 2023/8/18 17:21, Anand Jain wrote:
>>>> On 17/08/2023 19:52, David Sterba wrote:
>>>>> On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
>>>>>> The btrfstune -m|M operation changes the fsid and preserves the
>>>>>> original
>>>>>> fsid in the metadata_uuid.
>>>>>>
>>>>>> Changing the fsid happens in two commits in the function
>>>>>> set_metadata_uuid()
>>>>>> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
>>>>>> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>>>>>> resets the CHANGING_FSID_V2 flag (local memory),
>>>>>> and then calls commit.
>>>>>>
>>>>>> The two-stage operation with the CHANGING_FSID flag makes sense for
>>>>>> btrfstune -u|U, where the fsid is changed on each and every tree block,
>>>>>> involving many commits. The CHANGING_FSID flag helps to identify the
>>>>>> transient incomplete operation.
>>>>>>
>>>>>> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2
>>>>>> flag, and
>>>>>> a single commit would have been sufficient. The original git commit
>>>>>> that
>>>>>> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for
>>>>>> changing
>>>>>> the metadata uuid"), provides no reasoning or did I miss something?
>>>>>> (So marking this patch as RFC).
>>>>>
>>>>> I remember discussions with Nikolay about the corner cases that could
>>>>> happen due to interrupted conversion and there was a document explaining
>>>>> that. Part of that ended up in kernel in the logic to detect partially
>>>>> enabled metadata_uuid. So there was reason to do it in two phases but
>>>>> I'd have to look it up in mails or if I still have a link or copy of the
>>>>> document.
>>>>
>>>>
>>>> On 18/08/2023 08:21, Qu Wenruo wrote:
>>>>
>>>> > Oh, my memory comes back, the original design for the two stage
>>>> > commitment is to avoid split brain cases when one device is committed
>>>> > with the new flag, while the remaining one doesn't.
>>>> >
>>>> > With the extra stage, even if at stage 1 or 2 the transaction is
>>>> > interrupted and only one device got the new flag, it can help us to
>>>> > locate the stage and recover.
>>>>
>>>> It is useful for `btrfstune -u`
>>>> when there are many transaction commits to write. It uses the
>>>> `CHANGING_FSID` flag for this purpose. Any device with the
>>>> `CHANGING_FSID` flag fails to mount, and `btrfstune` should be called
>>>> again to continue rewrite the new FSID. This is a fair process.
>>>>
>>>>
>>>> However, in the case of `CHANGING_FSID_V2`, which the `btrfstune -m`
>>>> command uses, only one transaction is required. How does this help?
>>>>
>>>> Disk1 Disk2
>>>>
>>>> Commit1 CHANGING_FSID_V2 CHANGING_FSID_V2
>>>> Commit2 new-fsid new-fsid
>>>>
>>>>
>>>>
>>>
>>> Instead if there is only one transaction to enable metadata_uuid
>>> feature, we can have the following situation where for the single
>>> transaction we only committed on disk 1:
>>>
>>> Disk 1 Disk 2
>>> Meta uuid Regular UUID
>>>
>>> Then how do kernel/progs proper recover from this situation?
>>>
>>> Although I'd say, it's still possible to recover, but significantly more
>>> difficult, as we need to properly handle different generations of super
>>> blocks.
>>>
>>> For the two stage one, it's a little simpler but I'm not sure if we have
>>> the extra handling. E.g. if commit 1 failed partially:
>>>
>>> Disk 1 Disk 2
>>> Changing_fsid_v2 no changing_fsid_v2
>>>
>>> In this case, we can detects we're trying to start fsid change using
>>> metadata uuid.
>>>
>>> The same thing for the 2nd commit.
>>
>>
>>
>> The changing_fsid_v2 flag an unnecessary overhead, right? As it could be
>> something like:
>>
>> . Write new-fsid and Verify. Revert if needed and verify.
>>
>>
>> ------
>> Summarizing the overall patches:
>>
>> Patchset [1] is a port of kernel changing_fsid_v2 recovery automation
>> functions to the progs. So, hosts with older progs and can't upgrade to
>> find a tmp host with the newer progs to fix the disks?
>>
>> [1] [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid
>>
>>
>> Automation in progs/kernel cannot fix all possible scenarios; we still
>> need user intervention to reunite, as in patchset [2]. It adds --device
>> and --noscan options to reunite. (Needs comments, so that I can rebase).
>>
>> [2] [PATCH 00/10] btrfs-progs: check and tune: add device and noscan
>> options
>>
>>
>> Patchset [3] removes the changing_fsid_v2 recovery code from the kernel,
>> as pointed out- recovery code is robust in general, except in corner
>> cases. So, after this, similar to changing_fsid, disks with
>> changing_fsid_v2 are rejected.
>
> Hm I think this is acceptable though degrading the feature a bit. We
> can't tell how often the kernel recovery of the metadata_uuid has
> happened but fixing all cases there might be too complex, more than it
> already is. The whole conversion in btrfstune is quite fast, crashing in
> the middle of that is possible but highly unlikely.
>
> We don't have a proper documentation for the metadata_uuid use case,
> there are the option and sysfs descriptions but not really how it's
> supposed to be used and what to do if setting the metadata_uuid fails.
>
>> But when progs can recover the failed situation and could remove the use
>> of the changing_fsid_v2 flag (patch [4]), we don't actually need the
>> recovery in the kernel.
>
> Ok, let's do it.
Done.
Kernel:
[PATCH 0/2 v2] btrfs: reject device with CHANGING_FSID_V2 flag
btrfs-progs (in the same order):
[PATCH 0/4 v4] btrfs-progs: recover from failed metadata_uuid port kernel
and
[PATCH] btrfs-progs: misc-tests/034-metadata-uuid remove kernel support
Thanks, Anand
prev parent reply other threads:[~2023-09-20 23:02 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-15 12:53 [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit Anand Jain
2023-08-16 2:17 ` Qu Wenruo
2023-08-17 11:52 ` David Sterba
2023-08-18 9:21 ` Anand Jain
2023-08-18 9:27 ` Qu Wenruo
2023-08-19 11:14 ` Anand Jain
2023-09-18 22:38 ` David Sterba
2023-09-20 23:01 ` Anand Jain [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8c6110f9-b338-492c-01a8-d831507b9f16@oracle.com \
--to=anand.jain@oracle.com \
--cc=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).