linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.cz
Subject: Re: [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit
Date: Sat, 19 Aug 2023 19:14:40 +0800	[thread overview]
Message-ID: <df569834-1949-5c1e-dd99-8da105a10b38@oracle.com> (raw)
In-Reply-To: <ef88fc2a-b845-4637-b006-43ecc511d9fd@suse.com>



On 18/08/2023 17:27, Qu Wenruo wrote:
> 
> 
> On 2023/8/18 17:21, Anand Jain wrote:
>> On 17/08/2023 19:52, David Sterba wrote:
>>> On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
>>>> The btrfstune -m|M operation changes the fsid and preserves the 
>>>> original
>>>> fsid in the metadata_uuid.
>>>>
>>>> Changing the fsid happens in two commits in the function
>>>> set_metadata_uuid()
>>>> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
>>>> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>>>>    resets the CHANGING_FSID_V2 flag (local memory),
>>>>    and then calls commit.
>>>>
>>>> The two-stage operation with the CHANGING_FSID flag makes sense for
>>>> btrfstune -u|U, where the fsid is changed on each and every tree block,
>>>> involving many commits. The CHANGING_FSID flag helps to identify the
>>>> transient incomplete operation.
>>>>
>>>> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 
>>>> flag, and
>>>> a single commit would have been sufficient. The original git commit 
>>>> that
>>>> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for 
>>>> changing
>>>> the metadata uuid"), provides no reasoning or did I miss something?
>>>> (So marking this patch as RFC).
>>>
>>> I remember discussions with Nikolay about the corner cases that could
>>> happen due to interrupted conversion and there was a document explaining
>>> that. Part of that ended up in kernel in the logic to detect partially
>>> enabled metadata_uuid.  So there was reason to do it in two phases but
>>> I'd have to look it up in mails or if I still have a link or copy of the
>>> document.
>>
>>
>> On 18/08/2023 08:21, Qu Wenruo wrote:
>>
>>  > Oh, my memory comes back, the original design for the two stage
>>  > commitment is to avoid split brain cases when one device is committed
>>  > with the new flag, while the remaining one doesn't.
>>  >
>>  > With the extra stage, even if at stage 1 or 2 the transaction is
>>  > interrupted and only one device got the new flag, it can help us to
>>  > locate the stage and recover.
>>
>> It is useful for `btrfstune -u`
>> when there are many transaction commits to write. It uses the
>> `CHANGING_FSID` flag for this purpose. Any device with the
>> `CHANGING_FSID` flag fails to mount, and `btrfstune` should be called
>> again to continue rewrite the new FSID. This is a fair process.
>>
>>
>> However, in the case of `CHANGING_FSID_V2`, which the `btrfstune -m` 
>> command uses, only one transaction is required. How does this help?
>>
>>                  Disk1              Disk2
>>
>> Commit1     CHANGING_FSID_V2   CHANGING_FSID_V2
>> Commit2     new-fsid           new-fsid
>>
>>
>>
> 
> Instead if there is only one transaction to enable metadata_uuid 
> feature, we can have the following situation where for the single 
> transaction we only committed on disk 1:
> 
>      Disk 1        Disk 2
>      Meta uuid    Regular UUID
> 
> Then how do kernel/progs proper recover from this situation?
> 
> Although I'd say, it's still possible to recover, but significantly more 
> difficult, as we need to properly handle different generations of super 
> blocks.
> 
> For the two stage one, it's a little simpler but I'm not sure if we have 
> the extra handling. E.g. if commit 1 failed partially:
> 
>      Disk 1            Disk 2
>      Changing_fsid_v2    no changing_fsid_v2
> 
> In this case, we can detects we're trying to start fsid change using 
> metadata uuid.
> 
> The same thing for the 2nd commit.



The changing_fsid_v2 flag an unnecessary overhead, right? As it could be 
something like:

  . Write new-fsid and Verify. Revert if needed and verify.


------
Summarizing the overall patches:

Patchset [1] is a port of kernel changing_fsid_v2 recovery automation 
functions to the progs. So, hosts with older progs and can't upgrade to 
find a tmp host with the newer progs to fix the disks?

   [1] [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid


Automation in progs/kernel cannot fix all possible scenarios; we still 
need user intervention to reunite, as in patchset [2]. It adds --device 
and --noscan options to reunite. (Needs comments, so that I can rebase).

   [2] [PATCH 00/10] btrfs-progs: check and tune: add device and noscan 
options


Patchset [3] removes the changing_fsid_v2 recovery code from the kernel, 
as pointed out- recovery code is robust in general, except in corner 
cases. So, after this, similar to changing_fsid, disks with 
changing_fsid_v2 are rejected.
But when progs can recover the failed situation and could remove the use 
of the changing_fsid_v2 flag (patch [4]), we don't actually need the 
recovery in the kernel.

   [3] [PATCH] btrfs: reject device with CHANGING_FSID_V2

   [4] [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit
-------

Thanks,
Anand

  reply	other threads:[~2023-08-19 11:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-15 12:53 [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit Anand Jain
2023-08-16  2:17 ` Qu Wenruo
2023-08-17 11:52 ` David Sterba
2023-08-18  9:21   ` Anand Jain
2023-08-18  9:27     ` Qu Wenruo
2023-08-19 11:14       ` Anand Jain [this message]
2023-09-18 22:38         ` David Sterba
2023-09-20 23:01           ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df569834-1949-5c1e-dd99-8da105a10b38@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).