From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Anand Jain <anand.jain@oracle.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page
Date: Fri, 13 Jul 2018 10:07:15 +0800 [thread overview]
Message-ID: <445da3ad-2bdc-696d-3e5d-a731f14a9f91@gmx.com> (raw)
In-Reply-To: <84f4c573-9831-8c2f-0370-78ea3018b569@gmx.com>
On 2018-07-13 08:20, Qu Wenruo wrote:
>
>
> [snip]
>>> In this case, it depends on when and how we mark the device for resilvering.
>>> If we record the generation at which the write error happens, then we just
>>> initiate a scrub for generations greater than that generation.
>>
>> If we record all the degraded transactions, then yes. Not just the last
>> failed transaction.
>
> The last successful generation won't be updated until the scrub succeeds.
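
For illustration, a rough userspace sketch of the "record the error
generation, scrub only newer generations" idea quoted above. All names are
invented for illustration; this is not btrfs code:

#include <stdio.h>

/* Invented per-device record; in btrfs this would have to live on disk. */
struct sim_dev_state {
        unsigned long long error_gen;   /* generation of the write error, 0 = none */
};

static void record_write_error(struct sim_dev_state *dev,
                               unsigned long long cur_gen)
{
        /* Remember only the first failure; later ones are already covered. */
        if (!dev->error_gen)
                dev->error_gen = cur_gen;
}

static unsigned long long resilver_from(const struct sim_dev_state *dev)
{
        /* Everything with generation > error_gen must be re-checked. */
        return dev->error_gen;
}

int main(void)
{
        struct sim_dev_state dev = { .error_gen = 0 };

        record_write_error(&dev, 11);   /* device drops out during gen 11 */
        record_write_error(&dev, 14);   /* ignored, gen 11 already recorded */
        printf("scrub generations > %llu\n", resilver_from(&dev));
        return 0;
}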
>
>>
>>> On the list, some people mentioned that for LVM/mdraid they record the
>>> generation when some device(s) hit a write error or go missing, and then
>>> do a self-cure.
>>>
>>>>
>>>> I have been working on a fix for this [3] for some time now. Thanks
>>>> for participating. In my understanding we are missing across-tree
>>>> parent transid verification at the lowest possible granularity, OR
>>>
>>> Maybe the newly added first_key and level check could help detect such a
>>> mismatch?
>>>
>>>> the other approach is to modify Liubo's approach to provide a list of
>>>> degraded chunks, but without a journal disk.
>>>
>>> Currently, DEV_ITEM::generation is seldom used (only for the seed/sprout
>>> case).
>>> Maybe we could reuse that member to record the last transaction
>>> successfully written to that device and do the above-proposed LVM/mdraid
>>> style self-cure?
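
To make that suggestion concrete, a minimal userspace sketch.
btrfs_dev_item does carry a generation member on disk, but the helpers
below (on_transaction_commit(), needs_self_cure()) are invented for
illustration and are not btrfs APIs:

#include <stdbool.h>
#include <stdio.h>

struct sim_dev_item {
        unsigned long long generation;  /* last fully written transaction */
};

static void on_transaction_commit(struct sim_dev_item *dev,
                                  unsigned long long trans_gen,
                                  bool dev_took_all_writes)
{
        /* Only stamp the device when every write of this transaction landed. */
        if (dev_took_all_writes)
                dev->generation = trans_gen;
}

static bool needs_self_cure(const struct sim_dev_item *dev,
                            unsigned long long fs_generation)
{
        /* A lagging device generation means some transactions are missing. */
        return dev->generation < fs_generation;
}

int main(void)
{
        struct sim_dev_item dev = { .generation = 10 };

        on_transaction_commit(&dev, 11, true);
        on_transaction_commit(&dev, 12, false);  /* device dropped out */
        on_transaction_commit(&dev, 13, false);
        printf("self cure needed: %d\n", needs_self_cure(&dev, 13));  /* 1 */
        return 0;
}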
>>
>> Recording just the last successful transaction won't help, or it is
>> overkill for fixing a write hole.
>>
>> Transactions: 10 11 [12] [13] [14] <---- write hole ----> [19] [20]
>> In the above example,
>> the disk disappeared at transaction 11, and when it reappeared at
>> transaction 19 there were new writes as well as resilver
>> writes,
>
> Then the last good generation will be 11, and we will commit the current
> transaction as soon as we find a device has disappeared, and won't update
> the last good generation until the scrub finishes.
>
>> so we were able to write 12, 13, 14 and then 19, 20, and then
>> the disk disappeared again, leaving a write hole.
>
> Only if the auto scrub finishes within the above transactions will the
> device have its generation updated; otherwise it will stay at generation 11.
>
>> Now the next time the
>> disk reappears, the last transaction indicates 20 on both disks
>> but leaves a write hole on one of them.
>
> That will only happen if the auto-scrub finishes in transaction 20;
> otherwise its last successful generation will stay at 11.
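
A rough sketch of that rule, replaying Anand's timeline above. All names
are invented for illustration (not btrfs code), and generation 25 at the
end is just an arbitrary later transaction:

#include <stdbool.h>
#include <stdio.h>

struct sim_device {
        unsigned long long last_good_gen;
        bool scrub_running;
};

static void device_reappears(struct sim_device *d)
{
        /* Start a targeted scrub for generations > last_good_gen. */
        d->scrub_running = true;
}

static void device_disappears(struct sim_device *d)
{
        /* Any in-flight scrub is abandoned; last_good_gen stays frozen. */
        d->scrub_running = false;
}

static void scrub_completed(struct sim_device *d, unsigned long long up_to)
{
        /* Only a finished scrub may advance the recorded generation. */
        d->last_good_gen = up_to;
        d->scrub_running = false;
}

int main(void)
{
        struct sim_device d = { .last_good_gen = 11, .scrub_running = false };

        device_reappears(&d);    /* back at gen 19, resilver starts */
        device_disappears(&d);   /* gone again at gen 20, scrub never completed */
        printf("last good gen: %llu\n", d.last_good_gen);  /* still 11 */

        device_reappears(&d);    /* back once more */
        scrub_completed(&d, 25); /* this time the scrub finishes */
        printf("last good gen: %llu\n", d.last_good_gen);  /* now 25 */
        return 0;
}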
>
>> But if you are planning to
>> record and start at transaction [14], then it is overkill, because
>> transactions [19] and [20] are already on the disk.
>
> Yes, my approach is overkill.
> But it's already much better than scrubbing all block groups (my original
> plan).
Well, my idea has a major problem: we don't have a generation for the
block group item. That is to say, we would either have to use the free
space cache generation or add a new BLOCK_GROUP_ITEM member for
generation detection.
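
A sketch of what the detection could look like if BLOCK_GROUP_ITEM grew a
generation member (or if the free space cache generation were borrowed for
this). Purely illustrative; btrfs_block_group_item has no such field today,
which is exactly the problem above:

#include <stdio.h>

struct sim_block_group_item {
        unsigned long long used;
        unsigned long long generation;  /* hypothetical new member */
};

/* Pick the block groups a returning device has to be resilvered for. */
static unsigned int collect_degraded_bgs(const struct sim_block_group_item *bgs,
                                         unsigned int nr,
                                         unsigned long long dev_last_good_gen,
                                         unsigned int *out_idx)
{
        unsigned int i, n = 0;

        for (i = 0; i < nr; i++)
                if (bgs[i].generation > dev_last_good_gen)
                        out_idx[n++] = i;
        return n;
}

int main(void)
{
        struct sim_block_group_item bgs[] = {
                { .used = 0, .generation = 10 },
                { .used = 0, .generation = 14 },  /* written while degraded */
                { .used = 0, .generation = 20 },  /* written while degraded */
        };
        unsigned int idx[3];
        unsigned int i, n = collect_degraded_bgs(bgs, 3, 11, idx);

        for (i = 0; i < n; i++)
                printf("block group %u needs resilver\n", idx[i]);
        return 0;
}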
Thanks,
Qu
>
> Thanks,
> Qu
>
>>
>> Thanks, Anand
>>
>>
>>> Thanks,
>>> Qu
>>>
>>>> [3] https://patchwork.kernel.org/patch/10403311/
>>>>
>>>> Further, as we do self-adapting chunk allocation in RAID1, it needs a
>>>> balance-convert to fix. IMO at some point we have to provide degraded
>>>> raid1 chunk allocation and also modify scrub to be chunk-granular.
>>>>
>>>> Thanks, Anand
>>>>
>>>>> Any idea on this?
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>> Unlock: btrfs_fs_info::chunk_mutex
>>>>>> Unlock: btrfs_fs_devices::device_list_mutex
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> Thanks, Anand
>