From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: li zhang <zhanglikernel@gmail.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] btrfs: scrub: expand scrub block size for data range scrub
Date: Tue, 15 Nov 2022 18:37:48 +0800
Message-ID: <a4ebbed4-f75f-16d3-34d0-838d73d031f9@gmx.com>
In-Reply-To: <11a71790-de79-3c2f-97f3-b97305b99378@gmx.com>
[Adding the mailing list, as the reply is to the first mail I got, which
lacks the ML]
On 2022/11/15 06:39, Qu Wenruo wrote:
[...]
>>>
>>> And I hope it can also unify the error accounting of data and metadata.
>>>
>>
>> Because only one checksum is calculated for all sectors of the
>> metadata block.
>>
>> In this case the checksum error count is ambiguous: if checksum_error
>> means a per-sector error, it should count all sectors, but if
>> checksum_error indicates that the checksum does not match, it can only
>> be incremented by 1.
In fact, this is a problem of the existing scrub interface.
The scrub progress only reports a very ambiguous csum mismatch count.
If scrub reports 4 csum errors, you cannot distinguish whether it means 4
tree blocks or 4 data sectors with csum mismatches.
Thus in my scrub_fs proposal, I report data and metadata errors
separately, and furthermore unify the error reporting to use bytes.
That is to say, if a metadata block (16K by default) has a csum mismatch,
it will report 16K bytes of metadata csum mismatch.
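To make this concrete, here is a minimal sketch of what such separated,
byte-based accounting could look like (the struct and helper names below
are purely illustrative assumptions, not the actual scrub_fs interface):

/* Illustration only: hypothetical per-type, byte-based error accounting. */
struct scrub_error_stats {
	u64 data_csum_error_bytes;	/* bytes of data with csum mismatch */
	u64 meta_csum_error_bytes;	/* bytes of metadata with csum mismatch */
};

/* A metadata block with a bad csum adds its full nodesize (e.g. 16K). */
static void account_meta_csum_error(struct scrub_error_stats *stats, u32 nodesize)
{
	stats->meta_csum_error_bytes += nodesize;
}

/* A corrupted data sector adds one sectorsize (e.g. 4K). */
static void account_data_csum_error(struct scrub_error_stats *stats, u32 sectorsize)
{
	stats->data_csum_error_bytes += sectorsize;
}

Reporting both counters in bytes removes the ambiguity: 4 corrupted 16K
tree blocks show up as 64K of metadata errors, while 4 corrupted 4K data
sectors show up as 16K of data errors.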
>>
>> Also, since the metadata block is a complete block, it cannot be
>> fixed sector by sector; the patch doesn't take this case into
>> account, so the patch still has bugs.
That doesn't matter. What end users really want is how many bytes are
affected, which is generic between data and metadata, not some ambiguous
standard which changes depending on whether it's metadata or data.
>>
>>> Currently for a metadata csum mismatch, we only report one csum error
>>> even if the metadata block is 4 sectors, while for data, we always
>>> report the number of corrupted sectors.
>>>
>>>>
>>>> The function is entered with the errored scrub_block, and
>>>> the overall process is as follows:
>>>>
>>>> 1) Recheck the scrub_block to see if the error is gone.
>>>>
>>>> 2) Check the corresponding mirror scrub_block; if it has no error,
>>>> fix the bad sblock with the mirror scrub_block.
>>>>
>>>> 3) If no error-free scrub_block is found, repair it sector by sector.
>>>>
>>>> One difficulty with this function is rechecking the scrub_block.
>>>>
>>>> Imagine this situation: a sector is checked the first time
>>>> without errors, but the recheck returns an error.
>>>
>>> If you mean the interleaved corrupted sectors like the following:
>>> 0 1 2 3
>>> Mirror 1: |X| |X| |
>>> Mirror 2: | |X| |X|
>>>
>>> Then it's pretty simple: when doing the re-check, we should only
>>> bother with the sectors that have errors.
>>> You do not always have to treat the scrub_block as a whole.
>>>
>>> Thus if you're handling mirror 1, you only need to fix sector 0 and
>>> sector 2, without bothering to fix the good sectors (1 and 3).
>>>
>>
>> Consider the following
>>
>> The results of the first check are as follows:
>
> We're talking about "X" = corrupted sector right?
>
>> 0 1 2 3
>> Mirror 1: |X| |X| |
>> Mirror 2: | |X| | |
>>
>> The result of the recheck is:
>> 0 1 2 3
>> Mirror 1: |X| |X|X|
>> Mirror 2: | |X| | |
>> An additional error is reported. What should we do: recheck again
>> (which means it would recheck twice or more), or just check only the
>> bad sectors during the recheck?
>
> Of course we should only repair the corrupted sectors.
> Why bother the already good sectors?
>
> The csum error report should only cover the initial mirror we're
> checking.
> The re-check is only there to make sure that, after repair, the
> repaired sectors are good.
>
> So the csum error accounting should be the original corrupted sector
> count.
In fact, since my RAID56 work is almost finished, I guess I should spend
more time on integrating the scrub_fs ideas into the existing scrub code
(rather than just introducing a new interface from scratch).
I'll try to craft a PoC patchset for stripe-by-stripe verification (to
get rid of the complex bio form-shaping code), and proper bitmap-based
verification and repair (to only repair the corrupted sectors).
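The bitmap-based repair would roughly follow the pattern below. This is
only a sketch of the idea; the scrub_stripe structure and the
repair_sector_from_mirrors() helper are hypothetical names, not code
from the planned patchset:

/*
 * Sketch only: the verification pass sets one bit per corrupted sector,
 * and the repair pass walks only those bits, leaving good sectors alone.
 */
static void repair_corrupted_sectors(struct scrub_stripe *stripe)
{
	unsigned int nr_sectors = stripe->nr_sectors;
	unsigned int bit;

	/* error_bitmap was filled during the initial verification pass. */
	for_each_set_bit(bit, &stripe->error_bitmap, nr_sectors) {
		/* Try the other mirrors until one provides a good copy. */
		if (repair_sector_from_mirrors(stripe, bit) == 0)
			clear_bit(bit, &stripe->error_bitmap);
	}

	/* Any bit still set afterwards is an unrepairable sector. */
}

The csum error accounting then naturally comes from the bitmap weight
taken before the repair loop, i.e. the originally corrupted sectors,
independent of what later rechecks find.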
But as I mentioned above, the bad csum error reporting cannot be easily
fixed without a behavior change in the btrfs_scrub_progress results.
Thanks,
Qu
>
>>
>>>
>>>> What should we do? This patch only fixes the errors that appeared
>>>> in the first check (as in the case where the scrub_block contains
>>>> only one scrub_sector).
>>>>
>>>> Another reason to only handle the first errors is that if the
>>>> device goes bad, the recheck function will report more and more
>>>> errors. If we want to deal with the errors found in the recheck, we
>>>> need to recheck again and again, which may leave us stuck in
>>>> scrub_handle_errored_block for a long time.
>>>
>>> Taking a longer time is not a problem compared to corrupted data.
>>>
>>> Although I totally understand that the existing scrub is complex in its
>>> design, that's exactly why I'm trying to implement a better scrub_fs
>>> interface:
>>>
>>> https://lwn.net/Articles/907058/
>
> Again, all of the behavior I mentioned can be found in the above
> patchset.
>
> But unfortunately I don't have a good way to apply them all to the
> existing scrub infrastructure without such a full rework...
>
> Thanks,
> Qu
>
>>>
>>> RAID56 had a similar problem until the recent big refactor, which
>>> changed it to a more streamlined flow.
>>>
>>> But the per-sector repair is still there; you cannot avoid it, no
>>> matter whether the scrub_block contains a single sector or multiple
>>> sectors (although a single-sector scrub_block makes it much easier).
>>>
>>> [...]
>>>> @@ -1054,7 +1056,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
>>>> if (ret == -ENOMEM)
>>>> sctx->stat.malloc_errors++;
>>>> sctx->stat.read_errors++;
>>>> - sctx->stat.uncorrectable_errors++;
>>>> + sctx->stat.uncorrectable_errors += scrub_get_sblock_checksum_error(sblock_to_check);
>>>> + sctx->stat.uncorrectable_errors += sblock_to_check->header_error;
>>>
>>> Do we double account the header_error for metadata?
>>>
>>> Thanks,
>>> Qu