From: Lionel Bouton <lionel-subscription@bouton.name>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: BUG: scrub reports uncorrectable csum errors linked to readable file (data: single)
Date: Sun, 9 Jun 2024 02:16:29 +0200 [thread overview]
Message-ID: <e6bfb0b9-dd9f-46d5-ae77-c90047a4cb04@bouton.name> (raw)
In-Reply-To: <41a2a18f-6937-400a-b34b-c89b946535e1@gmx.com>
On 09/06/2024 at 00:48, Qu Wenruo wrote:
>
>
> On 2024/6/9 01:45, Lionel Bouton wrote:
>> Hi,
>>
>> To keep this short I've removed most of the past exchanges; this is
>> just to keep people following the thread informed of my progress.
>>
>> On 07/06/2024 at 01:46, Lionel Bouton wrote:
>>>
>>> [...]
>>>>> I briefly considered doing just that... but then I found out that the
>>>>> scrub errors were themselves in error and the on-disk data matched
>>>>> the checksums. When I tried to read the file, not only did the
>>>>> filesystem not report an IO error (if I'm not mistaken, it should
>>>>> when the csum doesn't match), but the file content matched the
>>>>> original file fetched from its source.
>>>>
>>>> Got it, this is really weird now.
>>>>
>>>> What scrub does is read the data from disk (bypassing the page
>>>> cache) and verify it against the checksums.
>>>>
>>>> Would it be possible to run "btrfs check --check-data-csum" on the
>>>> unmounted/RO mounted fs?
>>>
>>> Yes, with some caveats: the scrub takes approximately a week to
>>> complete and I can't easily stop the services on this system for a
>>> week.
>>> The block device is RBD on Ceph, so what I can do is take a block-
>>> level snapshot, map that snapshot on another system and run btrfs
>>> check --check-data-csum there. If the IO pattern is the same as
>>> btrfs scrub's, this will probably take 3 to 5 days to complete. I'll
>>> have to run it on another VM with the same kernel and btrfs-progs
>>> versions, since BTRFS doesn't like having two devices showing up
>>> with the same internal identifiers...
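For reference, the snapshot-based check described above could be sketched roughly as follows (the pool, image and snapshot names, and the resulting /dev/rbd0 device, are placeholders, not the actual ones used here):

```shell
# On the Ceph side: take a block-level snapshot of the RBD image
# backing the filesystem (names are examples)
rbd snap create mypool/myimage@btrfs-check

# On a *different* VM (to avoid two devices with the same btrfs
# identifiers being visible at once), map the snapshot; RBD snapshots
# map read-only, which is what we want
rbd map mypool/myimage@btrfs-check
# -> prints the mapped device, e.g. /dev/rbd0

# Run the offline check against the unmounted snapshot device;
# btrfs check is read-only by default
btrfs check --check-data-csum /dev/rbd0
```

The snapshot keeps the checked data stable for the multi-day run while the live filesystem stays in service.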
>>>
>>>>
>>>> That would output an error for each corrupted sector (without
>>>> rate limiting), so that you can record them all.
>>>> Then try logical-resolve to find each corrupted location?
>>>>
>>>> If btrfs check reports no error, it's 100% sure scrub is to blame.
>>>>
>>>> If btrfs check reports errors, and logical-resolve fails to locate the
>>>> file and its position, it means the corruption is in bookend extents.
>>>>
>>>> If btrfs check reports errors and logical-resolve can locate the
>>>> file and position, then it's a different problem.
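The logical-resolve step mentioned above is a btrfs-progs subcommand that maps a logical byte address (as reported by scrub or check) back to file paths. A sketch, with the address and mount point as placeholders:

```shell
# Resolve a logical address from a scrub/check error report to the
# file path(s) referencing it; the filesystem must be mounted
# (123456789 and /mnt/btrfs are placeholders)
btrfs inspect-internal logical-resolve 123456789 /mnt/btrfs
```

If the address belongs to an unreachable part of a bookend extent, the resolve step finds no path, which is exactly the distinction Qu draws above.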
>>>
>>> OK, understood. It's time for me to go to sleep, but I'll work on
>>> this tomorrow. I'll report as soon as --check-data-csum finds
>>> something, or at the end of the run in several days if it doesn't.
>>
>> There is a bit of a slowdown: btrfs check was killed a couple of hours
>> ago (after running for more than a day) by the OOM killer. I anticipated
>> that it would need large amounts of memory (see below for the number of
>> files/dirs/subvolumes) and started it on a VM with 32GB, but that wasn't
>> enough. It stopped after printing "[4/7] checking fs roots".
>>
>> I restarted btrfs check --check-data-csum after giving 64GB of RAM to
>> the VM, hoping this will be enough.
>> If that doesn't work and the OOM killer is still triggered, I'll have
>> to move other VMs around; the realistic maximum I can give the VM
>> running the btrfs check is ~200GB.
>
> That's exactly why we have --mode=lowmem.
>
> But please be aware that the low memory usage is traded for a lot
> more IO.
>
After the OOM I remembered reading discussions about lowmem mode, but
even then I wasn't sure how to tell at which point it would be needed:
on one hand 32GB seems like a lot for a <20TB filesystem, but this
filesystem has both many files and many snapshots.
The metadata uses 75.4GiB, which I assume covers the two copies of the
DUP profile. If most of the RAM used by check is a copy of the
metadata, 64GB should be enough this time.
The process has been using 35.4G at the "checking fs roots" step for
several hours now.
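If 64GB still isn't enough, the lowmem fallback Qu mentioned would presumably be invoked along these lines (a sketch; the device path is a placeholder, and it assumes --mode=lowmem can be combined with --check-data-csum in this btrfs-progs version):

```shell
# Trade RAM for (much) more IO: lowmem mode keeps only a small working
# set in memory instead of caching the metadata
btrfs check --mode=lowmem --check-data-csum /dev/rbd0
```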
Thanks for your patience,
Lionel
> Thanks,
> Qu
>>
>> If someone familiar with btrfs check can estimate how much RAM is
>> needed, here is some information that might be relevant:
>> - according to the latest estimates there should be a total of around
>> 50M files and 2.5M directories in the 3 main subvolumes on this
>> filesystem.
>> - for each of these 3 subvolumes there should be approximately 30
>> snapshots.
>>
>> Here is the filesystem usage output:
>> Overall:
>> Device size: 40.00TiB
>> Device allocated: 31.62TiB
>> Device unallocated: 8.38TiB
>> Device missing: 0.00B
>> Device slack: 12.00TiB
>> Used: 18.72TiB
>> Free (estimated): 20.32TiB (min: 16.13TiB)
>> Free (statfs, df): 20.32TiB
>> Data ratio: 1.00
>> Metadata ratio: 2.00
>> Global reserve: 512.00MiB (used: 0.00B)
>> Multiple profiles: no
>>
>> Data,single: Size:30.51TiB, Used:18.57TiB (60.87%)
>> /dev/sdb 30.51TiB
>>
>> Metadata,DUP: Size:565.50GiB, Used:75.40GiB (13.33%)
>> /dev/sdb 1.10TiB
>>
>> System,DUP: Size:8.00MiB, Used:3.53MiB (44.14%)
>> /dev/sdb 16.00MiB
>>
>> Unallocated:
>> /dev/sdb 8.38TiB
>>
>>
>> Best regards,
>> Lionel
>