Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Lionel Bouton <lionel-subscription@bouton.name>,
	linux-btrfs@vger.kernel.org
Subject: Re: BUG: scrub reports uncorrectable csum errors linked to readable file (data: single)
Date: Fri, 5 Jul 2024 08:19:31 +0930
Message-ID: <52ea9f1f-ff91-402c-b997-ec08200ff049@gmx.com>
In-Reply-To: <2650d27a-5127-4ec9-b62f-ec1683d0cecf@gmx.com>



On 2024/7/5 08:08, Qu Wenruo wrote:
>
>
> On 2024/7/4 21:51, Lionel Bouton wrote:
>> On 30/06/2024 at 12:59, Lionel Bouton wrote:
>>> On 22/06/2024 at 11:41, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2024/6/22 18:21, Lionel Bouton wrote:
>>>> [...]
>>>>>>
>>>>>> I'll mount the filesystem and run a scrub again to see if I can
>>>>>> reproduce the problem. It should be noticeably quicker, we made
>>>>>> updates to the Ceph cluster and should get approximately 2x the I/O
>>>>>> bandwidth.
>>>>>> I plan to keep the disk snapshot for at least several weeks so if you
>>>>>> want to test something else just say so.
>>>>>
>>>>>
>>>>> The scrub is finished, here are the results :
>>>>>
>>>>> UUID: 61e86d80-d6e4-4f9e-a312-885194c5e690
>>>>> Scrub started:    Wed Jun 19 00:01:59 2024
>>>>> Status:           finished
>>>>> Duration:         81:04:21
>>>>> Total to scrub:   18.83TiB
>>>>> Rate:             67.67MiB/s
>>>>> Error summary:    no errors found
>>>>>
>>>>> So the scrub error isn't deterministic. I'll shut down the test VM for
>>>>> now and keep the disk snapshot it uses for at least a couple of
>>>>> weeks in case it is needed for further tests.
>>>>> The original filesystem is scrubbed monthly; I'll reply to this
>>>>> message if another error shows up.
>>>>
>>>> I briefly remembered that there was a bug related to scrub that can
>>>> report false alerts:
>>>>
>>>> f546c4282673 ("btrfs: scrub: avoid use-after-free when chunk length is
>>>> not 64K aligned")
>>>>
>>>> But that fix should have been backported automatically, and if it were
>>>> the cause there should be "unable to find chunk map" error messages in
>>>> the kernel log.
>>>>
>>>> Otherwise, I have no extra clues.
>>>>
>>>> Have you tried kernels like v6.8/6.9 and can you reproduce the bug in
>>>> those newer kernels?
>>>
>>> I've just upgraded the kernel to 6.9.7 (and btrfs-progs to 6.9.2) and
>>> monthly scrubs with it will start next week. That said the last
>>> filesystem scrub with 6.6.30 ran without errors so it might be hard to
>>> reproduce.
>>> One difference with the last scrub vs the previous one which reported
>>> checksum errors is the underlying device speed : it is getting faster
>>> as we replace HDDs with SSDs on the Ceph cluster (it might be a cause
>>> if there's a race condition somewhere). Other than that there's
>>> nothing I can think of.
>>>
>>> In fact the only 2 major changes before the scrub checksum errors
>>> were:
>>> - a noticeable increase in constant I/O load,
>>> - an upgrade to the 6.6 kernel.
>>>
>>> As nobody else reported the same behavior I'm not ruling out a
>>> hardware glitch either.
>>> I'll reply to this thread if a future scrub reports a non-reproducible
>>> checksum error again.
>>
>> I didn't expect to have something to report so soon...
>> Another virtual machine running on another physical server but using the
>> same Ceph cluster just reported csum errors that aren't reproducible.
>> This was with kernel 6.6.13 and btrfs-progs 6.8.2.
>> Fortunately this filesystem is small and can be scrubbed in 2 minutes:
>> I just ran the scrub again (less than 5 hours after the one that
>> reported errors) and no errors are reported this time.
>>
>> I'll upgrade this VM to 6.9.7+ too. If 6.6 does have a scrub bug and
>> 6.9 does not, it might be easier to verify than I anticipated: most of
>> our VMs have migrated or are in the process of migrating to 6.6, which
>> is the latest LTS. If the problem manifests itself on a small filesystem
>> too, I expect other systems to fail scrubs sooner or later if 6.6 is
>> affected by a scrub bug.
>
> So far it looks like commit f546c4282673 ("btrfs: scrub: avoid
> use-after-free when chunk length is not 64K aligned") is the fix for
> this error.
>
> In that case, it looks like 6.6 was EOL at that time and thus didn't get
> the backport.

Nope. As you mentioned, 6.6 is LTS, and when I checked the stable tree
the fix was already merged into 6.6.15, so that is not the case.

Let me dig deeper to find out why.

Thanks,
Qu
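
The kernel versions reported in this thread can be compared against the
6.6.15 cutoff mentioned above. This is a minimal Python sketch, assuming
the 6.6.15 figure is accurate and that release strings start with the usual
major.minor.patch form. The helper names parse_release and has_66_backport
are hypothetical, and the check says nothing about kernels outside the
6.6.y series.

```python
import re

# Assumption taken from the message above: the fix landed in stable 6.6.15.
FIX_RELEASE = (6, 6, 15)


def parse_release(release: str) -> tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_66_backport(release: str):
    """Return True/False for 6.6.y kernels, None for any other series."""
    version = parse_release(release)
    if version[:2] != (6, 6):
        return None  # a different series; this check does not apply
    return version >= FIX_RELEASE


if __name__ == "__main__":
    # Kernel versions quoted earlier in the thread.
    for release in ("6.6.13", "6.6.30", "6.9.7"):
        print(release, has_66_backport(release))
```

With the versions quoted in the thread, 6.6.13 predates the backport while
6.6.30 already includes it.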

Thread overview: 17+ messages
2024-06-04 14:12 BUG: scrub reports uncorrectable csum errors linked to readable file (data: single) Lionel Bouton
2024-06-06 22:51 ` Lionel Bouton
2024-06-06 23:05 ` Qu Wenruo
2024-06-06 23:21   ` Lionel Bouton
2024-06-06 23:30     ` Qu Wenruo
2024-06-06 23:46       ` Lionel Bouton
2024-06-08 16:15         ` Lionel Bouton
2024-06-08 22:48           ` Qu Wenruo
2024-06-09  0:16             ` Lionel Bouton
2024-06-10 12:52               ` Lionel Bouton
2024-06-18 21:45       ` Lionel Bouton
2024-06-22  8:51         ` Lionel Bouton
2024-06-22  9:41           ` Qu Wenruo
2024-06-30 10:59             ` Lionel Bouton
2024-07-04 12:21               ` Lionel Bouton
2024-07-04 22:38                 ` Qu Wenruo
2024-07-04 22:49                   ` Qu Wenruo [this message]
