From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Stefan N <stefannnau@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: scrub: unrepaired sectors detected
Date: Wed, 6 Dec 2023 06:35:01 +1030 [thread overview]
Message-ID: <8ba85386-fb30-415e-8ef1-05dcaf833c26@gmx.com> (raw)
In-Reply-To: <CA+W5K0r4Jkhwm2ztJYwKQ1w91Cb0tObcd4PA6bLDOH18xbmYAg@mail.gmail.com>
On 2023/12/5 18:21, Stefan N wrote:
> Hi all,
>
> I'm having trouble getting an array to perform a scrub or replace, and
> would appreciate any assistance. I have two empty disks I can use to
> move things around, but the intended outcome is to use them to replace
> two of the smaller disks.
>
> $ uname -a ; btrfs --version ; btrfs fi show
> Linux $hostname 6.5.0-13-generic #13-Ubuntu SMP PREEMPT_DYNAMIC Fri
> Nov 3 12:16:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> btrfs-progs v6.3.2
> Label: none uuid: 3cde0d85-f53e-4db6-ac2c-a0e6528c5ced
> Total devices 8 FS bytes used 71.32TiB
> devid 1 size 16.37TiB used 16.37TiB path /dev/sdg
> devid 2 size 10.91TiB used 10.91TiB path /dev/sdf
> devid 3 size 16.37TiB used 16.36TiB path /dev/sdd
> devid 4 size 16.37TiB used 12.54TiB path /dev/sda
> devid 5 size 10.91TiB used 10.91TiB path /dev/sde
> devid 6 size 10.91TiB used 10.91TiB path /dev/sdc
> devid 7 size 16.37TiB used 16.37TiB path /dev/sdh
> devid 8 size 10.91TiB used 10.91TiB path /dev/sdb
>
> $ btrfs fi df /mnt/point/
> Data, RAID6: total=71.97TiB, used=71.23TiB
> System, RAID1C3: total=36.00MiB, used=6.62MiB
> Metadata, RAID1C3: total=91.00GiB, used=85.09GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> $
>
> Attempting to scrub
> BTRFS error (device sdg): unrepaired sectors detected, full stripe
> 145926853230592 data stripe 2 errors 5-13
This error is introduced in recent kernels to detect RAID56 full stripes
which contain sectors that can not be repaired.
It's fairly new behavior, added as an extra safety net, because scrub
itself can sometimes further corrupt the P/Q stripes and create
unrepairable sectors.
And I'm afraid that's already the case here.
The older RAID56 code (and even the newer one) still has the classic
write-hole problem, so a previous power loss can reduce the redundancy
and eventually lead to data corruption.
The newer scrub code addresses this by detecting the situation and
erroring out, rather than spreading the corruption further.
> BTRFS info (device sdg): scrub: not finished on devid 2 with status: -5
>
> Scrub device /dev/sdf (id 2) canceled
> Scrub started: Thu Nov 30 08:01:03 2023
> Status: aborted
> Duration: 32:17:10
> data_extents_scrubbed: 89766644
> tree_extents_scrubbed: 0
> data_bytes_scrubbed: 5856020676608
> tree_bytes_scrubbed: 0
> read_errors: 0
> csum_errors: 0
> verify_errors: 0
> no_csum: 0
> csum_discards: 0
> super_errors: 0
> malloc_errors: 0
> uncorrectable_errors: 0
> unverified_errors: 0
> corrected_errors: 0
> last_physical: 7984173809664
>
> Attempting to do replace using brand new disks, failed at ~50%, ran
> twice with two different pairs of disks
> Disk /dev/sdi: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
> Disk /dev/sdl: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
>
> BTRFS error (device sdg): unrepaired sectors detected, full stripe
> 145926853230592 data stripe 2 errors 5-13
> BTRFS error (device sdg): btrfs_scrub_dev(/dev/sdf, 2, /dev/sdl) failed -5
>
> The data is fairly replaceable, so I have previously been deleting
> files that fail checks and performing roughly 3-monthly scrubs and
> weekly balances (musage/dusage=50).
This may be something that happened in the past but is only being caught
by the newer kernel.
Anyway, if you're fine with deleting some files (only 9 sectors are
affected), you can try to locate the inodes for the following bytenr range:
[145926853382144, 145926853414912]
The way to go is "btrfs inspect-internal logical-resolve -o <bytenr> <mnt>".
Delete all the involved files, increase the bytenr by 4K, and try again
until there is no more output for any 4K block in the above range.
Normally it should only be one or two files.
Then retry the scrub, and re-do the loop until the scrub can finish properly.
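The lookup part of that loop can be sketched as a small shell snippet.
This is a dry run that only prints one resolve command per 4K block in
the reported range; the mount point is an assumption, so adjust it
before removing the leading "echo" to actually run the lookups
(requires root and btrfs-progs):

```shell
#!/bin/sh
# Dry-run sketch: print one logical-resolve command per 4K block in the
# affected range. MNT=/mnt/point is an assumption for this example.
MNT=/mnt/point
START=145926853382144   # first affected bytenr
END=145926853414912     # last affected bytenr (9 blocks of 4K, inclusive)

for bytenr in $(seq "$START" 4096 "$END"); do
    # Remove "echo" to really resolve each block to its owning file(s)
    echo btrfs inspect-internal logical-resolve -o "$bytenr" "$MNT"
done
```

Any file path printed by the real commands is one of the involved files
to delete before retrying the scrub.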
Thanks,
Qu
>
> Any help would be appreciated!
>
> Cheers,
>
> Stefan
>
Thread overview: 4+ messages
2023-12-05 7:51 scrub: unrepaired sectors detected Stefan N
2023-12-05 20:05 ` Qu Wenruo [this message]
2023-12-09 1:50 ` Stefan N
2023-12-09 5:25 ` Qu Wenruo