From: Russell Coker <russell@coker.com.au>
To: linux-btrfs@vger.kernel.org
Subject: strangely uncorrectable errors with RAID-5
Date: Sun, 20 Oct 2024 21:09:50 +1100
Message-ID: <23840777.EfDdHjke4D@xev>

I've been testing BTRFS RAID-5 with Debian kernels 6.10.9 and 6.11.2 from
Unstable.

I know that RAID-5 is not expected to be good enough for real data, but it
still seemed interesting to test as apparently there have been improvements
recently.  I created some errors that SHOULD be recoverable (and are
recoverable with RAID-1) but which turned out not to be recoverable (according
to BTRFS), even though the diff command reported that the data was intact.
Now I can't get the filesystem to an error-free status.

To test it I created a 4 device RAID-5 filesystem and ran the following script 
to stress it a bit:

#!/bin/bash
set -e
cd /mnt
while true ; do
  cp -r usr usr2
  cp -r usr usr3
  cp -r usr usr4
  cp -r usr usr5
  sync
  diff -ru usr usr2
  diff -ru usr usr3
  diff -ru usr usr4
  diff -ru usr usr5
  rm -rf usr?
done

Then I ran the following script to cause corruption and scrub it to see what 
happens:

#!/bin/bash
set -e
while true ; do
  for DEV in c d e f ; do
    # zero 1000MiB of the device at one of a few offsets (bs=1024k, so MiB)
    dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k \
      count=1000
    sync
    btrfs scrub start -B /mnt
    sync
  done
done
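
For reference, the oseek expression can only take three values, because %
and * bind tighter than + and are evaluated left to right.  A small sketch
that enumerates them (the function name zero_offset is only for illustration,
it is not part of the script above):

```shell
#!/bin/bash
# Hypothetical helper: same arithmetic as the oseek= operand above,
# but taking the random number as an argument so it can be checked.
zero_offset() {
  local r=$1
  echo $((20 + r % 3 * 1000))
}

# Every value of $RANDOM falls into one of three residues mod 3.
for r in 0 1 2 ; do
  echo "RANDOM%3=$r -> oseek=$(zero_offset $r) (in MiB, since bs=1024k)"
done
```

So each pass zeroes 1000MiB starting 20MiB, 1020MiB or 2020MiB into the
device.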

It didn't take very long before it reported problems scrubbing the filesystem 
even though the diff commands didn't report any errors.  According to diff 
that filesystem has not lost any data, but now even after rebooting I get the 
following when I run a scrub:

root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output 
error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output 
error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output 
error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output 
error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started:    Sun Oct 20 10:01:21 2024
Status:           aborted
Duration:         0:00:03
Total to scrub:   332.22MiB
Rate:             110.74MiB/s (some device limits set)
Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0
root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output 
error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output 
error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output 
error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output 
error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started:    Sun Oct 20 10:01:27 2024
Status:           aborted
Duration:         0:00:03
Total to scrub:   332.22MiB
Rate:             110.74MiB/s (some device limits set)
Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0
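
To compare runs without reading the whole summary, the uncorrectable count
can be pulled out of that text.  This is only a sketch: the helper name
uncorrectable_count is made up, and it assumes the summary layout shown above,
which may differ between btrfs-progs versions.

```shell
#!/bin/bash
# Hypothetical helper: read "btrfs scrub" summary text on stdin and print
# the number after "Uncorrectable:" (prints 0 if no such line is present).
uncorrectable_count() {
  awk '$1 == "Uncorrectable:" { n = $2 } END { print n + 0 }'
}

# Example using the summary lines from the output above.
uncorrectable_count <<'EOF'
Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0
EOF
```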

Below is some output from dmesg:

[   36.975742] BTRFS info (device vdc): bdev /dev/vdc errs: wr 0, rd 0, flush 
0, corrupt 5443, gen 0
[   36.976083] BTRFS info (device vdc): bdev /dev/vde errs: wr 0, rd 0, flush 
0, corrupt 13127, gen 0
[   36.976397] BTRFS info (device vdc): bdev /dev/vdf errs: wr 0, rd 0, flush 
0, corrupt 1412, gen 0
[   38.877364] BTRFS info (device vdc): scrub: started on devid 3
[   38.878607] BTRFS info (device vdc): scrub: started on devid 4
[   38.880468] BTRFS info (device vdc): scrub: started on devid 1
[   38.885000] BTRFS info (device vdc): scrub: started on devid 2
[   39.347569] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.350325] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.353158] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.355091] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.355786] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   39.356293] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.357059] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   39.357198] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.357602] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.359539] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.363175] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   39.364156] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.364813] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   39.365519] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.365838] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.368456] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   39.369461] BTRFS error (device vdc): unrepaired sectors detected, full 
stripe 411893760 data stripe 2 errors 0-3
[   39.370175] BTRFS info (device vdc): scrub: not finished on devid 4 with 
status: -5
[   41.231719] BTRFS error (device vdc): bad tree block start, mirror 1 want 
412024832 have 0
[   41.232326] BTRFS error (device vdc): bad tree block start, mirror 2 want 
412024832 have 0
[   41.232832] BTRFS error (device vdc): bad tree block start, mirror 1 want 
412024832 have 0
[   41.233470] BTRFS error (device vdc): bad tree block start, mirror 2 want 
412024832 have 0
[   41.234085] BTRFS info (device vdc): scrub: not finished on devid 1 with 
status: -5
[   41.234170] BTRFS info (device vdc): scrub: not finished on devid 2 with 
status: -5
[   41.234231] BTRFS info (device vdc): scrub: not finished on devid 3 with 
status: -5
[   44.243128] BTRFS info (device vdc): scrub: started on devid 1
[   44.243901] BTRFS info (device vdc): scrub: started on devid 2
[   44.243928] BTRFS info (device vdc): scrub: started on devid 4
[   44.244796] BTRFS info (device vdc): scrub: started on devid 3
[   44.774710] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.793802] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.797168] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.803175] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.807162] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   44.810892] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.811443] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   44.823205] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.823540] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.823544] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   44.823546] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.823547] BTRFS error (device vdc): unable to fixup (regular) error at 
logical 412024832 on dev /dev/vde physical 315555840
[   44.823549] BTRFS warning (device vdc): checksum error at logical 412024832 
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.832155] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.838895] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.844663] BTRFS warning (device vdc): tree block 412024832 mirror 1 has 
bad bytenr, has 0 want 412024832
[   44.845561] BTRFS error (device vdc): unrepaired sectors detected, full 
stripe 411893760 data stripe 2 errors 0-3
[   44.846842] BTRFS info (device vdc): scrub: not finished on devid 4 with 
status: -5
[   47.746767] BTRFS error (device vdc): bad tree block start, mirror 1 want 
412024832 have 0
[   47.748256] BTRFS error (device vdc): bad tree block start, mirror 2 want 
412024832 have 0
[   47.749069] BTRFS info (device vdc): scrub: not finished on devid 3 with 
status: -5
[   47.754752] BTRFS error (device vdc): bad tree block start, mirror 1 want 
412024832 have 0
[   47.755952] BTRFS error (device vdc): bad tree block start, mirror 2 want 
412024832 have 0
[   47.758766] BTRFS info (device vdc): scrub: not finished on devid 2 with 
status: -5
[   47.822683] BTRFS error (device vdc): bad tree block start, mirror 1 want 
412024832 have 0
[   47.826760] BTRFS error (device vdc): bad tree block start, mirror 2 want 
412024832 have 0
[   47.834688] BTRFS info (device vdc): scrub: not finished on devid 1 with 
status: -5
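
As a sanity check on the numbers in that log, the bad tree block does appear
to sit where the "full stripe ... data stripe 2" message says.  The sketch
below assumes a 64KiB stripe length and 4KiB sectors (neither is stated in
the log), and the helper names are made up for illustration:

```shell
#!/bin/bash
# Assumed geometry (not stated in the log): 64KiB stripe length, 4KiB sectors.
STRIPE_LEN=$((64 * 1024))
SECTOR=$((4 * 1024))

# Hypothetical helper: which data stripe of a full stripe holds a logical address.
data_stripe_index() {
  local full_stripe=$1 logical=$2
  echo $(( (logical - full_stripe) / STRIPE_LEN ))
}

# Hypothetical helper: sector offset of the address within that data stripe.
sector_in_stripe() {
  local full_stripe=$1 logical=$2
  echo $(( (logical - full_stripe) % STRIPE_LEN / SECTOR ))
}

# Full stripe start and tree block address taken from the dmesg output above.
echo "data stripe: $(data_stripe_index 411893760 412024832)"
echo "first sector: $(sector_in_stripe 411893760 412024832)"
```

Under those assumptions the block at 412024832 is 131072 bytes into the full
stripe, i.e. data stripe 2 starting at sector 0, and "errors 0-3" would cover
4 sectors (16KiB) from there.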

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


Thread overview: 9+ messages
2024-10-20 10:09 Russell Coker [this message]
2024-10-20 21:01 ` strangely uncorrectable errors with RAID-5 Qu Wenruo
2024-10-21  3:55   ` Russell Coker
2024-10-21  4:26     ` Qu Wenruo
2025-03-14 12:27       ` Russell Coker
2025-03-14 16:54         ` Russell Coker
2025-03-14 19:32           ` Thiago Ramon
2025-03-15  2:51             ` Russell Coker
2025-03-15  5:19             ` Andrei Borzenkov
