* strangely uncorrectable errors with RAID-5
@ 2024-10-20 10:09 Russell Coker
From: Russell Coker @ 2024-10-20 10:09 UTC (permalink / raw)
  To: linux-btrfs

I've been testing out BTRFS RAID-5 with Debian kernels 6.10.9 and 6.11.2 from 
Unstable.

I know that RAID-5 is not expected to be good enough for real data, but it
still seemed interesting to test as apparently there have been improvements
recently.  I created some errors that SHOULD be recoverable (and are
recoverable with RAID-1) but which BTRFS reports as uncorrectable, even though
the diff command shows that the data is intact.  Now I can't get the
filesystem back to an error-free state.

To test it I created a 4 device RAID-5 filesystem and ran the following script 
to stress it a bit:

#!/bin/bash
set -e
cd /mnt
while true ; do
  cp -r usr usr2
  cp -r usr usr3
  cp -r usr usr4
  cp -r usr usr5
  sync
  diff -ru usr usr2
  diff -ru usr usr3
  diff -ru usr usr4
  diff -ru usr usr5
  rm -rf usr?
done

Then I ran the following script to cause corruption and scrub it to see what 
happens:

#!/bin/bash
set -e
while true ; do
  for DEV in c d e f ; do
    # Zero 1000MiB starting 20, 1020 or 2020 MiB into the device.
    dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k \
      count=1000
    sync
    btrfs scrub start -B /mnt
    sync
  done
done
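
For reference, the oseek expression in the dd line can only take three
values, because % and * bind tighter than + and are evaluated left to right.
A minimal sketch of that arithmetic follows.  The function name seek_offset is
made up for illustration, and it takes the random number as an argument
instead of reading $RANDOM so the result is reproducible:

```shell
#!/bin/bash
# seek_offset is a hypothetical helper, not part of the test scripts.
# It mirrors $((20+$RANDOM%3*1000)) but takes the random value as $1.
seek_offset() {
  echo $((20 + $1 % 3 * 1000))
}

# Walk through the three possible remainders of $RANDOM % 3.
for r in 0 1 2; do
  echo "RANDOM%3=$r -> oseek=$(seek_offset "$r") (in 1MiB blocks)"
done
```

So with bs=1024k and count=1000 each pass zeroes 1000MiB starting 20MiB,
1020MiB or 2020MiB into the device.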

It didn't take long before scrub reported problems, even though the diff
commands didn't report any errors.  According to diff the filesystem has not
lost any data, but now, even after rebooting, I get the following when I run
a scrub:

root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started:    Sun Oct 20 10:01:21 2024
Status:           aborted
Duration:         0:00:03
Total to scrub:   332.22MiB
Rate:             110.74MiB/s (some device limits set)
Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0
root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started:    Sun Oct 20 10:01:27 2024
Status:           aborted
Duration:         0:00:03
Total to scrub:   332.22MiB
Rate:             110.74MiB/s (some device limits set)
Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0
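
To make the corruption loop stop at the first bad scrub, the "Uncorrectable:"
line could be pulled out of the scrub summary.  This is only a sketch: the
helper name uncorrectable_count is hypothetical, and it assumes the summary
keeps the layout shown above.  The sample text is copied from that output so
the snippet runs without a BTRFS filesystem:

```shell
#!/bin/bash
# uncorrectable_count is a hypothetical helper: it reads scrub summary text
# on stdin and prints the number from the "Uncorrectable:" line.
uncorrectable_count() {
  awk '$1 == "Uncorrectable:" { print $2 }'
}

# Sample copied from the error summary in the scrub output above.
sample='Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0'

n=$(printf '%s\n' "$sample" | uncorrectable_count)
if [ "${n:-0}" -gt 0 ]; then
  echo "scrub left $n uncorrectable errors"
fi
```

In a real run the sample would be replaced by the saved output of
"btrfs scrub start -B /mnt".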

Below is some output from dmesg:

[   36.975742] BTRFS info (device vdc): bdev /dev/vdc errs: wr 0, rd 0, flush 0, corrupt 5443, gen 0
[   36.976083] BTRFS info (device vdc): bdev /dev/vde errs: wr 0, rd 0, flush 0, corrupt 13127, gen 0
[   36.976397] BTRFS info (device vdc): bdev /dev/vdf errs: wr 0, rd 0, flush 0, corrupt 1412, gen 0
[   38.877364] BTRFS info (device vdc): scrub: started on devid 3
[   38.878607] BTRFS info (device vdc): scrub: started on devid 4
[   38.880468] BTRFS info (device vdc): scrub: started on devid 1
[   38.885000] BTRFS info (device vdc): scrub: started on devid 2
[   39.347569] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.350325] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.353158] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.355091] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.355786] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.356293] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.357059] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.357198] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.357602] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.359539] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.363175] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.364156] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.364813] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.365519] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.365838] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.368456] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.369461] BTRFS error (device vdc): unrepaired sectors detected, full stripe 411893760 data stripe 2 errors 0-3
[   39.370175] BTRFS info (device vdc): scrub: not finished on devid 4 with status: -5
[   41.231719] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   41.232326] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   41.232832] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   41.233470] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   41.234085] BTRFS info (device vdc): scrub: not finished on devid 1 with status: -5
[   41.234170] BTRFS info (device vdc): scrub: not finished on devid 2 with status: -5
[   41.234231] BTRFS info (device vdc): scrub: not finished on devid 3 with status: -5
[   44.243128] BTRFS info (device vdc): scrub: started on devid 1
[   44.243901] BTRFS info (device vdc): scrub: started on devid 2
[   44.243928] BTRFS info (device vdc): scrub: started on devid 4
[   44.244796] BTRFS info (device vdc): scrub: started on devid 3
[   44.774710] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.793802] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.797168] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.803175] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.807162] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.810892] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.811443] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.823205] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.823540] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.823544] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.823546] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.823547] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.823549] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.832155] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.838895] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.844663] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.845561] BTRFS error (device vdc): unrepaired sectors detected, full stripe 411893760 data stripe 2 errors 0-3
[   44.846842] BTRFS info (device vdc): scrub: not finished on devid 4 with status: -5
[   47.746767] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   47.748256] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   47.749069] BTRFS info (device vdc): scrub: not finished on devid 3 with status: -5
[   47.754752] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   47.755952] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   47.758766] BTRFS info (device vdc): scrub: not finished on devid 2 with status: -5
[   47.822683] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   47.826760] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   47.834688] BTRFS info (device vdc): scrub: not finished on devid 1 with status: -5
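
The numbers in the "unrepaired sectors" message look consistent with the one
failing tree block.  A rough check follows, assuming a 64KiB BTRFS stripe
length, 4KiB sectors and a 16KiB nodesize.  None of those sizes appear in the
log, so they are assumptions about this filesystem, not facts taken from it:

```shell
#!/bin/bash
full_stripe=411893760       # from "full stripe 411893760"
logical=412024832           # from "tree block 412024832"
stripe_len=$((64 * 1024))   # assumed BTRFS stripe length
sector=4096                 # assumed sector size
nodesize=16384              # assumed nodesize

# Distance of the bad tree block from the start of the full stripe.
offset=$((logical - full_stripe))
echo "offset into full stripe: $offset bytes"
echo "data stripe index: $((offset / stripe_len))"
echo "first sector within that stripe: $((offset % stripe_len / sector))"
echo "sectors per tree block: $((nodesize / sector))"
```

Under those assumptions the offset is 131072 bytes, which is data stripe 2,
starting at sector 0 and covering 4 sectors.  That appears to match "data
stripe 2 errors 0-3", so every reported error seems to point at the same
metadata leaf.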

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

