* strangely uncorrectable errors with RAID-5
@ 2024-10-20 10:09 Russell Coker
  2024-10-20 21:01 ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2024-10-20 10:09 UTC (permalink / raw)
  To: linux-btrfs

I've been testing out BTRFS RAID-5 with Debian kernels 6.10.9 and 6.11.2 from 
Unstable.

I know that RAID-5 is not expected to be good enough for real data, but it
still seemed worth testing, as there have apparently been improvements
recently.  I created some errors that SHOULD be recoverable (and are
recoverable with RAID-1) which turned out not to be recoverable (according to
BTRFS), even though the diff command reported that the data was intact.  Now I
can't get the filesystem back to an error-free state.

To test it, I created a 4-device RAID-5 filesystem and ran the following
script to stress it a bit:

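#!/bin/bash
set -e
cd /mnt
# Copy /mnt/usr four times, verify each copy, then delete the copies;
# set -e stops the loop as soon as diff reports a difference
while true ; do
  cp -r usr usr2
  cp -r usr usr3
  cp -r usr usr4
  cp -r usr usr5
  sync
  diff -ru usr usr2
  diff -ru usr usr3
  diff -ru usr usr4
  diff -ru usr usr5
  rm -rf usr?
done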

Then I ran the following script to corrupt the devices and scrub the
filesystem, to see what happens:

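#!/bin/bash
set -e
while true ; do
  for DEV in c d e f ; do
    # Zero 1000 MiB of one device at a pseudo-random offset
    dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k \
      count=1000
    sync
    # Scrub in the foreground (-B); set -e stops the loop if scrub fails
    btrfs scrub start -B /mnt
    sync
  done
done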
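For reference, bash gives % and * equal precedence and evaluates them left
to right, so the oseek expression above is 20 + (($RANDOM % 3) * 1000). The
sketch below wraps it in a hypothetical seek_mib helper (not part of the
original script) to list the only three offsets it can produce; with
bs=1024k and count=1000, each dd call therefore zeroes 1000 MiB starting 20,
1020 or 2020 MiB into the chosen device.

```shell
#!/bin/bash
# Hypothetical helper reproducing the script's oseek expression.
# % and * share precedence and associate left to right:
#   20+r%3*1000  ==  20 + ((r % 3) * 1000)
seek_mib() {
  echo $(( 20 + $1 % 3 * 1000 ))
}

# RANDOM is 0..32767, but only RANDOM % 3 matters, so three offsets exist.
for r in 0 1 2; do
  echo "RANDOM%3=$r -> oseek=$(seek_mib "$r") (1 MiB blocks)"
done
```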

It didn't take long before scrub reported problems with the filesystem, even
though the diff commands didn't report any errors.  According to diff, the
filesystem has not lost any data, but now, even after rebooting, I get the
following when I run a scrub:

root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started:    Sun Oct 20 10:01:21 2024
Status:           aborted
Duration:         0:00:03
Total to scrub:   332.22MiB
Rate:             110.74MiB/s (some device limits set)
Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0
root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started:    Sun Oct 20 10:01:27 2024
Status:           aborted
Duration:         0:00:03
Total to scrub:   332.22MiB
Rate:             110.74MiB/s (some device limits set)
Error summary:    csum=4
  Corrected:      0
  Uncorrectable:  4
  Unverified:     0

Below is some output from dmesg:

[   36.975742] BTRFS info (device vdc): bdev /dev/vdc errs: wr 0, rd 0, flush 0, corrupt 5443, gen 0
[   36.976083] BTRFS info (device vdc): bdev /dev/vde errs: wr 0, rd 0, flush 0, corrupt 13127, gen 0
[   36.976397] BTRFS info (device vdc): bdev /dev/vdf errs: wr 0, rd 0, flush 0, corrupt 1412, gen 0
[   38.877364] BTRFS info (device vdc): scrub: started on devid 3
[   38.878607] BTRFS info (device vdc): scrub: started on devid 4
[   38.880468] BTRFS info (device vdc): scrub: started on devid 1
[   38.885000] BTRFS info (device vdc): scrub: started on devid 2
[   39.347569] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.350325] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.353158] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.355091] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.355786] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.356293] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.357059] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.357198] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.357602] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.359539] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.363175] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.364156] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.364813] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   39.365519] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   39.365838] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.368456] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   39.369461] BTRFS error (device vdc): unrepaired sectors detected, full stripe 411893760 data stripe 2 errors 0-3
[   39.370175] BTRFS info (device vdc): scrub: not finished on devid 4 with status: -5
[   41.231719] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   41.232326] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   41.232832] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   41.233470] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   41.234085] BTRFS info (device vdc): scrub: not finished on devid 1 with status: -5
[   41.234170] BTRFS info (device vdc): scrub: not finished on devid 2 with status: -5
[   41.234231] BTRFS info (device vdc): scrub: not finished on devid 3 with status: -5
[   44.243128] BTRFS info (device vdc): scrub: started on devid 1
[   44.243901] BTRFS info (device vdc): scrub: started on devid 2
[   44.243928] BTRFS info (device vdc): scrub: started on devid 4
[   44.244796] BTRFS info (device vdc): scrub: started on devid 3
[   44.774710] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.793802] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.797168] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.803175] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.807162] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.810892] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.811443] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.823205] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.823540] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.823544] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.823546] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.823547] BTRFS error (device vdc): unable to fixup (regular) error at logical 412024832 on dev /dev/vde physical 315555840
[   44.823549] BTRFS warning (device vdc): checksum error at logical 412024832 on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[   44.832155] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.838895] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.844663] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
[   44.845561] BTRFS error (device vdc): unrepaired sectors detected, full stripe 411893760 data stripe 2 errors 0-3
[   44.846842] BTRFS info (device vdc): scrub: not finished on devid 4 with status: -5
[   47.746767] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   47.748256] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   47.749069] BTRFS info (device vdc): scrub: not finished on devid 3 with status: -5
[   47.754752] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   47.755952] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   47.758766] BTRFS info (device vdc): scrub: not finished on devid 2 with status: -5
[   47.822683] BTRFS error (device vdc): bad tree block start, mirror 1 want 412024832 have 0
[   47.826760] BTRFS error (device vdc): bad tree block start, mirror 2 want 412024832 have 0
[   47.834688] BTRFS info (device vdc): scrub: not finished on devid 1 with status: -5

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/



* Re: strangely uncorrectable errors with RAID-5
  2024-10-20 10:09 strangely uncorrectable errors with RAID-5 Russell Coker
@ 2024-10-20 21:01 ` Qu Wenruo
  2024-10-21  3:55   ` Russell Coker
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2024-10-20 21:01 UTC (permalink / raw)
  To: russell, linux-btrfs



On 2024/10/20 20:39, Russell Coker wrote:
> Below is some output from dmesg:
>
> [   36.975742] BTRFS info (device vdc): bdev /dev/vdc errs: wr 0, rd 0, flush 0, corrupt 5443, gen 0
> [   36.976083] BTRFS info (device vdc): bdev /dev/vde errs: wr 0, rd 0, flush 0, corrupt 13127, gen 0
> [   36.976397] BTRFS info (device vdc): bdev /dev/vdf errs: wr 0, rd 0, flush 0, corrupt 1412, gen 0
> [   38.877364] BTRFS info (device vdc): scrub: started on devid 3
> [   38.878607] BTRFS info (device vdc): scrub: started on devid 4
> [   38.880468] BTRFS info (device vdc): scrub: started on devid 1
> [   38.885000] BTRFS info (device vdc): scrub: started on devid 2
> [   39.347569] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
> [   39.350325] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
> [   39.353158] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832
> [   39.355091] BTRFS warning (device vdc): tree block 412024832 mirror 1 has bad bytenr, has 0 want 412024832

The metadata is gone, and there is only one mirror for it.

What profile are you using for metadata?

Just in case, RAID5 is not recommended for metadata due to the complex
recovery combinations:

 >> The power failure safety for metadata with RAID56 is not 100%.

I'm pretty sure that if you're using RAID5 for metadata, that's exactly the
case where corrupted metadata cannot be properly fixed on a per-sector
basis.

Thus it's recommended to use RAID1* for metadata if you want to use RAID5
for data.
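As a worked example of the per-sector view, the "unrepaired sectors
detected, full stripe 411893760 data stripe 2 errors 0-3" line in the dmesg
output earlier in the thread can be lined up with the failing tree block
412024832. The sketch below assumes btrfs defaults that the thread does not
state (64 KiB per-device stripe length, 16 KiB nodesize, 4 KiB sectorsize);
under those assumptions the leaf lands in sectors 0-3 of data stripe 2, which
is consistent with the whole 16 KiB leaf being reported as unrepaired.

```shell
#!/bin/bash
# Assumed btrfs defaults (not stated in the thread):
STRIPE_LEN=$((64 * 1024))   # bytes per device in one data stripe
NODESIZE=$((16 * 1024))     # size of one metadata tree block
SECTORSIZE=4096             # checksum/repair granularity

# Values taken from the dmesg output in this thread.
full_stripe=411893760       # "full stripe 411893760"
tree_block=412024832        # "tree block 412024832"

offset=$(( tree_block - full_stripe ))
data_stripe=$(( offset / STRIPE_LEN ))
first_sector=$(( offset % STRIPE_LEN / SECTORSIZE ))
last_sector=$(( first_sector + NODESIZE / SECTORSIZE - 1 ))

echo "offset into full stripe: $offset bytes"
echo "data stripe $data_stripe, sectors $first_sector-$last_sector"
```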

Thanks,
Qu





* Re: strangely uncorrectable errors with RAID-5
  2024-10-20 21:01 ` Qu Wenruo
@ 2024-10-21  3:55   ` Russell Coker
  2024-10-21  4:26     ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2024-10-21  3:55 UTC (permalink / raw)
  To: linux-btrfs, Qu Wenruo

On Monday, 21 October 2024 08:01:52 AEDT Qu Wenruo wrote:
> The metadata is gone, and there is only one mirror for it.
> 
> What profile are you using for metadata?

# btrfs fi df /mnt
Data, RAID5: total=4.92GiB, used=1.21GiB
System, RAID5: total=48.00MiB, used=16.00KiB
Metadata, RAID5: total=864.00MiB, used=172.77MiB
GlobalReserve, single: total=6.95MiB, used=0.00B

It's all RAID5.  Why would it be all gone?  Also, why can't it be recreated
when diff doesn't report any file loss?  Can I get a BTRFS filesystem from
this state back to an all-good state?

As an aside, I've had something quite similar happen on a production server
running Ubuntu 20.04 with RAID-1.  I'm now just running it, knowing that a
scrub will report errors but that it apparently works.

> Just in case, RAID5 is not recommended for metadata due to the complex

From what I know, it's still not recommended for anything.  But if I wanted
RAID-5 without risking data loss, I'd definitely use RAID-1 for metadata.  In
this case, though, I was more interested in seeing how it might break than in
keeping the data intact.

> recovery combinations:
>  >> The power failure safety for metadata with RAID56 is not 100%.
> 
> I'm pretty sure if you're using RAID5 for metadata, that's exactly the
> case where corrupted metadata can not be properly fixed at a per-sector
> basis.
> 
> Thus it's recommended to go RAID1* for metadata if you want to use RAID5
> for data.

OK.  I'll run some new tests on RAID1 metadata and RAID5 data and see how that 
goes.
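A minimal sketch of that setup, assuming the same four test devices
(/dev/vdc to /dev/vdf) as above. The hypothetical run wrapper only prints
each command unless DRY_RUN=0 is set, because mkfs.btrfs -f destroys
whatever is on the listed devices.

```shell
#!/bin/bash
# Dry-run by default: print commands instead of executing them.
# Set DRY_RUN=0 to really run them (DESTROYS data on the listed devices).
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

make_test_fs() {
  # -d raid5: data profile; -m raid1: metadata profile;
  # -f: overwrite any existing filesystem on the devices
  run mkfs.btrfs -f -d raid5 -m raid1 /dev/vdc /dev/vdd /dev/vde /dev/vdf
}

make_test_fs
```

After mounting, btrfs filesystem df /mnt should list Data as RAID5 and
Metadata as RAID1.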

Is there any way of recovering the filesystem with those errors or is mkfs 
required?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/





* Re: strangely uncorrectable errors with RAID-5
  2024-10-21  3:55   ` Russell Coker
@ 2024-10-21  4:26     ` Qu Wenruo
  2025-03-14 12:27       ` Russell Coker
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2024-10-21  4:26 UTC (permalink / raw)
  To: Russell Coker, linux-btrfs, Qu Wenruo



On 2024/10/21 14:25, Russell Coker wrote:
> On Monday, 21 October 2024 08:01:52 AEDT Qu Wenruo wrote:
>> The metadata is gone, and there is only one mirror for it.
>>
>> What profile are you using for metadata?
> 
> # btrfs fi df /mnt
> Data, RAID5: total=4.92GiB, used=1.21GiB
> System, RAID5: total=48.00MiB, used=16.00KiB
> Metadata, RAID5: total=864.00MiB, used=172.77MiB
> GlobalReserve, single: total=6.95MiB, used=0.00B
> 
> It's all RAID5.  Why would it be all gone?  Also why can't it be recreated
> when diff doesn't report any file loss?  Can I convert a BTRFS filesystem from
> this state to an all good state?

Your metadata is corrupted; there is no better solution than data
salvage (mount with -o rescue=all,ro and copy whatever data you can/want).
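A sketch of that salvage step, assuming /dev/vdc is one member of the
damaged filesystem and /srv/salvage (a hypothetical path) lives on a
different, healthy filesystem. rescue=all only works together with ro. As
with the other sketches, the hypothetical run wrapper just prints the
commands unless DRY_RUN=0.

```shell
#!/bin/bash
# Dry-run by default: print commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

SALVAGE_DIR=/srv/salvage   # hypothetical destination on another filesystem

salvage() {
  # Read-only mount with all rescue options enabled
  run mount -o ro,rescue=all /dev/vdc /mnt
  # cp keeps going past unreadable files and reports them on stderr
  run cp -a /mnt/. "$SALVAGE_DIR"/
  run umount /mnt
}

salvage
```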

> 
> As an aside I've had something quite similar happen on a production server
> running Ubuntu 20.04 and RAID-1 and now I'm just running it knowing that a
> scrub will give errors but that it apparently works.

RAID1 is a completely different story.

Without the complex RAID5 write-hole problems, it's very reliable.

> 
>> Just in case, RAID5 is not recommended for metadata due to the complex
> 
>  From what I know it's still not recommended for anything though.  But
> definitely if I didn't want to have data loss and I wanted RAID-5 then I'd use
> RAID-1 for metadata.  But in this case I was more interested in seeing how it
> might break than in keeping the data intact.

With the recent RAID56 improvements, I'd say RAID5 data + RAID1 metadata is
usable, but I'm not sure how well it will survive in a production environment.

Considering we have a lot of other problems out of our control, like bad
disk flush behavior and even hardware memory bitflips, I wouldn't
recommend RAID5 for data yet, but I believe RAID56 for data has improved
a lot.

If you want to experiment with RAID1 metadata and RAID5 data and report
back, I would appreciate the effort a lot.
Based on my last round of work on RAID56 (for data), it should survive your
random corruption script.
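For an existing, still-healthy filesystem that was created with RAID5
metadata, the metadata profile can be changed in place with a convert
balance instead of recreating it. A sketch, assuming the filesystem is
mounted at /mnt; it is not a fix for metadata that is already corrupted, as
in this thread. The hypothetical run wrapper again only prints the commands
unless DRY_RUN=0.

```shell
#!/bin/bash
# Dry-run by default: print commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

convert_metadata() {
  # Rewrite every metadata chunk using the RAID1 profile
  run btrfs balance start -mconvert=raid1 /mnt
  # Check that the Metadata line now reports RAID1
  run btrfs filesystem df /mnt
}

convert_metadata
```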

> 
>> recovery combinations:
>>   >> The power failure safety for metadata with RAID56 is not 100%.
>>
>> I'm pretty sure if you're using RAID5 for metadata, that's exactly the
>> case where corrupted metadata can not be properly fixed at a per-sector
>> basis.
>>
>> Thus it's recommended to go RAID1* for metadata if you want to use RAID5
>> for data.
> 
> OK.  I'll run some new tests on RAID1 metadata and RAID5 data and see how that
> goes.
> 
> Is there any way of recovering the filesystem with those errors or is mkfs
> required?

Metadata is gone, and since you're randomly corrupting a filesystem whose
metadata uses RAID5, I would suggest just running mkfs again, with RAID1
metadata of course.

Thanks,
Qu


* Re: strangely uncorrectable errors with RAID-5
  2024-10-21  4:26     ` Qu Wenruo
@ 2025-03-14 12:27       ` Russell Coker
  2025-03-14 16:54         ` Russell Coker
  0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2025-03-14 12:27 UTC (permalink / raw)
  To: linux-btrfs, Qu Wenruo

On Monday, 21 October 2024 15:26:19 AEDT Qu Wenruo wrote:
> With the recent RAID56 improvements, I'd say RAID5 data + RAID1 metadata is
> usable, but I'm not sure how it will survive in a production environment.
> 
> Considering we have a lot of other problems out of our control, like bad
> disk flush behavior, and even hardware memory bitflips, I won't
> recommend RAID5 data for now, but I believe RAID56 for data has improved
> a lot.
> 
> If you want to experiment with RAID1 metadata with RAID5 data and report
> back, I would appreciate the effort a lot.
> And from my last work on RAID56 (for data), it should survive your
> random corruption script.

Just to see if anything had changed, I ran the same tests again with RAID-5
data and metadata on the Debian kernel 6.12.17-amd64, and this time got truly
uncorrectable errors in less than a minute.  I don't know whether the error
happened faster and was worse than before because of a kernel difference,
luck, or a timing difference from running on different hardware.

The system is a Dell PowerEdge T630 with advanced ECC enabled so I don't think 
that memory bitflips are an issue here.

[  398.486860] BTRFS warning (device vdc): checksum error at logical 5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in tree 5
[  398.487367] BTRFS error (device vdc): unable to fixup (regular) error at logical 5073338368 on dev /dev/vdc physical 1705771008
[  398.487826] BTRFS warning (device vdc): checksum error at logical 5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in tree 5
[  398.488333] BTRFS error (device vdc): unable to fixup (regular) error at logical 5073338368 on dev /dev/vdc physical 1705771008
[  398.488792] BTRFS warning (device vdc): checksum error at logical 5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in tree 5
[  398.489308] BTRFS error (device vdc): unable to fixup (regular) error at logical 5073338368 on dev /dev/vdc physical 1705771008
[  398.489772] BTRFS warning (device vdc): checksum error at logical 5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in tree 5
[  398.490459] BTRFS error (device vdc): unable to fixup (regular) error at logical 5073338368 on dev /dev/vdc physical 1705771008
[  398.490926] BTRFS warning (device vdc): checksum error at logical 5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in tree 5
[  398.491435] BTRFS error (device vdc): unable to fixup (regular) error at logical 5073338368 on dev /dev/vdc physical 1705771008
[  398.491898] BTRFS warning (device vdc): checksum error at logical 5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in tree 5
[  398.492406] BTRFS error (device vdc): unable to fixup (regular) error at logical 5073338368 on dev /dev/vdc physical 1705771008
[  398.492868] BTRFS warning (device vdc): checksum error at logical 5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in tree 5

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/





* Re: strangely uncorrectable errors with RAID-5
  2025-03-14 12:27       ` Russell Coker
@ 2025-03-14 16:54         ` Russell Coker
  2025-03-14 19:32           ` Thiago Ramon
  0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2025-03-14 16:54 UTC (permalink / raw)
  To: linux-btrfs, Qu Wenruo

On Friday, 14 March 2025 23:27:51 AEDT Russell Coker wrote:
> Just to see if anything had changed I ran the same tests with RAID-5 data
> and metadata again with the Debian kernel 6.12.17-amd64 and this time got
> properly uncorrectable errors in less than a minute.  I don't know if the
> error happened faster and worse than before because of some kernel
> difference, luck, or some timing difference when running on different
> hardware.

I did it again, but with RAID-5 data and RAID-1 metadata.  This time it took a
few hours, but it again reached an uncorrectable state.

[15653.298999] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299001] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299002] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299003] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299005] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299006] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299007] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299009] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299010] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
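
All nine messages refer to the same metadata block. For longer scrub logs a short filter makes that kind of repetition obvious. The sketch below is a hypothetical helper (the function name `summarise` is made up for illustration); here it reads the nine messages above, rejoined onto single lines, from a here-document, but in practice it would be fed from `dmesg` or `journalctl -k`:

```shell
#!/bin/bash
# Hypothetical helper: count btrfs scrub messages per logical address, device and kind.
# Extracts "<logical> <device> <kind>" from each matching line, then counts duplicates.
summarise() {
  sed -nE 's/.*(checksum error|unable to fixup).* logical ([0-9]+) on dev ([^ ,]+).*/\2 \3 \1/p' |
    sort | uniq -c
}

# Sample input: the kernel messages quoted above. Real use: dmesg | summarise
summarise <<'EOF'
[15653.298999] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299001] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299002] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299003] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299005] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299006] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299007] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
[15653.299009] BTRFS error (device vdc): unable to fixup (regular) error at logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299010] BTRFS warning (device vdc): checksum error at logical 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in tree 5
EOF
```

For this log it reports 5 checksum errors and 4 failed fixups, all at logical 5157748736 on /dev/vdc.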

Below is the script that broke it.

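#!/bin/bash
set -e
while true ; do
  for DEV in c d e f ; do
    # Zero 1000 MiB of one device, starting at 20, 1020 or 2020 MiB
    dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k count=1000
    sync
    # Scrub in the foreground; with set -e a scrub that exits non-zero stops the script
    btrfs scrub start -B /mnt
    sync
  done
  date
done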
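
The offset expression is easy to misread. In shell arithmetic `%` and `*` have the same precedence and group left to right, so `20+$RANDOM%3*1000` means `20 + (($RANDOM % 3) * 1000)`, and the only possible values are 20, 1020 and 2020. Because `bs=1024k` also sets the output block size, those are offsets in MiB, and `count=1000` then zeroes 1000 MiB. This small sketch substitutes a few fixed numbers for `$RANDOM` and prints the resulting regions:

```shell
#!/bin/bash
# Evaluate the script's oseek expression with sample values in place of $RANDOM.
# $RANDOM % 3 can only be 0, 1 or 2, so these cover every possible offset.
for rnd in 0 1 2 32767; do
  seek=$((20+$rnd%3*1000))   # same expression as in the script
  echo "RANDOM=$rnd -> oseek=$seek (zeroes MiB $seek to $((seek+1000)))"
done
```

In other words, every device is repeatedly zeroed, one at a time, in the same three 1000 MiB regions.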

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

* Re: strangely uncorrectable errors with RAID-5
  2025-03-14 16:54         ` Russell Coker
@ 2025-03-14 19:32           ` Thiago Ramon
  2025-03-15  2:51             ` Russell Coker
  2025-03-15  5:19             ` Andrei Borzenkov
  0 siblings, 2 replies; 9+ messages in thread
From: Thiago Ramon @ 2025-03-14 19:32 UTC (permalink / raw)
  To: russell; +Cc: linux-btrfs, Qu Wenruo

On Fri, Mar 14, 2025 at 1:54 PM Russell Coker <russell@coker.com.au> wrote:
>
> On Friday, 14 March 2025 23:27:51 AEDT Russell Coker wrote:
> > Just to see if anything had changed I ran the same tests with RAID-5 data
> > and metadata again with the Debian kernel 6.12.17-amd64 and this time got
> > properly uncorrectable errors in less than a minute.  I don't know if the
> > error happened faster and worse than before because of some kernel
> > difference, luck, or some timing difference when running on different
> > hardware.
>
> I did it again but with RAID-5 data and RAID-1 metadata and it took a few
> hours this time but again got to an uncorrectable state.
>
> [...]
>
> Below is the script that broke it.
>
> #!/bin/bash
> set -e
> while true ; do
>   for DEV in c d e f ; do
>     dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k count=1000
>     sync
>     btrfs scrub start -B /mnt
>     sync
>   done
>   date
> done
Your script is writing randomly to ALL your disks. You just need to
get lucky for them to overwrite the same sector in 2 different disks
to ruin RAID5 and RAID1. If you want to stress test the filesystem you
need to pay more attention to what you're doing.
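
One way to follow that advice, sketched as a hypothetical variant that is not from the thread: record every device and offset that gets zeroed, and stop at the first scrub that exits non-zero, so a failure can be matched against the corruption that came just before it. The variable names (`MNT`, `DEVS`, `ROUNDS`, `DRY_RUN`, `LOG`) are illustrative, the early stop assumes `btrfs scrub start -B` returns a non-zero status when it hits errors (the original script's `set -e` relies on the same thing), and by default it only prints the destructive commands:

```shell
#!/bin/bash
# Hypothetical, more controlled variant of the corruption loop (not the original test).
# Logs each corruption and stops at the first scrub that exits non-zero.
# DRY_RUN=1 (the default) only prints the destructive commands; DRY_RUN=0 runs them.
MNT=${MNT:-/mnt}
DEVS=${DEVS:-"c d e f"}
ROUNDS=${ROUNDS:-1}
DRY_RUN=${DRY_RUN:-1}
LOG=${LOG:-$(mktemp)}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

round=1
while [ "$round" -le "$ROUNDS" ]; do
  for DEV in $DEVS; do
    seek=$((20 + RANDOM % 3 * 1000))   # same three 1000 MiB regions as the original
    echo "round=$round dev=/dev/vd$DEV oseek=$seek" >> "$LOG"
    run dd if=/dev/zero of="/dev/vd$DEV" oseek="$seek" bs=1024k count=1000
    run sync
    # Assumption: a non-zero exit means the scrub hit errors it could not fix.
    if ! run btrfs scrub start -B "$MNT"; then
      echo "scrub failed after zeroing /dev/vd$DEV at $seek MiB; see $LOG" >&2
      run btrfs device stats "$MNT"
      exit 1
    fi
  done
  round=$((round + 1))
done
echo "completed $ROUNDS round(s); corruption log in $LOG"
```

Only set `DRY_RUN=0` on a throwaway test VM such as the one used here.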

* Re: strangely uncorrectable errors with RAID-5
  2025-03-14 19:32           ` Thiago Ramon
@ 2025-03-15  2:51             ` Russell Coker
  2025-03-15  5:19             ` Andrei Borzenkov
  1 sibling, 0 replies; 9+ messages in thread
From: Russell Coker @ 2025-03-15  2:51 UTC (permalink / raw)
  To: Thiago Ramon; +Cc: linux-btrfs, Qu Wenruo

On Saturday, 15 March 2025 06:32:46 AEDT Thiago Ramon wrote:
> > while true ; do
> >   for DEV in c d e f ; do
> >     dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k count=1000
> >     sync
> >     btrfs scrub start -B /mnt
> >     sync
> >   done
> >   date
> > done
> 
> Your script is writing randomly to ALL your disks. You just need to
> get lucky for them to overwrite the same sector in 2 different disks
> to ruin RAID5 and RAID1. If you want to stress test the filesystem you
> need to pay more attention to what you're doing.

Are you saying that the btrfs scrub is not designed to correct the errors?

* Re: strangely uncorrectable errors with RAID-5
  2025-03-14 19:32           ` Thiago Ramon
  2025-03-15  2:51             ` Russell Coker
@ 2025-03-15  5:19             ` Andrei Borzenkov
  1 sibling, 0 replies; 9+ messages in thread
From: Andrei Borzenkov @ 2025-03-15  5:19 UTC (permalink / raw)
  To: Thiago Ramon, russell; +Cc: linux-btrfs, Qu Wenruo

14.03.2025 22:32, Thiago Ramon wrote:
> On Fri, Mar 14, 2025 at 1:54 PM Russell Coker <russell@coker.com.au> wrote:
>>
>> On Friday, 14 March 2025 23:27:51 AEDT Russell Coker wrote:
>>> Just to see if anything had changed I ran the same tests with RAID-5 data
>>> and metadata again with the Debian kernel 6.12.17-amd64 and this time got
>>> properly uncorrectable errors in less than a minute.  I don't know if the
>>> error happened faster and worse than before because of some kernel
>>> difference, luck, or some timing difference when running on different
>>> hardware.
>>
>> I did it again but with RAID-5 data and RAID-1 metadata and it took a few
>> hours this time but again got to an uncorrectable state.
>>
>> [...]
>>
>> Below is the script that broke it.
>>
>> #!/bin/bash
>> set -e
>> while true ; do
>>    for DEV in c d e f ; do
>>      dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k count=1000
>>      sync
>>      btrfs scrub start -B /mnt
>>      sync
>>    done
>>    date
>> done
> Your script is writing randomly to ALL your disks. You just need to

btrfs scrub start -B is supposed to work synchronously. At each 
iteration only one disk gets overwritten and then scrub runs to completion.

> get lucky for them to overwrite the same sector in 2 different disks
> to ruin RAID5 and RAID1. If you want to stress test the filesystem you
> need to pay more attention to what you're doing.
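
The disagreement comes down to how single-parity RAID-5 recovers data. A toy model of one stripe on four devices, with each strip shrunk to a single byte (a simplification for illustration; real strips are much larger), shows that one lost strip can be rebuilt from parity, while two lost strips in the same stripe cannot be pinned down by parity alone:

```shell
#!/bin/bash
# Toy model of one RAID-5 stripe on 4 devices: three data strips and one parity strip,
# each shrunk to a single byte for illustration (real strips are much larger).
d1=$((0xA5)); d2=$((0x3C)); d3=$((0x0F))
p=$(( d1 ^ d2 ^ d3 ))            # parity strip = XOR of the data strips

# One strip lost (say d1 was zeroed by dd): XOR of the surviving strips rebuilds it.
rebuilt=$(( p ^ d2 ^ d3 ))
printf 'lost d1 only: rebuilt 0x%02X, original 0x%02X\n' "$rebuilt" "$d1"

# Two strips of the same stripe lost before any repair: parity gives one equation
# for two unknowns, so every candidate d1 has a matching d2 = p ^ d3 ^ d1.
for g in 0 165 255; do
  printf 'lost d1 and d2: candidate d1=0x%02X d2=0x%02X is consistent with parity\n' \
    "$g" "$(( p ^ d3 ^ g ))"
done
```

In this model, if each scrub completes before the next device is zeroed, as `-B` is meant to ensure, a stripe should only ever be in the single-loss case.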

Thread overview: 9+ messages
2024-10-20 10:09 strangely uncorrectable errors with RAID-5 Russell Coker
2024-10-20 21:01 ` Qu Wenruo
2024-10-21  3:55   ` Russell Coker
2024-10-21  4:26     ` Qu Wenruo
2025-03-14 12:27       ` Russell Coker
2025-03-14 16:54         ` Russell Coker
2025-03-14 19:32           ` Thiago Ramon
2025-03-15  2:51             ` Russell Coker
2025-03-15  5:19             ` Andrei Borzenkov
