* strangely uncorrectable errors with RAID-5
@ 2024-10-20 10:09 Russell Coker
2024-10-20 21:01 ` Qu Wenruo
0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2024-10-20 10:09 UTC (permalink / raw)
To: linux-btrfs
I've been testing out BTRFS RAID-5 with Debian kernels 6.10.9 and 6.11.2 from
Unstable.
I know that RAID-5 is not expected to be good enough for real data but it
still seemed interesting to test it as apparently there have been improvements
recently. I created some errors that SHOULD be recoverable (and are
recoverable with RAID-1) which turned out to not be recoverable (according to
BTRFS) even though the diff command reported that the data was intact. Now I
can't get the filesystem to an error-free status.
To test it I created a 4 device RAID-5 filesystem and ran the following script
to stress it a bit:
#!/bin/bash
set -e
cd /mnt
while true ; do
  cp -r usr usr2
  cp -r usr usr3
  cp -r usr usr4
  cp -r usr usr5
  sync
  diff -ru usr usr2
  diff -ru usr usr3
  diff -ru usr usr4
  diff -ru usr usr5
  rm -rf usr?
done
Then I ran the following script to cause corruption and scrub it to see what
happens:
#!/bin/bash
set -e
while true ; do
  for DEV in c d e f ; do
    dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k \
      count=1000
    sync
    btrfs scrub start -B /mnt
    sync
  done
done
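A note on the dd arithmetic in the corruption script above, as a minimal
sketch: in bash arithmetic % and * have equal precedence and associate left to
right, so $RANDOM%3*1000 is evaluated as ($RANDOM%3)*1000. With bs=1024k that
leaves only three possible start offsets (in MiB), each followed by 1000 MiB of
zeros. The helper name possible_offsets is made up for illustration and is not
part of the original script.

```shell
#!/bin/bash
# Print every start offset (in MiB, because bs=1024k) that
# oseek=$((20+$RANDOM%3*1000)) can produce.
# possible_offsets is a hypothetical helper, not from the original script.
possible_offsets() {
    local r out=""
    for r in 0 1 2 ; do
        # Same expression as the script, with r standing in for $RANDOM;
        # r%3 is evaluated before the multiplication by 1000.
        out="$out $((20+r%3*1000))"
    done
    # Unquoted on purpose so the leading space is dropped.
    echo $out
}

possible_offsets
```

So each pass overwrites one of three fixed 1000 MiB windows on the chosen
device rather than a uniformly random region.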
It didn't take very long before it reported problems scrubbing the filesystem
even though the diff commands didn't report any errors. According to diff
that filesystem has not lost any data, but now even after rebooting I get the
following when I run a scrub:
root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output
error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output
error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output
error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output
error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started: Sun Oct 20 10:01:21 2024
Status: aborted
Duration: 0:00:03
Total to scrub: 332.22MiB
Rate: 110.74MiB/s (some device limits set)
Error summary: csum=4
Corrected: 0
Uncorrectable: 4
Unverified: 0
root@testing1:~# btrfs scrub start -B /mnt
Starting scrub on devid 1
Starting scrub on devid 2
Starting scrub on devid 3
Starting scrub on devid 4
ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output
error)
ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output
error)
ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output
error)
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output
error)
scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
Scrub started: Sun Oct 20 10:01:27 2024
Status: aborted
Duration: 0:00:03
Total to scrub: 332.22MiB
Rate: 110.74MiB/s (some device limits set)
Error summary: csum=4
Corrected: 0
Uncorrectable: 4
Unverified: 0
Below is some output from dmesg:
[ 36.975742] BTRFS info (device vdc): bdev /dev/vdc errs: wr 0, rd 0, flush
0, corrupt 5443, gen 0
[ 36.976083] BTRFS info (device vdc): bdev /dev/vde errs: wr 0, rd 0, flush
0, corrupt 13127, gen 0
[ 36.976397] BTRFS info (device vdc): bdev /dev/vdf errs: wr 0, rd 0, flush
0, corrupt 1412, gen 0
[ 38.877364] BTRFS info (device vdc): scrub: started on devid 3
[ 38.878607] BTRFS info (device vdc): scrub: started on devid 4
[ 38.880468] BTRFS info (device vdc): scrub: started on devid 1
[ 38.885000] BTRFS info (device vdc): scrub: started on devid 2
[ 39.347569] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.350325] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.353158] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.355091] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.355786] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 39.356293] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 39.357059] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 39.357198] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.357602] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 39.359539] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.363175] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 39.364156] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 39.364813] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 39.365519] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 39.365838] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.368456] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 39.369461] BTRFS error (device vdc): unrepaired sectors detected, full
stripe 411893760 data stripe 2 errors 0-3
[ 39.370175] BTRFS info (device vdc): scrub: not finished on devid 4 with
status: -5
[ 41.231719] BTRFS error (device vdc): bad tree block start, mirror 1 want
412024832 have 0
[ 41.232326] BTRFS error (device vdc): bad tree block start, mirror 2 want
412024832 have 0
[ 41.232832] BTRFS error (device vdc): bad tree block start, mirror 1 want
412024832 have 0
[ 41.233470] BTRFS error (device vdc): bad tree block start, mirror 2 want
412024832 have 0
[ 41.234085] BTRFS info (device vdc): scrub: not finished on devid 1 with
status: -5
[ 41.234170] BTRFS info (device vdc): scrub: not finished on devid 2 with
status: -5
[ 41.234231] BTRFS info (device vdc): scrub: not finished on devid 3 with
status: -5
[ 44.243128] BTRFS info (device vdc): scrub: started on devid 1
[ 44.243901] BTRFS info (device vdc): scrub: started on devid 2
[ 44.243928] BTRFS info (device vdc): scrub: started on devid 4
[ 44.244796] BTRFS info (device vdc): scrub: started on devid 3
[ 44.774710] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.793802] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.797168] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.803175] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.807162] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 44.810892] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 44.811443] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 44.823205] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.823540] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 44.823544] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 44.823546] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 44.823547] BTRFS error (device vdc): unable to fixup (regular) error at
logical 412024832 on dev /dev/vde physical 315555840
[ 44.823549] BTRFS warning (device vdc): checksum error at logical 412024832
on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
[ 44.832155] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.838895] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.844663] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
bad bytenr, has 0 want 412024832
[ 44.845561] BTRFS error (device vdc): unrepaired sectors detected, full
stripe 411893760 data stripe 2 errors 0-3
[ 44.846842] BTRFS info (device vdc): scrub: not finished on devid 4 with
status: -5
[ 47.746767] BTRFS error (device vdc): bad tree block start, mirror 1 want
412024832 have 0
[ 47.748256] BTRFS error (device vdc): bad tree block start, mirror 2 want
412024832 have 0
[ 47.749069] BTRFS info (device vdc): scrub: not finished on devid 3 with
status: -5
[ 47.754752] BTRFS error (device vdc): bad tree block start, mirror 1 want
412024832 have 0
[ 47.755952] BTRFS error (device vdc): bad tree block start, mirror 2 want
412024832 have 0
[ 47.758766] BTRFS info (device vdc): scrub: not finished on devid 2 with
status: -5
[ 47.822683] BTRFS error (device vdc): bad tree block start, mirror 1 want
412024832 have 0
[ 47.826760] BTRFS error (device vdc): bad tree block start, mirror 2 want
412024832 have 0
[ 47.834688] BTRFS info (device vdc): scrub: not finished on devid 1 with
status: -5
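Reading the "unrepaired sectors" line against the failing tree block in the
dmesg output above, as a rough sketch: btrfs uses a 64 KiB stripe length, and
the calculation below additionally assumes the default 4 KiB sector size and
16 KiB node size, which are not stated anywhere in this thread.

```shell
#!/bin/bash
# Values copied from the dmesg output above.
full_stripe=411893760   # "full stripe 411893760"
tree_block=412024832    # "tree block 412024832"

# Assumptions (not stated in the thread): 64 KiB stripe length,
# 4 KiB sectors, 16 KiB tree blocks.
stripe_len=$((64 * 1024))
sector=4096
nodesize=16384

# Offset of the tree block from the start of its full stripe.
offset=$((tree_block - full_stripe))

echo "data stripe index: $((offset / stripe_len))"
echo "first sector in stripe: $((offset % stripe_len / sector))"
echo "sectors per tree block: $((nodesize / sector))"
```

Under those assumptions the one damaged metadata leaf sits in data stripe 2 and
covers sectors 0-3, which is consistent with "data stripe 2 errors 0-3" and
with the four uncorrectable checksum errors in the scrub summary.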
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: strangely uncorrectable errors with RAID-5
2024-10-20 10:09 strangely uncorrectable errors with RAID-5 Russell Coker
@ 2024-10-20 21:01 ` Qu Wenruo
2024-10-21 3:55 ` Russell Coker
0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2024-10-20 21:01 UTC (permalink / raw)
To: russell, linux-btrfs
On 2024/10/20 20:39, Russell Coker wrote:
> I've been testing out BTRFS RAID-5 with Debian kernels 6.10.9 and 6.11.2 from
> Unstable.
>
> I know that RAID-5 is not expected to be good enough for real data but it
> still seemed interesting to test it as apparently there have been improvements
> recently. I created some errors that SHOULD be recoverable (and are
> recoverable with RAID-1) which turned out to not be recoverable (according to
> BTRFS) even though the diff command reported that the data was intact. Now I
> can't get the filesystem to an error-free status.
>
> To test it I created a 4 device RAID-5 filesystem and ran the following script
> to stress it a bit:
>
> #!/bin/bash
> set -e
> cd /mnt
> while true ; do
> cp -r usr usr2
> cp -r usr usr3
> cp -r usr usr4
> cp -r usr usr5
> sync
> diff -ru usr usr2
> diff -ru usr usr3
> diff -ru usr usr4
> diff -ru usr usr5
> rm -rf usr?
> done
>
> Then I ran the following script to cause corruption and scrub it to see what
> happens:
>
> #!/bin/bash
> set -e
> while true ; do
> for DEV in c d e f ; do
> dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k \
> count=1000
> sync
> btrfs scrub start -B /mnt
> sync
> done
> done
>
> It didn't take very long before it reported problems scrubbing the filesystem
> even though the diff commands didn't report any errors. According to diff
> that filesystem has not lost any data, but now even after rebooting I get the
> following when I run a scrub:
>
> root@testing1:~# btrfs scrub start -B /mnt
> Starting scrub on devid 1
> Starting scrub on devid 2
> Starting scrub on devid 3
> Starting scrub on devid 4
> ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output
> error)
> ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output
> error)
> ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output
> error)
> ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output
> error)
> scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
> Scrub started: Sun Oct 20 10:01:21 2024
> Status: aborted
> Duration: 0:00:03
> Total to scrub: 332.22MiB
> Rate: 110.74MiB/s (some device limits set)
> Error summary: csum=4
> Corrected: 0
> Uncorrectable: 4
> Unverified: 0
> root@testing1:~# btrfs scrub start -B /mnt
> Starting scrub on devid 1
> Starting scrub on devid 2
> Starting scrub on devid 3
> Starting scrub on devid 4
> ERROR: scrubbing /mnt failed for device id 1: ret=-1, errno=5 (Input/output
> error)
> ERROR: scrubbing /mnt failed for device id 2: ret=-1, errno=5 (Input/output
> error)
> ERROR: scrubbing /mnt failed for device id 3: ret=-1, errno=5 (Input/output
> error)
> ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output
> error)
> scrub canceled for f8a30d07-f92e-4dfc-a62f-f49d35b70467
> Scrub started: Sun Oct 20 10:01:27 2024
> Status: aborted
> Duration: 0:00:03
> Total to scrub: 332.22MiB
> Rate: 110.74MiB/s (some device limits set)
> Error summary: csum=4
> Corrected: 0
> Uncorrectable: 4
> Unverified: 0
>
> Below is some output from dmesg:
>
> [ 36.975742] BTRFS info (device vdc): bdev /dev/vdc errs: wr 0, rd 0, flush
> 0, corrupt 5443, gen 0
> [ 36.976083] BTRFS info (device vdc): bdev /dev/vde errs: wr 0, rd 0, flush
> 0, corrupt 13127, gen 0
> [ 36.976397] BTRFS info (device vdc): bdev /dev/vdf errs: wr 0, rd 0, flush
> 0, corrupt 1412, gen 0
> [ 38.877364] BTRFS info (device vdc): scrub: started on devid 3
> [ 38.878607] BTRFS info (device vdc): scrub: started on devid 4
> [ 38.880468] BTRFS info (device vdc): scrub: started on devid 1
> [ 38.885000] BTRFS info (device vdc): scrub: started on devid 2
> [ 39.347569] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 39.350325] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 39.353158] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 39.355091] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
The metadata is gone, and there is only one mirror for it.
What profile are you using for metadata?
Just in case, RAID5 is not recommended for metadata due to the complex
recovery combinations:
>> The power failure safety for metadata with RAID56 is not 100%.
I'm pretty sure that if you're using RAID5 for metadata, this is exactly the
case where corrupted metadata cannot be properly repaired on a per-sector
basis.
Thus it's recommended to use RAID1* for metadata if you want to use RAID5
for data.
Thanks,
Qu
> [ 39.355786] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 39.356293] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 39.357059] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 39.357198] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 39.357602] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 39.359539] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 39.363175] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 39.364156] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 39.364813] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 39.365519] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 39.365838] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 39.368456] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 39.369461] BTRFS error (device vdc): unrepaired sectors detected, full
> stripe 411893760 data stripe 2 errors 0-3
> [ 39.370175] BTRFS info (device vdc): scrub: not finished on devid 4 with
> status: -5
> [ 41.231719] BTRFS error (device vdc): bad tree block start, mirror 1 want
> 412024832 have 0
> [ 41.232326] BTRFS error (device vdc): bad tree block start, mirror 2 want
> 412024832 have 0
> [ 41.232832] BTRFS error (device vdc): bad tree block start, mirror 1 want
> 412024832 have 0
> [ 41.233470] BTRFS error (device vdc): bad tree block start, mirror 2 want
> 412024832 have 0
> [ 41.234085] BTRFS info (device vdc): scrub: not finished on devid 1 with
> status: -5
> [ 41.234170] BTRFS info (device vdc): scrub: not finished on devid 2 with
> status: -5
> [ 41.234231] BTRFS info (device vdc): scrub: not finished on devid 3 with
> status: -5
> [ 44.243128] BTRFS info (device vdc): scrub: started on devid 1
> [ 44.243901] BTRFS info (device vdc): scrub: started on devid 2
> [ 44.243928] BTRFS info (device vdc): scrub: started on devid 4
> [ 44.244796] BTRFS info (device vdc): scrub: started on devid 3
> [ 44.774710] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.793802] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.797168] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.803175] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.807162] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 44.810892] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 44.811443] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 44.823205] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.823540] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 44.823544] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 44.823546] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 44.823547] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 412024832 on dev /dev/vde physical 315555840
> [ 44.823549] BTRFS warning (device vdc): checksum error at logical 412024832
> on dev /dev/vde, physical 315555840: metadata leaf (level 0) in tree 2
> [ 44.832155] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.838895] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.844663] BTRFS warning (device vdc): tree block 412024832 mirror 1 has
> bad bytenr, has 0 want 412024832
> [ 44.845561] BTRFS error (device vdc): unrepaired sectors detected, full
> stripe 411893760 data stripe 2 errors 0-3
> [ 44.846842] BTRFS info (device vdc): scrub: not finished on devid 4 with
> status: -5
> [ 47.746767] BTRFS error (device vdc): bad tree block start, mirror 1 want
> 412024832 have 0
> [ 47.748256] BTRFS error (device vdc): bad tree block start, mirror 2 want
> 412024832 have 0
> [ 47.749069] BTRFS info (device vdc): scrub: not finished on devid 3 with
> status: -5
> [ 47.754752] BTRFS error (device vdc): bad tree block start, mirror 1 want
> 412024832 have 0
> [ 47.755952] BTRFS error (device vdc): bad tree block start, mirror 2 want
> 412024832 have 0
> [ 47.758766] BTRFS info (device vdc): scrub: not finished on devid 2 with
> status: -5
> [ 47.822683] BTRFS error (device vdc): bad tree block start, mirror 1 want
> 412024832 have 0
> [ 47.826760] BTRFS error (device vdc): bad tree block start, mirror 2 want
> 412024832 have 0
> [ 47.834688] BTRFS info (device vdc): scrub: not finished on devid 1 with
> status: -5
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: strangely uncorrectable errors with RAID-5
2024-10-20 21:01 ` Qu Wenruo
@ 2024-10-21 3:55 ` Russell Coker
2024-10-21 4:26 ` Qu Wenruo
0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2024-10-21 3:55 UTC (permalink / raw)
To: linux-btrfs, Qu Wenruo
On Monday, 21 October 2024 08:01:52 AEDT Qu Wenruo wrote:
> The metadata is gone, and there is only one mirror for it.
>
> What profile are you using for metadata?
# btrfs fi df /mnt
Data, RAID5: total=4.92GiB, used=1.21GiB
System, RAID5: total=48.00MiB, used=16.00KiB
Metadata, RAID5: total=864.00MiB, used=172.77MiB
GlobalReserve, single: total=6.95MiB, used=0.00B
It's all RAID5. Why would it be all gone? Also why can't it be recreated
when diff doesn't report any file loss? Can I convert a BTRFS filesystem from
this state to an all good state?
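For checking the profiles from a script rather than by eye, here is a small
sketch. It reads "btrfs filesystem df" output on stdin and prints the profile
of one block group type. The function name profile_of is made up for
illustration, and the parsing assumes the "Type, PROFILE: total=..." layout
shown above.

```shell
#!/bin/bash
# profile_of TYPE: read "btrfs filesystem df" output on stdin and print
# the profile (for example RAID5) of the given block group type.
# profile_of is a hypothetical helper name.
profile_of() {
    # Split each line on "," or ":" plus any following spaces, so that
    # field 1 is the type and field 2 is the profile.
    awk -F'[,:] *' -v type="$1" '$1 == type { print $2 }'
}

# Sample input copied from the output above.
sample='Data, RAID5: total=4.92GiB, used=1.21GiB
System, RAID5: total=48.00MiB, used=16.00KiB
Metadata, RAID5: total=864.00MiB, used=172.77MiB
GlobalReserve, single: total=6.95MiB, used=0.00B'

printf '%s\n' "$sample" | profile_of Metadata
```

On a live system the same function could be fed directly, for example
"btrfs filesystem df /mnt | profile_of Metadata".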
As an aside I've had something quite similar happen on a production server
running Ubuntu 20.04 and RAID-1 and now I'm just running it knowing that a
scrub will give errors but that it apparently works.
> Just in case, RAID5 is not recommended for metadata due to the complex
From what I know it's still not recommended for anything though. But
definitely if I didn't want to have data loss and I wanted RAID-5 then I'd use
RAID-1 for metadata. But in this case I was more interested in seeing how it
might break than in keeping the data intact.
> recovery combinations:
> >> The power failure safety for metadata with RAID56 is not 100%.
>
> I'm pretty sure if you're using RAID5 for metadata, that's exactly the
> case where corrupted metadata can not be properly fixed at a per-sector
> basis.
>
> Thus it's recommended to go RAID1* for metadata if you want to use RAID5
> for data.
OK. I'll run some new tests on RAID1 metadata and RAID5 data and see how that
goes.
Is there any way of recovering the filesystem with those errors or is mkfs
required?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: strangely uncorrectable errors with RAID-5
2024-10-21 3:55 ` Russell Coker
@ 2024-10-21 4:26 ` Qu Wenruo
2025-03-14 12:27 ` Russell Coker
0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2024-10-21 4:26 UTC (permalink / raw)
To: Russell Coker, linux-btrfs, Qu Wenruo
On 2024/10/21 14:25, Russell Coker wrote:
> On Monday, 21 October 2024 08:01:52 AEDT Qu Wenruo wrote:
>> The metadata is gone, and there is only one mirror for it.
>>
>> What profile are you using for metadata?
>
> # btrfs fi df /mnt
> Data, RAID5: total=4.92GiB, used=1.21GiB
> System, RAID5: total=48.00MiB, used=16.00KiB
> Metadata, RAID5: total=864.00MiB, used=172.77MiB
> GlobalReserve, single: total=6.95MiB, used=0.00B
>
> It's all RAID5. Why would it be all gone? Also why can't it be recreated
> when diff doesn't report any file loss? Can I convert a BTRFS filesystem from
> this state to an all good state?
Your metadata is corrupted, so there is no better option than salvaging the
data (mount with -o rescue=all,ro and copy off whatever data you can/want).
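The salvage mount suggested above could look roughly like the sketch below, on
a kernel that supports the rescue= mount option. The device, mount point and
destination directory are assumed values (DEST in particular is made up), and
by default the script only prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch of a read-only salvage mount. DEV, MNT and DEST are assumed
# values, adjust them before use. With DRY_RUN=1 (the default here) the
# commands are printed rather than executed.
DEV=${DEV:-/dev/vdc}
MNT=${MNT:-/mnt}
DEST=${DEST:-/srv/salvage}
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ] ; then
        echo "$*"
    else
        "$@"
    fi
}

salvage() {
    # Mount read-only with all rescue options enabled.
    run mount -o rescue=all,ro "$DEV" "$MNT"
    # Copy off whatever is still readable; damaged files will report errors.
    run cp -a "$MNT/." "$DEST/"
}

salvage
```

Setting DRY_RUN=0 runs the commands for real, which needs root and a
destination on a different, healthy filesystem.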
>
> As an aside I've had something quite similar happen on a production server
> running Ubuntu 20.04 and RAID-1 and now I'm just running it knowing that a
> scrub will give errors but that it apparently works.
RAID1 is another story entirely.
Without the complex RAID5 write-hole problems, it's very reliable.
>
>> Just in case, RAID5 is not recommended for metadata due to the complex
>
> From what I know it's still not recommended for anything though. But
> definitely if I didn't want to have data loss and I wanted RAID-5 then I'd use
> RAID-1 for metadata. But in this case I was more interested in seeing how it
> might break than in keeping the data intact.
With the recent RAID56 improvements, I'd say RAID5 data + RAID1 metadata is
usable, but I'm not sure how it will survive in a production environment.
Considering we have a lot of other problems out of our control, like bad
disk flush behavior, and even hardware memory bitflips, I won't
recommend RAID5 data for now, but I believe RAID56 for data has improved
a lot.
If you want to experiment with RAID1 metadata and RAID5 data and report
back, I would appreciate the effort a lot.
And from my last work on RAID56 (for data), it should survive your
random corruption script.
>
>> recovery combinations:
>> >> The power failure safety for metadata with RAID56 is not 100%.
>>
>> I'm pretty sure if you're using RAID5 for metadata, that's exactly the
>> case where corrupted metadata can not be properly fixed at a per-sector
>> basis.
>>
>> Thus it's recommended to go RAID1* for metadata if you want to use RAID5
>> for data.
>
> OK. I'll run some new tests on RAID1 metadata and RAID5 data and see how that
> goes.
>
> Is there any way of recovering the filesystem with those errors or is mkfs
> required?
The metadata is gone, and since you're randomly corrupting a filesystem whose
metadata uses RAID5, I would suggest just running mkfs again, with RAID1
metadata of course.
Thanks,
Qu
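For reference, a minimal sketch of recreating the test filesystem with RAID5
data and RAID1 metadata. The device names follow the vdc..vdf naming used
earlier in the thread, and the helper mkfs_cmd is made up for illustration. It
only prints the command line, since actually running mkfs.btrfs -f destroys
whatever is on the devices.

```shell
#!/bin/bash
# Build (and only print) the mkfs.btrfs command line for RAID5 data with
# RAID1 metadata. mkfs_cmd is a hypothetical helper name.
#   -d raid5  data profile
#   -m raid1  metadata profile
#   -f        overwrite an existing filesystem on the devices
mkfs_cmd() {
    echo mkfs.btrfs -f -d raid5 -m raid1 "$@"
}

mkfs_cmd /dev/vdc /dev/vdd /dev/vde /dev/vdf
```

Running the printed command as root, then mounting and checking
"btrfs filesystem df", should show RAID5 for Data and RAID1 for Metadata.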
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: strangely uncorrectable errors with RAID-5
2024-10-21 4:26 ` Qu Wenruo
@ 2025-03-14 12:27 ` Russell Coker
2025-03-14 16:54 ` Russell Coker
0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2025-03-14 12:27 UTC (permalink / raw)
To: linux-btrfs, Qu Wenruo
On Monday, 21 October 2024 15:26:19 AEDT Qu Wenruo wrote:
> With the recent RAID56 improvements, I'd say RAID5 data + RAID1 metadata is
> usable, but I'm not sure how it will survive in a production environment.
>
> Considering we have a lot of other problems out of our control, like bad
> disk flush behavior, and even hardware memory bitflips, I won't
> recommend RAID5 data for now, but I believe RAID56 for data has improved
> a lot.
>
> If you want to experiment with RAID1 metadata and RAID5 data and report
> back, I would appreciate the effort a lot.
> And from my last work on RAID56 (for data), it should survive your
> random corruption script.
Just to see if anything had changed I ran the same tests with RAID-5 data and
metadata again with the Debian kernel 6.12.17-amd64 and this time got properly
uncorrectable errors in less than a minute. I don't know if the error
happened faster and worse than before because of some kernel difference, luck,
or some timing difference when running on different hardware.
The system is a Dell PowerEdge T630 with advanced ECC enabled so I don't think
that memory bitflips are an issue here.
[ 398.486860] BTRFS warning (device vdc): checksum error at logical
5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in
tree 5
[ 398.487367] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5073338368 on dev /dev/vdc physical 1705771008
[ 398.487826] BTRFS warning (device vdc): checksum error at logical
5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in
tree 5
[ 398.488333] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5073338368 on dev /dev/vdc physical 1705771008
[ 398.488792] BTRFS warning (device vdc): checksum error at logical
5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in
tree 5
[ 398.489308] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5073338368 on dev /dev/vdc physical 1705771008
[ 398.489772] BTRFS warning (device vdc): checksum error at logical
5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in
tree 5
[ 398.490459] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5073338368 on dev /dev/vdc physical 1705771008
[ 398.490926] BTRFS warning (device vdc): checksum error at logical
5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in
tree 5
[ 398.491435] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5073338368 on dev /dev/vdc physical 1705771008
[ 398.491898] BTRFS warning (device vdc): checksum error at logical
5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in
tree 5
[ 398.492406] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5073338368 on dev /dev/vdc physical 1705771008
[ 398.492868] BTRFS warning (device vdc): checksum error at logical
5073338368 on dev /dev/vdc, physical 1705771008: metadata leaf (level 0) in
tree 5
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: strangely uncorrectable errors with RAID-5
2025-03-14 12:27 ` Russell Coker
@ 2025-03-14 16:54 ` Russell Coker
2025-03-14 19:32 ` Thiago Ramon
0 siblings, 1 reply; 9+ messages in thread
From: Russell Coker @ 2025-03-14 16:54 UTC (permalink / raw)
To: linux-btrfs, Qu Wenruo
On Friday, 14 March 2025 23:27:51 AEDT Russell Coker wrote:
> Just to see if anything had changed I ran the same tests with RAID-5 data
> and metadata again with the Debian kernel 6.12.17-amd64 and this time got
> properly uncorrectable errors in less than a minute. I don't know if the
> error happened faster and worse than before because of some kernel
> difference, luck, or some timing difference when running on different
> hardware.
I did it again but with RAID-5 data and RAID-1 metadata and it took a few
hours this time but again got to an uncorrectable state.
[15653.298999] BTRFS warning (device vdc): checksum error at logical
5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
tree 5
[15653.299001] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299002] BTRFS warning (device vdc): checksum error at logical
5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
tree 5
[15653.299003] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299005] BTRFS warning (device vdc): checksum error at logical
5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
tree 5
[15653.299006] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299007] BTRFS warning (device vdc): checksum error at logical
5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
tree 5
[15653.299009] BTRFS error (device vdc): unable to fixup (regular) error at
logical 5157748736 on dev /dev/vdc physical 1673199616
[15653.299010] BTRFS warning (device vdc): checksum error at logical
5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
tree 5
Below is the script that broke it.
#!/bin/bash
set -e
while true ; do
  for DEV in c d e f ; do
    dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k \
      count=1000
    sync
    btrfs scrub start -B /mnt
    sync
  done
  date
done
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: strangely uncorrectable errors with RAID-5
2025-03-14 16:54 ` Russell Coker
@ 2025-03-14 19:32 ` Thiago Ramon
2025-03-15 2:51 ` Russell Coker
2025-03-15 5:19 ` Andrei Borzenkov
0 siblings, 2 replies; 9+ messages in thread
From: Thiago Ramon @ 2025-03-14 19:32 UTC (permalink / raw)
To: russell; +Cc: linux-btrfs, Qu Wenruo
On Fri, Mar 14, 2025 at 1:54 PM Russell Coker <russell@coker.com.au> wrote:
>
> On Friday, 14 March 2025 23:27:51 AEDT Russell Coker wrote:
> > Just to see if anything had changed I ran the same tests with RAID-5 data
> > and metadata again with the Debian kernel 6.12.17-amd64 and this time got
> > properly uncorrectable errors in less than a minute. I don't know if the
> > error happened faster and worse than before because of some kernel
> > difference, luck, or some timing difference when running on different
> > hardware.
>
> I did it again but with RAID-5 data and RAID-1 metadata and it took a few
> hours this time but again got to an uncorrectable state.
>
> [15653.298999] BTRFS warning (device vdc): checksum error at logical
> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
> tree 5
> [15653.299001] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 5157748736 on dev /dev/vdc physical 1673199616
> [15653.299002] BTRFS warning (device vdc): checksum error at logical
> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
> tree 5
> [15653.299003] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 5157748736 on dev /dev/vdc physical 1673199616
> [15653.299005] BTRFS warning (device vdc): checksum error at logical
> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
> tree 5
> [15653.299006] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 5157748736 on dev /dev/vdc physical 1673199616
> [15653.299007] BTRFS warning (device vdc): checksum error at logical
> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
> tree 5
> [15653.299009] BTRFS error (device vdc): unable to fixup (regular) error at
> logical 5157748736 on dev /dev/vdc physical 1673199616
> [15653.299010] BTRFS warning (device vdc): checksum error at logical
> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
> tree 5
>
> Below is the script that broke it.
>
> #!/bin/bash
> set -e
> while true ; do
> for DEV in c d e f ; do
> dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k
> count=1000
> sync
> btrfs scrub start -B /mnt
> sync
> done
> date
> done
Your script is writing randomly to ALL your disks. You just need to
get lucky for them to overwrite the same sector in 2 different disks
to ruin RAID5 and RAID1. If you want to stress test the filesystem you
need to pay more attention to what you're doing.
>
> --
> My Main Blog http://etbe.coker.com.au/
> My Documents Blog http://doc.coker.com.au/
>
>
>
>
* Re: strangely uncorrectable errors with RAID-5
2025-03-14 19:32 ` Thiago Ramon
@ 2025-03-15 2:51 ` Russell Coker
2025-03-15 5:19 ` Andrei Borzenkov
1 sibling, 0 replies; 9+ messages in thread
From: Russell Coker @ 2025-03-15 2:51 UTC (permalink / raw)
To: Thiago Ramon; +Cc: linux-btrfs, Qu Wenruo
On Saturday, 15 March 2025 06:32:46 AEDT Thiago Ramon wrote:
> > while true ; do
> >
> > for DEV in c d e f ; do
> >
> > dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k
> >
> > count=1000
> >
> > sync
> > btrfs scrub start -B /mnt
> > sync
> >
> > done
> > date
> >
> > done
>
> Your script is writing randomly to ALL your disks. You just need to
> get lucky for them to overwrite the same sector in 2 different disks
> to ruin RAID5 and RAID1. If you want to stress test the filesystem you
> need to pay more attention to what you're doing.
Are you saying that the btrfs scrub is not designed to correct the errors?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: strangely uncorrectable errors with RAID-5
2025-03-14 19:32 ` Thiago Ramon
2025-03-15 2:51 ` Russell Coker
@ 2025-03-15 5:19 ` Andrei Borzenkov
1 sibling, 0 replies; 9+ messages in thread
From: Andrei Borzenkov @ 2025-03-15 5:19 UTC (permalink / raw)
To: Thiago Ramon, russell; +Cc: linux-btrfs, Qu Wenruo
14.03.2025 22:32, Thiago Ramon wrote:
> On Fri, Mar 14, 2025 at 1:54 PM Russell Coker <russell@coker.com.au> wrote:
>>
>> On Friday, 14 March 2025 23:27:51 AEDT Russell Coker wrote:
>>> Just to see if anything had changed I ran the same tests with RAID-5 data
>>> and metadata again with the Debian kernel 6.12.17-amd64 and this time got
>>> properly uncorrectable errors in less than a minute. I don't know if the
>>> error happened faster and worse than before because of some kernel
>>> difference, luck, or some timing difference when running on different
>>> hardware.
>>
>> I did it again but with RAID-5 data and RAID-1 metadata and it took a few
>> hours this time but again got to an uncorrectable state.
>>
>> [15653.298999] BTRFS warning (device vdc): checksum error at logical
>> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
>> tree 5
>> [15653.299001] BTRFS error (device vdc): unable to fixup (regular) error at
>> logical 5157748736 on dev /dev/vdc physical 1673199616
>> [15653.299002] BTRFS warning (device vdc): checksum error at logical
>> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
>> tree 5
>> [15653.299003] BTRFS error (device vdc): unable to fixup (regular) error at
>> logical 5157748736 on dev /dev/vdc physical 1673199616
>> [15653.299005] BTRFS warning (device vdc): checksum error at logical
>> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
>> tree 5
>> [15653.299006] BTRFS error (device vdc): unable to fixup (regular) error at
>> logical 5157748736 on dev /dev/vdc physical 1673199616
>> [15653.299007] BTRFS warning (device vdc): checksum error at logical
>> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
>> tree 5
>> [15653.299009] BTRFS error (device vdc): unable to fixup (regular) error at
>> logical 5157748736 on dev /dev/vdc physical 1673199616
>> [15653.299010] BTRFS warning (device vdc): checksum error at logical
>> 5157748736 on dev /dev/vdc, physical 1673199616: metadata leaf (level 0) in
>> tree 5
>>
>> Below is the script that broke it.
>>
>> #!/bin/bash
>> set -e
>> while true ; do
>> for DEV in c d e f ; do
>> dd if=/dev/zero of=/dev/vd$DEV oseek=$((20+$RANDOM%3*1000)) bs=1024k
>> count=1000
>> sync
>> btrfs scrub start -B /mnt
>> sync
>> done
>> date
>> done
> Your script is writing randomly to ALL your disks. You just need to
btrfs scrub start -B is supposed to work synchronously. At each
iteration only one disk gets overwritten and then scrub runs to completion.
> get lucky for them to overwrite the same sector in 2 different disks
> to ruin RAID5 and RAID1. If you want to stress test the filesystem you
> need to pay more attention to what you're doing.
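The ordering under discussion can be seen with a dry run of the quoted loop. This is only a sketch: the function name dry_run_pass is hypothetical, it prints the steps instead of running dd or btrfs, and it assumes that "btrfs scrub start -B" does not return until the scrub has finished.

```shell
#!/bin/bash
# Dry run of one pass of the inner loop: print each step, touch nothing.
dry_run_pass() {
  local DEV
  for DEV in c d e f ; do
    echo "zero 1000 MiB on /dev/vd$DEV"
    echo "scrub /mnt and wait for it to finish"
  done
}
dry_run_pass
```

Under that assumption every zeroing step is followed by a full scrub before the next device is written, so two devices would hold unrepaired damage at the same time only if a scrub failed to repair the previous one.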
>>
>> --
>> My Main Blog http://etbe.coker.com.au/
>> My Documents Blog http://doc.coker.com.au/
>>
>>
>>
>>
>
Thread overview: 9+ messages
2024-10-20 10:09 strangely uncorrectable errors with RAID-5 Russell Coker
2024-10-20 21:01 ` Qu Wenruo
2024-10-21 3:55 ` Russell Coker
2024-10-21 4:26 ` Qu Wenruo
2025-03-14 12:27 ` Russell Coker
2025-03-14 16:54 ` Russell Coker
2025-03-14 19:32 ` Thiago Ramon
2025-03-15 2:51 ` Russell Coker
2025-03-15 5:19 ` Andrei Borzenkov