linux-btrfs.vger.kernel.org archive mirror
* "Corrected" errors persist after scrubbing
@ 2017-05-06 10:33 Tom Hale
  2017-05-08 19:26 ` Chris Murphy
From: Tom Hale @ 2017-05-06 10:33 UTC
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2534 bytes --]

Below (and also attached, to preserve the formatting) is an example of
`btrfs scrub` incorrectly reporting that errors have been corrected.

In this example, /dev/md127 is the device created by running:
mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0

The filesystem is RAID1.

# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
# ### But the errors haven't really been corrected, they're still there:
# mdadm --grow /dev/md0 --layout=clear # Stop producing additional errors
layout for /dev/md0 set to 31
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
#
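
The comparison between the two runs above can be made mechanical by
extracting the counters from each run's output and checking whether the
second run still reports them. This is only a sketch: the helper names
`scrub_counts` and `errors_persist` are made up for illustration, and the
regular expression assumes the summary line looks exactly like the one
quoted above.

```python
import re

# Matches the per-device summary line as it appears in the output above.
SUMMARY = re.compile(
    r"corrected errors: (\d+), uncorrectable errors: (\d+), "
    r"unverified errors: (\d+)"
)


def scrub_counts(output):
    """Return one (corrected, uncorrectable, unverified) tuple per
    device that printed an error summary."""
    return [tuple(int(n) for n in m.groups())
            for m in SUMMARY.finditer(output)]


def errors_persist(first_run, second_run):
    """True if the first run claimed corrections and the second run,
    made after fault injection stopped, still reports errors."""
    first = scrub_counts(first_run)
    second = scrub_counts(second_run)
    return any(c > 0 for c, _, _ in first) and bool(second)


sample = ("        corrected errors: 8, uncorrectable errors: 0, "
          "unverified errors: 248\n")
print(scrub_counts(sample))  # [(8, 0, 248)]
```

Feeding both runs from the transcript into `errors_persist` would return
True, since each contains the same summary line.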

Since scrub is checking for read issues, I would expect it to read back
any corrections before asserting that they have indeed been corrected.
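
To illustrate the behaviour expected here, the following is a minimal
sketch of "rewrite, then read back before reporting". It is not how btrfs
scrub is implemented. `PersistentlyFaultyDevice` and `repair_block` are
hypothetical names, and the device model (writes are accepted, but reads of
bad blocks keep failing) is an assumption chosen to resemble what the
transcript above appears to show.

```python
class PersistentlyFaultyDevice:
    """Hypothetical in-memory block device: writes always appear to
    succeed, but reads of blocks in `bad_blocks` keep failing."""

    def __init__(self, nblocks, bad_blocks=()):
        self.blocks = [b""] * nblocks
        self.bad = set(bad_blocks)

    def write(self, idx, data):
        # The write is accepted without complaint, even on a bad block.
        self.blocks[idx] = data

    def read(self, idx):
        if idx in self.bad:
            raise OSError(f"read error on block {idx}")
        return self.blocks[idx]


def repair_block(dev, idx, good_copy):
    """Rewrite a block from the good mirror copy, then read it back.
    Only report 'corrected' if the read-back succeeds and matches."""
    dev.write(idx, good_copy)
    try:
        readback = dev.read(idx)
    except OSError:
        return "uncorrectable"
    return "corrected" if readback == good_copy else "uncorrectable"


dev = PersistentlyFaultyDevice(4, bad_blocks={2})
print(repair_block(dev, 1, b"data"))  # corrected
print(repair_block(dev, 2, b"data"))  # uncorrectable
```

With a read-back step like this, a block that still cannot be read after
the rewrite would be counted as uncorrectable rather than corrected.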

I understand that HDDs have a pool of non-LBA-addressable sectors set
aside to mask bad physical sectors, but this pool size is fixed by the
manufacturer (who makes money from sales of new drives).

However, given that scrub is intended to verify and fix data, I don't
believe it is sufficient to blindly trust that the underlying HDD still
has spare reallocatable sectors, or that the hardware will always write
data correctly.

At a minimum, shouldn't these 8 "corrected errors" be listed as
"uncorrectable errors" to inform the sysadmin that data integrity has
degraded (e.g. in this RAID1 example the data is no longer duplicated)?

Ideally, I would hope that the blocks with uncorrectable errors are
marked as bad and fresh blocks are used to maintain integrity.

-- 
Regards,

Tom Hale

[-- Attachment #2: btrfs-scrub --]
[-- Type: text/plain, Size: 1309 bytes --]

# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
# ### But the errors haven't really been corrected, they're still there:
# mdadm --grow /dev/md0 --layout=clear # Stop producing additional errors
layout for /dev/md0 set to 31
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
#


Thread overview: 6+ messages
2017-05-06 10:33 "Corrected" errors persist after scrubbing Tom Hale
2017-05-08 19:26 ` Chris Murphy
2017-05-09  5:15   ` Duncan
2017-05-16  9:53   ` Tom Hale
2017-05-16 11:27     ` Austin S. Hemmelgarn
2017-05-16 18:55     ` Chris Murphy