linux-btrfs.vger.kernel.org archive mirror
* "Corrected" errors persist after scrubbing
@ 2017-05-06 10:33 Tom Hale
  2017-05-08 19:26 ` Chris Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Tom Hale @ 2017-05-06 10:33 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2534 bytes --]

Below (and also attached because of formatting) is an example of `btrfs
scrub` incorrectly reporting that errors have been corrected.

In this example, /dev/md127 is the device created by running:
mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0

The filesystem is RAID1.

# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:23:54 2017 and finished after
00:00:01
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:23:54 2017 and finished after
00:00:01
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
# ### But the errors haven't really been corrected, they're still there:
# mdadm --grow /dev/md0 --layout=clear # Stop producing additional errors
layout for /dev/md0 set to 31
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:24:24 2017 and finished after
00:00:00
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:24:24 2017 and finished after
00:00:00
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
#

Since scrub is checking for read issues, I would expect it to re-read any
corrections before asserting that they have indeed been corrected.
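One rough way to see whether the "corrections" actually stick is to count how
often each logical address shows up in the kernel's fixup messages across
scrubs (a sketch; it assumes the kernel log is available via journalctl):

# journalctl -k | grep -o 'fixed up error at logical [0-9]*' | sort | uniq -c

Any count above 1 means a block that scrub claimed to have corrected had to be
"corrected" again on a later pass.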

I understand that HDDs have a pool of non-LBA-addressable sectors set
aside to mask bad physical sectors, but this pool size is fixed by the
manufacturer (who makes money from sales of new drives).

However, given that the intention of scrub is to verify and fix data, I don't
believe it is sufficient to blindly trust that the underlying HDD still has
spare reallocatable sectors, or that the hardware will always write data
correctly.

At a minimum, shouldn't these 8 "corrected errors" be listed as
"uncorrectable errors" to inform the sysadmin that data integrity has
degraded (e.g. in this RAID1 example the data is no longer duplicated)?

Ideally, I would hope that the blocks with uncorrectable errors are
marked as bad and fresh blocks are used to maintain integrity.

-- 
Regards,

Tom Hale

[-- Attachment #2: btrfs-scrub --]
[-- Type: text/plain, Size: 1309 bytes --]

# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
# ### But the errors haven't really been corrected, they're still there:
# mdadm --grow /dev/md0 --layout=clear # Stop producing additional errors
layout for /dev/md0 set to 31
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
        scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
        total bytes scrubbed: 200.47MiB with 8 errors
        error details: read=8
        corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
        scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
        total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
#


* Re: "Corrected" errors persist after scrubbing
  2017-05-06 10:33 "Corrected" errors persist after scrubbing Tom Hale
@ 2017-05-08 19:26 ` Chris Murphy
  2017-05-09  5:15   ` Duncan
  2017-05-16  9:53   ` Tom Hale
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Murphy @ 2017-05-08 19:26 UTC (permalink / raw)
  To: Tom Hale; +Cc: Btrfs BTRFS

On Sat, May 6, 2017 at 4:33 AM, Tom Hale <tom@hale.ee> wrote:
> Below (and also attached because of formatting) is an example of `btrfs
> scrub` incorrectly reporting that errors have been corrected.
>
> In this example, /dev/md127 is the device created by running:
> mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
>
> The filesystem is RAID1.
>
> # mdadm --grow /dev/md0 --layout=rp400
> layout for /dev/md0 set to 12803
> # btrfs scrub start -Bd /mnt/tmp
> scrub device /dev/md127 (id 1) done
>         scrub started at Fri May  5 19:23:54 2017 and finished after
> 00:00:01
>         total bytes scrubbed: 200.47MiB with 8 errors
>         error details: read=8
>         corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
> scrub device /dev/loop1 (id 2) done
>         scrub started at Fri May  5 19:23:54 2017 and finished after
> 00:00:01
>         total bytes scrubbed: 200.47MiB with 0 errors
> WARNING: errors detected during scrubbing, corrected
> # ### But the errors haven't really been corrected, they're still there:
> # mdadm --grow /dev/md0 --layout=clear # Stop producing additional errors
> layout for /dev/md0 set to 31
> # btrfs scrub start -Bd /mnt/tmp
> scrub device /dev/md127 (id 1) done
>         scrub started at Fri May  5 19:24:24 2017 and finished after
> 00:00:00
>         total bytes scrubbed: 200.47MiB with 8 errors
>         error details: read=8
>         corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
> scrub device /dev/loop1 (id 2) done
>         scrub started at Fri May  5 19:24:24 2017 and finished after
> 00:00:00
>         total bytes scrubbed: 200.47MiB with 0 errors
> WARNING: errors detected during scrubbing, corrected
> #


What are the complete kernel messages for the scrub event? This should
show what problem Btrfs detects and how it fixes it, and what sectors
it's fixing each time.



>
> Since scrub is checking for read issues, I expect that it would read any
> corrections before asserting that they have indeed been corrected.

Read errors are fixed by overwrites. If the underlying device doesn't
report an error for the write command, the write is assumed to have
succeeded. Even md and LVM RAID do this.
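One way to sanity-check whether those fixup overwrites are even being
submitted to the device is to watch its block-layer counters around a scrub
(a rough sketch; assumes the filesystem device is /dev/md127):

# cat /sys/block/md127/stat    # note the write counters
# btrfs scrub start -Bd /mnt/tmp
# cat /sys/block/md127/stat    # write counters should have climbed if fixups were submitted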

>
> I understand that HDDs have a pool of non-LBA-addressable sectors set
> aside to mask bad physical sectors, but this pool size is fixed by the
> manufacturer (who makes money from sales of new drives).
>
> However, I don't believe it is sufficient to blindly trust that the
> underlying  HDD still has spare reallocatable sectors or that the
> hardware will always correctly write data, given the verification and
> fixing intention of scrub.

If there is no spare, the drive is, by spec, supposed to report a write
error. If the original sector fails the write and there are no more spares,
the write can only fail.

In Btrfs land, it keeps trying to write; I don't think it has a limit.
In md land, this used to be fatal: the block device would be ejected
from the array and not used anymore. But I think there are somewhat
recent patches that optionally allow a badblock map to exist for
tracking sectors that fail writes, and md simply won't use them and
will do its own remapping.
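If an array has such a bad-block log, it can (I believe) be inspected; a
sketch, assuming a reasonably recent mdadm and the usual sysfs layout:

# mdadm --examine-badblocks /dev/loop0      # bad blocks recorded in the member's metadata
# cat /sys/block/md0/md/rd0/bad_blocks      # runtime view of the same list

Note that arrays assembled with --build (as in this test) carry no metadata,
so there may be no bad-block log to consult at all.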


>
> At a minimum, shouldn't these 8 "corrected errors" be listed as
> "uncorrectable errors" to inform the sysadmin that data integrity has
> degraded (e.g. in this RAID1 example the data is no longer duplicated)?
>
> Ideally, I would hope that the blocks with uncorrectable errors are
> marked as bad and fresh blocks are used to maintain integrity.

They are only counted as corrected if the Btrfs overwrite issued to fix a
bad read (either a csum error or a device read error) does not itself result
in a device write error. I haven't tried simulating persistent write failure
to see how Btrfs behaves, but my recollection is that it just keeps trying
and does not eject that device. Nor does it track bad sectors.


-- 
Chris Murphy


* Re: "Corrected" errors persist after scrubbing
  2017-05-08 19:26 ` Chris Murphy
@ 2017-05-09  5:15   ` Duncan
  2017-05-16  9:53   ` Tom Hale
  1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2017-05-09  5:15 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Mon, 08 May 2017 13:26:16 -0600 as excerpted:

> On Sat, May 6, 2017 at 4:33 AM, Tom Hale <tom@hale.ee> wrote:
>> Below (and also attached because of formatting) is an example of `btrfs
>> scrub` incorrectly reporting that errors have been corrected.
>>
>> In this example, /dev/md127 is the device created by running:
>> mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
>>
>> The filesystem is RAID1.
>>
>> # mdadm --grow /dev/md0 --layout=rp400
>> layout for /dev/md0 set to 12803
>> # btrfs scrub start -Bd /mnt/tmp
>> scrub device /dev/md127 (id 1) done
>>         scrub started at Fri May  5 19:23:54 2017 and finished after
>> 00:00:01
>>         total bytes scrubbed: 200.47MiB with 8 errors
>>         error details: read=8
>>         corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
>> scrub device /dev/loop1 (id 2) done
>>         scrub started at Fri May  5 19:23:54 2017 and finished after
>> 00:00:01
>>         total bytes scrubbed: 200.47MiB with 0 errors
>> WARNING: errors detected during scrubbing, corrected
>> # ### But the errors haven't really been corrected, they're still there:
>> # mdadm --grow /dev/md0 --layout=clear # Stop producing additional errors
>> layout for /dev/md0 set to 31
>> # btrfs scrub start -Bd /mnt/tmp
>> scrub device /dev/md127 (id 1) done
>>         scrub started at Fri May  5 19:24:24 2017 and finished after
>> 00:00:00
>>         total bytes scrubbed: 200.47MiB with 8 errors
>>         error details: read=8
>>         corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
>> scrub device /dev/loop1 (id 2) done
>>         scrub started at Fri May  5 19:24:24 2017 and finished after
>> 00:00:00
>>         total bytes scrubbed: 200.47MiB with 0 errors
>> WARNING: errors detected during scrubbing, corrected
>> #
> 
> 
> What are the complete kernel messages for the scrub event? This should
> show what problem Btrfs detects and how it fixes it, and what sectors
> it's fixing each time.

I'm also wondering which kernel and btrfs-progs versions are being
used here.  For two reasons:

First:

AFAIK newer code shouldn't report unverified errors.  These were originally
reported for blocks that couldn't be verified because the blocks containing
their checksums were themselves in bad blocks.  IOW, the lower branches of
the tree couldn't be checked because higher ones were still being repaired.

Back then, in order to fix such errors, one had to do multiple passes
manually, until there were no more unverified errors, each pass fixing
errors at one level so the levels below it could actually be checked in
the next pass.
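(A rough sketch of that manual loop, for anyone who wants to reproduce it.
It is untested, and it parses the human-readable scrub summary, which may
change between versions:

while btrfs scrub start -Bd /mnt/tmp 2>&1 | grep -q 'unverified errors: [1-9]'; do
        echo "unverified errors remain, scrubbing again"
done

Each pass fixes one level of the tree, letting the next pass verify the
level below it.)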

I know because I had a pair of ssds where I deliberately kept an ssd
that was going bad in the btrfs raid1 pair, in order to see how things
worked over time.  So I got quite some experience running and rerunning
scrubs until all the errors were corrected after multiple passes!

But newer versions catch that problem and, I believe, actually use the
second copy for the verification, so as long as there are no uncorrectable
errors, there should be no unverified errors either.  (Either that, or they
do multiple passes automatically, like I used to do manually.  I'm not sure
which, except that the former should be simpler and faster, so I suspect
that's what's done.)

So the fact that unverified errors are reported hints to me that the
versions in use may be old.  Either that, or it's a different mechanism
generating the unverified errors, one I'm not familiar with, and one that
doesn't get corrected from the other copy or via multiple passes like the
ones I have experience with do.  But that's why I say "hint".

Second:

AFAIK there was a short period around kernel 4.10 and the early 4.11-rcs
where read errors were indeed not being corrected properly.  To my knowledge
this was during normal operation, not scrub, but perhaps certain scrub cases
were affected as well.

AFAIK this problem is entirely fixed in the 4.11 release, and presumably in
the 4.10 stables, and I don't believe 4.8 and earlier were affected at all.
(I'm not sure about 4.9; I /think/ it was before the regression, but some
4.9-stable releases /might/ be affected.)  Whatever the OP is running /might/
just fall in that gap.  It would take a dev, or someone following those
specific patches more closely than I did, to know exactly what's affected
and thus be able to say for sure, if the OP is running something 4.9 or
4.10, or an early 4.11-rc, but not the latest 4.11 release.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: "Corrected" errors persist after scrubbing
  2017-05-08 19:26 ` Chris Murphy
  2017-05-09  5:15   ` Duncan
@ 2017-05-16  9:53   ` Tom Hale
  2017-05-16 11:27     ` Austin S. Hemmelgarn
  2017-05-16 18:55     ` Chris Murphy
  1 sibling, 2 replies; 6+ messages in thread
From: Tom Hale @ 2017-05-16  9:53 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 5120 bytes --]

Hi Chris,

On 09/05/17 02:26, Chris Murphy wrote:
> Read errors are fixed by overwrites. If the underlying device doesn't
> report an error for the write command, it's assumed to succeed. Even
> md and LVM raid's do this.

I understand assuming writes succeed in general. However, for a tool
which says (in its usage):

"verify checksums of data and metadata"

I would expect that the tool reports the *final state* of the data:

* Because the explicit function of scrub is to verify that all data can
be read.

* In a RAID1 situation, the sysadmin could think that there are two
valid copies of all data, when in fact there may only be one (one way to
notice this is sketched below).
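As a sketch of what I mean (assuming the filesystem is mounted at /mnt/tmp):
the per-device error counters do persist across scrubs, so they at least show
that something has gone wrong, even when the scrub summary says everything
was corrected:

# btrfs device stats /mnt/tmp    # read_io_errs etc. keep accumulating across scrubs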

> What are the complete kernel messages for the scrub event? This should
> show what problem Btrfs detects and how it fixes it, and what sectors
> it's fixing each time.

Attached are the kernel logs (and the `grep fixed` version) showing that
the same logical blocks are "fixed" twice.

Below (and attached because of formatting) I show how to reproduce the
uncorrected errors, producing the attached logs.

Please reply-all as I'm not subscribed.

----------------------------------------------------

$ pacman -Q btrfs-progs
btrfs-progs 4.10.2-1
$ uname -r
4.9.24-1-MANJARO

# fallocate -l 500M good-disk
# fallocate -l 500M bad-disk
# losetup -f bad-disk # loop0
# losetup -f good-disk # loop1
# mdadm --create -v /dev/md0 --level=linear --force --raid-devices=1
/dev/loop0
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
# mkfs.btrfs -m raid1 -d raid1 /dev/loop1 /dev/md0
btrfs-progs v4.10.2
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM /dev/loop1 (500.00MiB) ...
Performing full device TRIM /dev/md0 (499.73MiB) ...
Label:              (null)
UUID:               d748537b-3fa4-47cc-9934-62cf391fb638
Node size:          16384
Sector size:        4096
Filesystem size:    999.73MiB
Block group profiles:
  Data:             RAID1            64.00MiB
  Metadata:         RAID1            49.94MiB
  System:           RAID1             8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  2
Devices:
   ID        SIZE  PATH
    1   500.00MiB  /dev/loop1
    2   499.73MiB  /dev/md0
# mount /dev/md0 /mnt/tmp
# dd if=/dev/urandom of=/mnt/tmp/rand bs=1M
dd: error writing '/mnt/tmp/rand': No space left on device
441+0 records in
440+0 records out
461963264 bytes (462 MB, 441 MiB) copied, 5.74358 s, 80.4 MB/s
# umount /mnt/tmp
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
mdadm: array /dev/md0 built and started.
    ├─vg_svelte-swap   swap              suspend
0792d327-566d-4306-93f4-810e247827e4   [SWAP]
    └─vg_svelte-rootfs btrfs             root
a1d2a0b4-8d4b-4c1c-9c87-5a55de61918e   /mnt/btrfs-vol/rootfs
# lsblk -f | head -n4
NAME                   FSTYPE            LABEL    UUID
                MOUNTPOINT
loop0                  linux_raid_member svelte:0
f7019aa8-3b2d-4e4c-9106-635d0dddca78
└─md0                  linux_raid_member svelte:0
f7019aa8-3b2d-4e4c-9106-635d0dddca78
  └─md127              btrfs
d748537b-3fa4-47cc-9934-62cf391fb638
# mount /dev/md127 /mnt/tmp
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:04:50 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:04:50 2017 and finished after
00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
# # Introduce errors in /dev/md127:
# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp


scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:07:57 2017 and finished after
00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:07:57 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified
errors: 589
WARNING: errors detected during scrubbing, corrected
# # Stop producing additional errors, keeping the already existing
badblocks:
# mdadm --grow /dev/md0 --layout=clear
layout for /dev/md0 set to 31
# # Run scrub for 2nd time:
# btrfs scrub start -Bd /mnt/tmp


scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:10:19 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:10:19 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified
errors: 589
WARNING: errors detected during scrubbing, corrected
# # Clean up
# umount /mnt/tmp
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# losetup -d /dev/loop[01]
#

-- 

Tom Hale

[-- Attachment #2: btrfs-fixed --]
[-- Type: text/plain, Size: 2260 bytes --]

May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127

[-- Attachment #3: btrfs-log --]
[-- Type: text/plain, Size: 8841 bytes --]

May 16 13:04:30 svelte kernel: BTRFS info (device md127): disk space caching is enabled
May 16 13:04:30 svelte kernel: BTRFS info (device md127): has skinny extents
May 16 13:07:57 svelte kernel: BTRFS warning (device md127): i/o error at logical 132972544 on dev /dev/md127, sector 220800, root 5, inode 261, offset 305774592, length 4096, links 1 (path: rand)
May 16 13:07:58 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
May 16 13:07:58 svelte kernel: BTRFS warning (device md127): i/o error at logical 107016192 on dev /dev/md127, sector 170104, root 5, inode 261, offset 278917120, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 340852736 on dev /dev/md127, sector 313984, root 5, inode 261, offset 196018176, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 366804992 on dev /dev/md127, sector 364672, root 5, inode 261, offset 221970432, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 392626176 on dev /dev/md127, sector 415104, root 5, inode 261, offset 247791616, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 314896384 on dev /dev/md127, sector 263288, root 5, inode 261, offset 170061824, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 418050048 on dev /dev/md127, sector 464760, root 5, inode 261, offset 432599040, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 434962432 on dev /dev/md127, sector 497792, root 5, inode 261, offset 317849600, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 486998016 on dev /dev/md127, sector 599424, root 5, inode 261, offset 369885184, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 512688128 on dev /dev/md127, sector 649600, root 5, inode 261, offset 395575296, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 132972544 on dev /dev/md127, sector 220800, root 5, inode 261, offset 305774592, length 4096, links 1 (path: rand)
May 16 13:10:20 svelte kernel: btrfs_dev_stat_print_on_error: 9 callbacks suppressed
May 16 13:10:20 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 20, flush 0, corrupt 0, gen 0
May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 107016192 on dev /dev/md127, sector 170104, root 5, inode 261, offset 278917120, length 4096, links 1 (path: rand)
May 16 13:10:20 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 21, flush 0, corrupt 0, gen 0
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 340852736 on dev /dev/md127, sector 313984, root 5, inode 261, offset 196018176, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 22, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 366804992 on dev /dev/md127, sector 364672, root 5, inode 261, offset 221970432, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 23, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 392626176 on dev /dev/md127, sector 415104, root 5, inode 261, offset 247791616, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 24, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 314896384 on dev /dev/md127, sector 263288, root 5, inode 261, offset 170061824, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 25, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 418050048 on dev /dev/md127, sector 464760, root 5, inode 261, offset 432599040, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 26, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 434962432 on dev /dev/md127, sector 497792, root 5, inode 261, offset 317849600, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 27, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS warning (device md127): i/o error at logical 486998016 on dev /dev/md127, sector 599424, root 5, inode 261, offset 369885184, length 4096, links 1 (path: rand)
May 16 13:10:22 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 28, flush 0, corrupt 0, gen 0
May 16 13:10:22 svelte kernel: BTRFS warning (device md127): i/o error at logical 512688128 on dev /dev/md127, sector 649600, root 5, inode 261, offset 395575296, length 4096, links 1 (path: rand)
May 16 13:10:22 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 29, flush 0, corrupt 0, gen 0
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127

[-- Attachment #4: script --]
[-- Type: text/plain, Size: 4264 bytes --]

# fallocate -l 500M good-disk
# fallocate -l 500M bad-disk
# losetup -f bad-disk # loop0
# losetup -f good-disk # loop1
# mdadm --create -v /dev/md0 --level=linear --force --raid-devices=1 /dev/loop0
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
# mkfs.btrfs -m raid1 -d raid1 /dev/loop1 /dev/md0
btrfs-progs v4.10.2
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM /dev/loop1 (500.00MiB) ...
Performing full device TRIM /dev/md0 (499.73MiB) ...
Label:              (null)
UUID:               d748537b-3fa4-47cc-9934-62cf391fb638
Node size:          16384
Sector size:        4096
Filesystem size:    999.73MiB
Block group profiles:
  Data:             RAID1            64.00MiB
  Metadata:         RAID1            49.94MiB
  System:           RAID1             8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  2
Devices:
   ID        SIZE  PATH
    1   500.00MiB  /dev/loop1
    2   499.73MiB  /dev/md0
# mount /dev/md0 /mnt/tmp
# dd if=/dev/urandom of=/mnt/tmp/rand bs=1M
dd: error writing '/mnt/tmp/rand': No space left on device
441+0 records in
440+0 records out
461963264 bytes (462 MB, 441 MiB) copied, 5.74358 s, 80.4 MB/s
# umount /mnt/tmp
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
mdadm: array /dev/md0 built and started.
    ├─vg_svelte-swap   swap              suspend  0792d327-566d-4306-93f4-810e247827e4   [SWAP]
    └─vg_svelte-rootfs btrfs             root     a1d2a0b4-8d4b-4c1c-9c87-5a55de61918e   /mnt/btrfs-vol/rootfs
# lsblk -f | head -n4
NAME                   FSTYPE            LABEL    UUID                                   MOUNTPOINT
loop0                  linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78
└─md0                  linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78
  └─md127              btrfs                      d748537b-3fa4-47cc-9934-62cf391fb638
# mount /dev/md127 /mnt/tmp
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
# # Introduce errors in /dev/md127:
# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp                                                                                                                                                              
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified errors: 589
WARNING: errors detected during scrubbing, corrected
# # Stop producing additional errors, keeping the already existing badblocks:
# mdadm --grow /dev/md0 --layout=clear
layout for /dev/md0 set to 31
# # Run scrub for 2nd time:
# btrfs scrub start -Bd /mnt/tmp                                                                                                                                                              
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified errors: 589
WARNING: errors detected during scrubbing, corrected
# # Clean up
# umount /mnt/tmp
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# losetup -d /dev/loop[01]
# rm good-disk bad-disk



* Re: "Corrected" errors persist after scrubbing
  2017-05-16  9:53   ` Tom Hale
@ 2017-05-16 11:27     ` Austin S. Hemmelgarn
  2017-05-16 18:55     ` Chris Murphy
  1 sibling, 0 replies; 6+ messages in thread
From: Austin S. Hemmelgarn @ 2017-05-16 11:27 UTC (permalink / raw)
  To: Tom Hale, Chris Murphy; +Cc: Btrfs BTRFS

On 2017-05-16 05:53, Tom Hale wrote:
> Hi Chris,
>
> On 09/05/17 02:26, Chris Murphy wrote:
>> Read errors are fixed by overwrites. If the underlying device doesn't
>> report an error for the write command, it's assumed to succeed. Even
>> md and LVM raid's do this.
>
> I understand assuming writes succeed in general. However, for a tool
> which says (in its usage):
Outside of dm-error and/or dm-flakey, what hardware have you seen that
fails writes without reporting a write error?  SSDs might do it at the
lowest level (below the firmware), but the SSD firmware will notice and
should correct it itself, and it won't happen persistently unless the
device is essentially dead.  In general, short of the storage media
itself failing, it is entirely safe to assume that what you told the
device to write is what got written, as long as the write commands succeeded.
>
> "verify checksums of data and metadata"
>
> I would expect that the tool reports the *final state* of the data:
Which is easy enough to do by hand without needing to make scrub do more 
work all the time (just run a read-only scrub after the first one).
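Something along these lines should do it (a sketch; -r runs the scrub in
read-only mode so nothing is rewritten, meaning any errors it reports after
a normal scrub are ones that were not really fixed):

# btrfs scrub start -Bd /mnt/tmp     # normal scrub, repairs what it can
# btrfs scrub start -Bdr /mnt/tmp    # read-only pass to confirm the repairs stuck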
>
> * Because the explicit function of scrub is to verify that all data can
> be read.
Not exactly; its function is to verify that checksums are correct.
Verifying that there are no low-level read errors is simply a side
effect of that.
>
> * In a RAID1 situation, the sysadmin could think that there are two
> valid copies of all data, when in fact there may only be one.
>
>> What are the complete kernel messages for the scrub event? This should
>> show what problem Btrfs detects and how it fixes it, and what sectors
>> it's fixing each time.
>
> Attached are the kernel logs (and the `grep fixed` version) showing the
> same logical blocks are "fixed" twice.
>
> Below (and attached because of formatting) I show how to reproduce the
> uncorrected errors, producing the attached logs.
>
> Please reply-all as I'm not subscribed.
>
> ----------------------------------------------------
>
> $ pacman -Q btrfs-progs
> btrfs-progs 4.10.2-1
> $ uname -r
> 4.9.24-1-MANJARO
>
> # fallocate -l 500M good-disk
> # fallocate -l 500M bad-disk
> # losetup -f bad-disk # loop0
> # losetup -f good-disk # loop1
> # mdadm --create -v /dev/md0 --level=linear --force --raid-devices=1
> /dev/loop0
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md0 started.
> # mkfs.btrfs -m raid1 -d raid1 /dev/loop1 /dev/md0
> btrfs-progs v4.10.2
> See http://btrfs.wiki.kernel.org for more information.
>
> Performing full device TRIM /dev/loop1 (500.00MiB) ...
> Performing full device TRIM /dev/md0 (499.73MiB) ...
> Label:              (null)
> UUID:               d748537b-3fa4-47cc-9934-62cf391fb638
> Node size:          16384
> Sector size:        4096
> Filesystem size:    999.73MiB
> Block group profiles:
>   Data:             RAID1            64.00MiB
>   Metadata:         RAID1            49.94MiB
>   System:           RAID1             8.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  2
> Devices:
>    ID        SIZE  PATH
>     1   500.00MiB  /dev/loop1
>     2   499.73MiB  /dev/md0
> # mount /dev/md0 /mnt/tmp
> # dd if=/dev/urandom of=/mnt/tmp/rand bs=1M
> dd: error writing '/mnt/tmp/rand': No space left on device
> 441+0 records in
> 440+0 records out
> 461963264 bytes (462 MB, 441 MiB) copied, 5.74358 s, 80.4 MB/s
> # umount /mnt/tmp
> # mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> # mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
> mdadm: array /dev/md0 built and started.
>     ├─vg_svelte-swap   swap              suspend
> 0792d327-566d-4306-93f4-810e247827e4   [SWAP]
>     └─vg_svelte-rootfs btrfs             root
> a1d2a0b4-8d4b-4c1c-9c87-5a55de61918e   /mnt/btrfs-vol/rootfs
> # lsblk -f | head -n4
> NAME                   FSTYPE            LABEL    UUID
>                 MOUNTPOINT
> loop0                  linux_raid_member svelte:0
> f7019aa8-3b2d-4e4c-9106-635d0dddca78
> └─md0                  linux_raid_member svelte:0
> f7019aa8-3b2d-4e4c-9106-635d0dddca78
>   └─md127              btrfs
> d748537b-3fa4-47cc-9934-62cf391fb638
> # mount /dev/md127 /mnt/tmp
> # btrfs scrub start -Bd /mnt/tmp
> scrub device /dev/loop1 (id 1) done
>         scrub started at Tue May 16 13:04:50 2017 and finished after
> 00:00:03
>         total bytes scrubbed: 441.22MiB with 0 errors
> scrub device /dev/md127 (id 2) done
>         scrub started at Tue May 16 13:04:50 2017 and finished after
> 00:00:02
>         total bytes scrubbed: 441.22MiB with 0 errors
> # # Introduce errors in /dev/md127:
> # mdadm --grow /dev/md0 --layout=rp400
> layout for /dev/md0 set to 12803
> # btrfs scrub start -Bd /mnt/tmp
>
>
> scrub device /dev/loop1 (id 1) done
>         scrub started at Tue May 16 13:07:57 2017 and finished after
> 00:00:02
>         total bytes scrubbed: 441.22MiB with 0 errors
> scrub device /dev/md127 (id 2) done
>         scrub started at Tue May 16 13:07:57 2017 and finished after
> 00:00:03
>         total bytes scrubbed: 441.22MiB with 19 errors
>         error details: read=19
>         corrected errors: 19, uncorrectable errors: 0, unverified
> errors: 589
> WARNING: errors detected during scrubbing, corrected
> # # Stop producing additional errors, keeping the already existing
> badblocks:
> # mdadm --grow /dev/md0 --layout=clear
> layout for /dev/md0 set to 31
> # # Run scrub for 2nd time:
> # btrfs scrub start -Bd /mnt/tmp
>
>
> scrub device /dev/loop1 (id 1) done
>         scrub started at Tue May 16 13:10:19 2017 and finished after
> 00:00:03
>         total bytes scrubbed: 441.22MiB with 0 errors
> scrub device /dev/md127 (id 2) done
>         scrub started at Tue May 16 13:10:19 2017 and finished after
> 00:00:03
>         total bytes scrubbed: 441.22MiB with 19 errors
>         error details: read=19
>         corrected errors: 19, uncorrectable errors: 0, unverified
> errors: 589
> WARNING: errors detected during scrubbing, corrected
> # # Clean up
> # umount /mnt/tmp
> # mdadm --stop /dev/md127
> mdadm: stopped /dev/md127
> # mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> # losetup -d /dev/loop[01]
> #
>



* Re: "Corrected" errors persist after scrubbing
  2017-05-16  9:53   ` Tom Hale
  2017-05-16 11:27     ` Austin S. Hemmelgarn
@ 2017-05-16 18:55     ` Chris Murphy
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Murphy @ 2017-05-16 18:55 UTC (permalink / raw)
  To: Tom Hale; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, May 16, 2017 at 3:53 AM, Tom Hale <tom@hale.ee> wrote:

> # mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
> # mdadm --grow /dev/md0 --layout=rp400
> layout for /dev/md0 set to 12803

I'm not familiar with this method of simulating problems. Everything
I've seen on this list uses dm-flakey, or the btrfs corrupt tool.
Every time I've done that, fixups are honored and the errors are
actually fixed. I have no idea if the above method is accepting
repairs properly. We can see from your log that Btrfs detects problems
and applies fixups. Every time I've done this on real devices, it has
always fixed the problem. So my guess is the problem is fixed but then
the layout changes with this command:

> # mdadm --grow /dev/md0 --layout=clear
> layout for /dev/md0 set to 31

And now the fixes are invalid, so you get new errors again. That's just
a guess; like I said, I've never seen this method of introducing faults
before, so I don't know how it's behaving or what the expectations are.

The very fact that Btrfs says it has sent fixups, to the specific faulty
device and sector value, and yet a 2nd scrub shows the exact same
logical errors, can't be coincidence. So I suspect the methodology has
a flawed assumption: either the fixup is not passing through correctly
(or is corrupted on the way), or it is fixed but then clearing the
layout re-corrupts just those fixed blocks.

From this description:
http://linux-raid.vger.kernel.narkive.com/XFofrUH1/error-injection

>mdadm -G /dev/md0 --layout=clear
>will stop producing new errors
>mdadm -G /dev/md0 --layout=flush
>will forget all persistent errors.

I don't really understand the distinction, but it sounds like clear
might not be sufficient: even though fixes were applied, those same
sectors will still produce errors until flush is used.

Anyway I suspect flawed testing method.
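If someone wants to test that guess, a sketch (untested, and it assumes the
faulty layer behaves as that post describes):

# mdadm --grow /dev/md0 --layout=clear    # stop injecting new faults
# mdadm --grow /dev/md0 --layout=flush    # also forget the persistent faults already injected
# btrfs scrub start -Bd /mnt/tmp          # if the guess is right, this should now report 0 errors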






-- 
Chris Murphy

