* Background corrections don't show up in device stats, and have low syslog priority. How do I reliably find out about them?
From: ivarun_ml @ 2015-04-17 19:46 UTC
To: linux-btrfs
I've been running some simple tests of btrfs raid1 in a virtual machine, and I found the behaviour of background correction a bit surprising. I set up a raid1 volume and stored a large file on it along with its sha256sum. If I manually corrupt one of the underlying devices and then run a btrfs scrub, btrfs device stats shows the corruption tally, and the system log contains err-priority messages reporting the corruption, for example:
BTRFS: bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 94917, gen 0
BTRFS: fixed up error at logical 943534080 on dev /dev/sdb
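The scrub summary line above has a fixed "name count" layout, so it can be turned into per-counter numbers mechanically. The following is a minimal Python sketch that assumes exactly the format quoted here (other kernel versions may word it differently); parse_bdev_errs is a hypothetical helper, not part of btrfs-progs:

```python
import re

# Matches the err-priority summary line quoted above, e.g.
# "BTRFS: bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 94917, gen 0".
# Only the "bdev <device> errs:" part is anchored, on the assumption that the
# prefix before it may differ between kernel versions.
BDEV_ERRS = re.compile(r"bdev (\S+) errs: (.*)$")
# Each counter is a short lowercase name followed by a number, e.g. "corrupt 94917".
COUNTER = re.compile(r"([a-z]+) (\d+)")


def parse_bdev_errs(line):
    """Return (device, {counter: value}), or None if the line doesn't match."""
    m = BDEV_ERRS.search(line)
    if m is None:
        return None
    device, rest = m.group(1), m.group(2)
    return device, {name: int(value) for name, value in COUNTER.findall(rest)}


if __name__ == "__main__":
    print(parse_bdev_errs(
        "BTRFS: bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 94917, gen 0"
    ))
```

Lines that are not summary lines, such as the "fixed up error" message, simply yield None.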
But if I instead just read the file, then btrfs will still detect and correct the corruption, but the device stats are not updated, and the errors in the syslog have info-priority, making them much harder to notice. Examples:
BTRFS (device sde): bad tree block start 17335229606580733495 521674752
BTRFS info (device sde): csum failed ino 257 off 691347456 csum 4171722801 expected csum 2566472073
If I run a scrub after reading the file, some corruption is still found (presumably because only some parts of the file were read from the corrupted device), but the error count is much lower. I take this to mean that all of the corruptions corrected in the background have been "lost": they were repaired without being counted, and will not show up in the device stats even after a scrub.
Is this the intended behaviour? What is the best practice for checking whether corruption has been detected during normal operation? Would something like journalctl | grep "csum failed" be robust? Personally, I would have expected these messages to have err priority and the device stats to be updated.
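One weakness of grep "csum failed" is visible in the examples above: metadata corruption is reported as "bad tree block start" instead, and the word after "BTRFS" (nothing, "info", or a colon) varies. A minimal Python sketch of a slightly more tolerant count, assuming the message wording quoted in this post stays the same across kernel versions; count_read_errors is a hypothetical helper name:

```python
import re
from collections import Counter

# Matches the two kinds of read-time corruption messages quoted in this post:
#   "BTRFS (device sde): bad tree block start ..."
#   "BTRFS info (device sde): csum failed ino 257 off ..."
# The priority word after "BTRFS" (info, warning, error, ...) is optional,
# because the log level of these messages can differ between kernel versions.
# Assumption: the "(device X): <message>" wording itself stays stable.
READ_ERROR = re.compile(
    r"BTRFS(?: \w+)? \(device ([^)]+)\): (csum failed|bad tree block start)"
)


def count_read_errors(lines):
    """Count corruption messages per (device, kind) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = READ_ERROR.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    # The two example messages from this post; on a live system the input
    # would be kernel log text instead, e.g. the output of "journalctl -k".
    sample = [
        "BTRFS (device sde): bad tree block start 17335229606580733495 521674752",
        "BTRFS info (device sde): csum failed ino 257 off 691347456 "
        "csum 4171722801 expected csum 2566472073",
    ]
    for (device, kind), n in sorted(count_read_errors(sample).items()):
        print(f"{device}: {kind} x{n}")
```

Like any log search, this counts messages rather than distinct corrupted blocks, and it only sees what the journal has retained.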
Here is a sample session of the kind of test I've done:
linux-57zx:~ # lsb_release -a
LSB Version: n/a
Distributor ID: openSUSE project
Description: openSUSE 20150413 (Tumbleweed) (x86_64)
Release: 20150413
Codename: n/a
linux-57zx:~ # uname -a
Linux linux-57zx.site 3.19.3-1-desktop #1 SMP PREEMPT Thu Mar 26 17:34:34 UTC 2015 (f10e7fc) x86_64 x86_64 x86_64 GNU/Linux
linux-57zx:~ # btrfs --version
btrfs-progs v3.19.1+20150325
linux-57zx:~ # btrfs fi show
Label: none uuid: 34d008ef-1d32-4177-9cab-628257ab5302
Total devices 1 FS bytes used 4.63GiB
devid 1 size 125.99GiB used 6.04GiB path /dev/sda2
Label: none uuid: 143c7163-88b1-4e07-9223-6509eae1945d
Total devices 3 FS bytes used 801.23MiB
devid 1 size 512.00MiB used 511.00MiB path /dev/sdb
devid 2 size 512.00MiB used 511.00MiB path /dev/sdc
devid 3 size 1.00GiB used 1022.00MiB path /dev/sde
btrfs-progs v3.19.1+20150325
linux-57zx:~ # btrfs fi df /mnt/raid/
System, RAID1: total=8.00MiB, used=4.00KiB
Data+Metadata, RAID1: total=1014.00MiB, used=801.23MiB
GlobalReserve, single: total=20.00MiB, used=0.00B
linux-57zx:~ # ls -lh /mnt/raid/
total 801M
-rw-r--r-- 1 root root 800M Apr 16 20:21 zerofile
-rw-r--r-- 1 root root 85 Apr 16 20:22 zerofile.sha256sum
linux-57zx:~ # dd if=/dev/urandom of=/dev/sdb bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 29.6714 s, 17.7 MB/s
linux-57zx:~ # sha256sum -c /mnt/raid/zerofile.sha256sum
/mnt/raid/zerofile: OK
linux-57zx:~ # btrfs device stats /mnt/raid/
[/dev/sdb].write_io_errs 0
[/dev/sdb].read_io_errs 0
[/dev/sdb].flush_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sde].write_io_errs 0
[/dev/sde].read_io_errs 0
[/dev/sde].flush_io_errs 0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0
Regards,
Ivar Ursin Nikolaisen
* Re: Background corrections don't show up in device stats, and have low syslog priority. How do I reliably find out about them?
From: Holger Hoffstätte @ 2015-04-17 20:35 UTC
To: linux-btrfs
On Fri, 17 Apr 2015 21:46:08 +0200, ivarun_ml wrote:
> But if I instead just read the file, then btrfs will still detect and
> correct the corruption, but the device stats are not updated, and the
> errors in the syslog have info-priority, making them much harder to
> notice. [..]
I don't know about the device stats, but at least the log level for
read errors was already fixed in a patch series:
http://www.spinics.net/lists/linux-btrfs/msg40556.html
This was merged into 4.0.
-h
* Re: Background corrections don't show up in device stats, and have low syslog priority. How do I reliably find out about them?
From: ivarun_ml @ 2015-04-19 13:51 UTC
To: linux-btrfs
On 4/17/2015 at 10:36 PM, "Holger Hoffstaette" <holger.hoffstaette@googlemail.com> wrote:
>I don't know about the device stats, but at least the log level for
>read errors was already fixed in a patch series:
>
>http://www.spinics.net/lists/linux-btrfs/msg40556.html
>
>This was merged into 4.0.
Thanks! That was a narrow miss: openSUSE Tumbleweed was still on 3.19.3 rather than 4.0 when I ran my test :-).
I would still be interested to know whether we can expect the device stats to be updated for background corrections in the future, or whether the syslog will remain the best place to look up these numbers. Getting per-device counts of each kind of error would be much easier through device stats than by searching the syslog.
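For the device stats side, the output format shown in the session above ("[device].counter value") is easy to tabulate per device and per kind. A minimal Python sketch, assuming that format; parse_device_stats and nonzero_counters are hypothetical helper names. On the 3.19 kernel tested here it would not have caught the background corrections, since those never reached the counters:

```python
import re

# Matches one line of "btrfs device stats" output, e.g.
# "[/dev/sdb].corruption_errs 0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter_name: value}} from "btrfs device stats" output."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["name"]] = int(m["value"])
    return stats


def nonzero_counters(stats):
    """List (device, counter, value) for every counter above zero, sorted."""
    return [
        (dev, name, value)
        for dev, counters in sorted(stats.items())
        for name, value in sorted(counters.items())
        if value > 0
    ]


if __name__ == "__main__":
    # A few lines in the format of the session above; on a live system the
    # input would be the output of "btrfs device stats <mountpoint>".
    sample = """\
[/dev/sdb].write_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdc].corruption_errs 0
"""
    problems = nonzero_counters(parse_device_stats(sample))
    print(problems or "all counters are zero")
```

Keeping the parsing and the "anything non-zero?" check separate makes it simple to either print a per-device table or turn a non-empty result into an alert from cron or a monitoring check.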
Regards,
Ivar Ursin Nikolaisen