* error count
@ 2013-08-04 11:42 Russell Coker
2013-08-04 13:37 ` Bart Noordervliet
0 siblings, 1 reply; 5+ messages in thread
From: Russell Coker @ 2013-08-04 11:42 UTC (permalink / raw)
To: linux-btrfs
I've got a 3TB SATA disk that is known to have problems (it failed in a zpool
for one of my clients). For test purposes I'm running a BTRFS RAID-1 on two
partitions on that disk, which is bad for performance and not something you'd
normally do, but good for testing.
BTRFS recovers from read errors quite well and gives informative log messages.
But it doesn't seem possible to get a count of the number of errors. I think
that at the minimum I should be able to get a count of the number of errors
from a device since it was attached to the system. I think that the ideal
would be to have an error count stored on the device and available to the
sysadmin.
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: error count
From: Bart Noordervliet @ 2013-08-04 13:37 UTC (permalink / raw)
To: russell; +Cc: linux-btrfs@vger.kernel.org
Hi Russell,
A sufficiently up-to-date kernel and btrfs tool will provide the
'btrfs device stats' command, which should give you the info you want.
Regards,
Bart
On Sun, Aug 4, 2013 at 1:42 PM, Russell Coker <russell@coker.com.au> wrote:
> I've got a 3TB SATA disk that is known to have problems (it failed in a zpool
> for one of my clients). For test purposes I'm running a BTRFS RAID-1 on two
> partitions on that disk, bad for performance and not something you'd normally
> do but good for testing.
>
> BTRFS recovers from read errors quite well and gives informative log messages.
>
> But it doesn't seem possible to get a count of the number of errors. I think
> that at the minimum I should be able to get a count of the number of errors
> from a device since it was attached to the system. I think that the ideal
> would be to have an error count stored on the device and available to the
> sysadmin.
* Re: error count
From: Chris Samuel @ 2013-08-10 4:58 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org; +Cc: russell
On Sun, 4 Aug 2013 03:37:22 PM Bart Noordervliet wrote:
> a sufficiently up-to-date kernel and btrfs tool will provide the
> 'btrfs device stats' command, which should give you the info you want.
This is what it looks like:
chris@quad:~/Downloads/Linux/FileSystems/BtrFS/btrfs-progs$ sudo ./btrfs device stats /srv/DR
[/dev/sdb3].write_io_errs 0
[/dev/sdb3].read_io_errs 0
[/dev/sdb3].flush_io_errs 0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
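
For anyone who wants to consume these counters from a script, here is a minimal sketch, assuming Python 3 and assuming the output keeps the "[device].counter value" layout shown above. The function name parse_device_stats and the regular expression are illustrative choices, not part of btrfs-progs.

```python
import re

# Matches counter lines such as "[/dev/sdb3].read_io_errs 0".
# ASSUMPTION: the output format is exactly what this thread shows.
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter_name: int}} parsed from 'btrfs device stats' output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a counter line
        stats.setdefault(match["dev"], {})[match["name"]] = int(match["value"])
    return stats


sample = """\
[/dev/sdb3].write_io_errs 0
[/dev/sdb3].read_io_errs 0
[/dev/sdb3].flush_io_errs 0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
"""

print(parse_device_stats(sample))
```

Lines that do not match the counter pattern are skipped rather than treated as fatal, so any extra diagnostic text mixed into the output does not break the parse.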
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
* Re: error count
From: Russell Coker @ 2013-08-10 9:19 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On Sat, 10 Aug 2013, Chris Samuel <chris@csamuel.org> wrote:
> On Sun, 4 Aug 2013 03:37:22 PM Bart Noordervliet wrote:
> > a sufficiently up-to-date kernel and btrfs tool will provide the
> > 'btrfs device stats' command, which should give you the info you want.
>
> This is what it looks like:
>
> chris@quad:~/Downloads/Linux/FileSystems/BtrFS/btrfs-progs$ sudo ./btrfs device stats /srv/DR
> [/dev/sdb3].write_io_errs 0
> [/dev/sdb3].read_io_errs 0
> [/dev/sdb3].flush_io_errs 0
> [/dev/sdb3].corruption_errs 0
> [/dev/sdb3].generation_errs 0
Thanks Chris and Bart.
Would it be possible to get the man page updated to include a brief
description of those errors? The first three are somewhat obvious in meaning
(although not obvious in how they would happen) and the fourth is very
obvious. But what does generation_errs mean? I'm seeing some on one system.
Should I be concerned? If I write a Nagios check, which ones should be
warnings and which ones errors?
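
A rough sketch of what such a check could look like, assuming Python 3 and the "[device].counter value" output format shown above. The exit codes 0/1/2/3 are the usual Nagios plugin convention. The split between warning and critical counters is purely an assumption made for illustration; nothing in this thread settles which counters deserve which severity, so CRITICAL_COUNTERS should be adjusted to local policy. The names check_stats and CRITICAL_COUNTERS are hypothetical.

```python
import re

# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# ASSUMPTION: illustrative policy only. The thread does not say which
# counters should be critical and which should only be warnings.
CRITICAL_COUNTERS = frozenset({"corruption_errs", "generation_errs"})

# Matches counter lines such as "[/dev/sdd1].read_io_errs 136".
STAT_LINE = re.compile(r"^\[([^\]]+)\]\.(\w+)\s+(\d+)$")


def check_stats(text, critical_counters=CRITICAL_COUNTERS):
    """Map 'btrfs device stats' output to a (status, message) pair."""
    seen_any = False
    nonzero = []
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # ignore lines that are not counters
        seen_any = True
        dev, name, value = match.group(1), match.group(2), int(match.group(3))
        if value > 0:
            nonzero.append((dev, name, value))
    if not seen_any:
        return UNKNOWN, "UNKNOWN: no counters found in output"
    if not nonzero:
        return OK, "OK: all error counters are zero"
    is_critical = any(name in critical_counters for _, name, _ in nonzero)
    status = CRITICAL if is_critical else WARNING
    label = "CRITICAL" if is_critical else "WARNING"
    detail = ", ".join(f"{dev} {name}={value}" for dev, name, value in nonzero)
    return status, f"{label}: {detail}"


sample = """\
[/dev/sdd1].write_io_errs 0
[/dev/sdd1].read_io_errs 136
[/dev/sdd1].flush_io_errs 0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
"""

status, message = check_stats(sample)
print(status, message)
```

In a real plugin the text would come from running "btrfs device stats" against the mount point, the message would be printed on one line, and the status would be passed to sys.exit().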
Also why does it give the following errors about trying to open /dev/sr0 when
using a BTRFS RAID-1 filesystem? Below is for a RAID-1 over /dev/sdb and
/dev/sdc.
# btrfs device stats /dev/sdb
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdb].write_io_errs 0
[/dev/sdb].read_io_errs 0
[/dev/sdb].flush_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
# btrfs device stats /dev/sdc
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
Why is it even searching for the other part when only a single device is
specified? And why can't it give stats for a single device without checking
/dev/sr0, when it can do so while checking all devices?
# btrfs device stats /dev/sdd1
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdd1].write_io_errs 0
[/dev/sdd1].read_io_errs 136
[/dev/sdd1].flush_io_errs 0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
# btrfs device stats /dev/sdd2
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdd2].write_io_errs 0
[/dev/sdd2].read_io_errs 0
[/dev/sdd2].flush_io_errs 0
[/dev/sdd2].corruption_errs 0
[/dev/sdd2].generation_errs 0
# btrfs device stats /mnt/backup/
[/dev/sdd1].write_io_errs 0
[/dev/sdd1].read_io_errs 136
[/dev/sdd1].flush_io_errs 0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
[/dev/sdd2].write_io_errs 0
[/dev/sdd2].read_io_errs 0
[/dev/sdd2].flush_io_errs 0
[/dev/sdd2].corruption_errs 0
[/dev/sdd2].generation_errs 0
Thanks.
* Re: error count
From: Chris Samuel @ 2013-08-10 13:38 UTC (permalink / raw)
To: linux-btrfs
On Sat, 10 Aug 2013 07:19:27 PM Russell Coker wrote:
> But what does generation_errs mean? I'm seeing some on one system.
> Should I be concerned? If I write a Nagios check which ones should be
> warnings and which ones errors?
All I know is that ioctl.h says:
BTRFS_DEV_STAT_GENERATION_ERRS, /* an indication that blocks have not
* been written */
Looking at the kernel code that only seems to get incremented during a scrub.
The code that does that says:
	} else if (generation != le64_to_cpu(h->generation)) {
		sblock->header_error = 1;
		sblock->generation_error = 1;
	}
The generation there is from the btrfs inode structure, the header says:
/* full 64 bit generation number, struct vfs_inode doesn't have a big
* enough field for this.
*/
u64 generation;
The wiki says:
https://btrfs.wiki.kernel.org/index.php/Glossary
# generation
# An internal counter which updates for each transaction. When a
# metadata block is written (using copy on write), current generation
# is stored in the block, so that blocks which are too new (and hence
# possibly inconsistent) can be identified.
and:
https://btrfs.wiki.kernel.org/index.php/Btrfs_design
# Everything that points to a btree block also stores the generation
# field it expects that block to have. This allows Btrfs to detect
# phantom or misplaced writes on the media.
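
To make the idea in those two wiki excerpts concrete, here is a toy model in Python 3. It is not btrfs code: BlockHeader, BlockPointer and generation_matches are hypothetical names, and the "disk" is just a dictionary. It only illustrates the principle that a pointer records the generation it expects and the block header records the generation it was written with.

```python
from dataclasses import dataclass


@dataclass
class BlockHeader:
    generation: int  # generation stored in the block when it was written


@dataclass
class BlockPointer:
    bytenr: int               # toy address of the block being pointed to
    expected_generation: int  # generation the pointer expects to find there


def generation_matches(pointer, disk):
    """Return True if the block on 'disk' carries the generation the pointer expects."""
    header = disk[pointer.bytenr]
    return header.generation == pointer.expected_generation


# Toy scenario: the pointer was updated for generation 42, but the block
# at that address still holds older content from generation 41, as if the
# write of the new block never reached the media.
disk = {4096: BlockHeader(generation=41)}
pointer = BlockPointer(bytenr=4096, expected_generation=42)

if not generation_matches(pointer, disk):
    print("generation mismatch: this is the kind of event a generation error counts")
```

In this model a mismatch means the data at that address is not what the pointer was written for, which fits the ioctl.h comment about blocks that have not been written.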
HTH!
> Also why does it give the following errors about trying to open /dev/sr0
> when using a BTRFS RAID-1 filesystem? Below is for a RAID-1 over /dev/sdb
> and /dev/sdc.
I don't get that here; I'm building btrfs-progs from git at commit
194aa4a1bd6447bb545286d0bcb0b0be8204d79f (July 5th), aka:
btrfs-progs$ git describe --tags
v0.20-rc1-358-g194aa4a
cheers!
Chris