error count

linux-btrfs.vger.kernel.org archive mirror

* error count
@ 2013-08-04 11:42 Russell Coker
  2013-08-04 13:37 ` Bart Noordervliet
  0 siblings, 1 reply; 5+ messages in thread
From: Russell Coker @ 2013-08-04 11:42 UTC
  To: linux-btrfs

I've got a 3TB SATA disk that is known to have problems (it failed in a zpool 
for one of my clients).  For test purposes I'm running a BTRFS RAID-1 across 
two partitions on that disk.  That is bad for performance and not something 
you'd normally do, but it is good for testing.

BTRFS recovers from read errors quite well and gives informative log messages.

But it doesn't seem possible to get a count of the number of errors.  I think 
that at the minimum I should be able to get a count of the number of errors 
from a device since it was attached to the system.  I think that the ideal 
would be to have an error count stored on the device and available to the 
sysadmin.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: error count
  2013-08-04 11:42 error count Russell Coker
@ 2013-08-04 13:37 ` Bart Noordervliet
  2013-08-10  4:58   ` Chris Samuel
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Noordervliet @ 2013-08-04 13:37 UTC
  To: russell; +Cc: linux-btrfs@vger.kernel.org

Hi Russell,

A sufficiently up-to-date kernel and btrfs tool will provide the
'btrfs device stats' command, which should give you the info you want.

Regards,

Bart



* Re: error count
  2013-08-04 13:37 ` Bart Noordervliet
@ 2013-08-10  4:58   ` Chris Samuel
  2013-08-10  9:19     ` Russell Coker
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Samuel @ 2013-08-10  4:58 UTC
  To: linux-btrfs@vger.kernel.org; +Cc: russell

On Sun, 4 Aug 2013 03:37:22 PM Bart Noordervliet wrote:

> a sufficiently up-to-date kernel and btrfs tool will provide the
> 'btrfs device stats' command, which should give you the info you want.

This is what it looks like:

chris@quad:~/Downloads/Linux/FileSystems/BtrFS/btrfs-progs$ sudo ./btrfs device stats /srv/DR
[/dev/sdb3].write_io_errs   0
[/dev/sdb3].read_io_errs    0
[/dev/sdb3].flush_io_errs   0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
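
For anyone who wants to consume these counters from a script, here is a
minimal Python sketch. It assumes the "[device].counter value" layout shown
above stays as it is; the name parse_device_stats is made up for
illustration and is not part of btrfs-progs.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/sdb3].read_io_errs    0".
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)\s*$")


def parse_device_stats(text):
    """Return {device: {counter_name: int}} from 'btrfs device stats' output."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match is None:
            continue  # ignore anything that is not a counter line
        stats[match.group("dev")][match.group("name")] = int(match.group("value"))
    return dict(stats)


SAMPLE = """\
[/dev/sdb3].write_io_errs   0
[/dev/sdb3].read_io_errs    0
[/dev/sdb3].flush_io_errs   0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
"""

counters = parse_device_stats(SAMPLE)
```

Lines that do not match the pattern are skipped rather than treated as
errors, so stray diagnostics mixed into the output do not break the parse.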

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP


* Re: error count
  2013-08-10  4:58   ` Chris Samuel
@ 2013-08-10  9:19     ` Russell Coker
  2013-08-10 13:38       ` Chris Samuel
  0 siblings, 1 reply; 5+ messages in thread
From: Russell Coker @ 2013-08-10  9:19 UTC
  To: linux-btrfs@vger.kernel.org

On Sat, 10 Aug 2013, Chris Samuel <chris@csamuel.org> wrote:
> On Sun, 4 Aug 2013 03:37:22 PM Bart Noordervliet wrote:
> > a sufficiently up-to-date kernel and btrfs tool will provide the
> > 'btrfs device stats' command, which should give you the info you want.
> 
> This is what it looks like:
> 
> chris@quad:~/Downloads/Linux/FileSystems/BtrFS/btrfs-progs$ sudo ./btrfs device stats /srv/DR
> [/dev/sdb3].write_io_errs   0
> [/dev/sdb3].read_io_errs    0
> [/dev/sdb3].flush_io_errs   0
> [/dev/sdb3].corruption_errs 0
> [/dev/sdb3].generation_errs 0

Thanks Chris and Bart.

Would it be possible to get the man page updated to include a brief 
description of those errors?  The first three are somewhat obvious in meaning 
(although not obvious in how they would happen) and the fourth is very 
obvious.  But what does generation_errs mean?  I'm seeing some on one system.  
Should I be concerned?  If I write a Nagios check, which counters should raise 
warnings and which should raise errors?
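
One possible shape for such a check, as a rough Python sketch only. The
severity split is an arbitrary assumption made for illustration (I/O error
counters raise WARNING, corruption_errs and generation_errs raise CRITICAL);
nothing in this thread or in the man page prescribes it. The names classify
and run_check are hypothetical. The exit codes 0 to 3 are the usual Nagios
plugin convention.

```python
import re
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes

# Assumed policy, not taken from any btrfs documentation.
CRIT_COUNTERS = {"corruption_errs", "generation_errs"}

STAT_LINE = re.compile(r"^\[([^\]]+)\]\.(\w+)\s+(\d+)\s*$")
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(output):
    """Map 'btrfs device stats' output to (exit_code, message)."""
    status, problems, seen = OK, [], 0
    for line in output.splitlines():
        match = STAT_LINE.match(line)
        if match is None:
            continue  # e.g. "failed to open /dev/sr0: No medium found"
        seen += 1
        dev, name, value = match.group(1), match.group(2), int(match.group(3))
        if value == 0:
            continue
        problems.append(f"{dev} {name}={value}")
        # Any other non-zero counter is treated as a warning.
        status = max(status, CRITICAL if name in CRIT_COUNTERS else WARNING)
    if seen == 0:
        return UNKNOWN, "UNKNOWN: no counters found"
    return status, f"{LABELS[status]}: " + (", ".join(problems) or "all counters zero")


def run_check(path):
    """Run 'btrfs device stats PATH' and classify what it prints."""
    result = subprocess.run(
        ["btrfs", "device", "stats", path], capture_output=True, text=True
    )
    if result.returncode != 0:
        return UNKNOWN, "UNKNOWN: btrfs device stats failed"
    return classify(result.stdout)
```

A plugin wrapper would call run_check("/mnt/backup"), print the message and
exit with the returned code. Since the counters are cumulative, a real check
would probably also want to remember the previous values and alert only on
increases.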

Also why does it give the following errors about trying to open /dev/sr0 when 
using a BTRFS RAID-1 filesystem?  Below is for a RAID-1 over /dev/sdb and 
/dev/sdc.

# btrfs device stats /dev/sdb
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs    0
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
# btrfs device stats /dev/sdc
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs    0
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0

Why is it even searching for the other part when only a single device is 
specified?  And why can't it give stats for a single device without checking 
/dev/sr0, when it can do so when asked about all devices via the mount point?

# btrfs device stats /dev/sdd1
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdd1].write_io_errs   0
[/dev/sdd1].read_io_errs    136
[/dev/sdd1].flush_io_errs   0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
# btrfs device stats /dev/sdd2
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdd2].write_io_errs   0
[/dev/sdd2].read_io_errs    0
[/dev/sdd2].flush_io_errs   0
[/dev/sdd2].corruption_errs 0
[/dev/sdd2].generation_errs 0
# btrfs device stats /mnt/backup/
[/dev/sdd1].write_io_errs   0
[/dev/sdd1].read_io_errs    136
[/dev/sdd1].flush_io_errs   0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
[/dev/sdd2].write_io_errs   0
[/dev/sdd2].read_io_errs    0
[/dev/sdd2].flush_io_errs   0
[/dev/sdd2].corruption_errs 0
[/dev/sdd2].generation_errs 0
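
With several devices in one listing, it helps to pull out only the counters
that are not zero. A small Python sketch, assuming the output layout shown
above (the name nonzero_counters is made up for illustration):

```python
import re

# Matches lines such as "[/dev/sdd1].read_io_errs    136".
STAT_LINE = re.compile(r"^\[([^\]]+)\]\.(\w+)\s+(\d+)$")


def nonzero_counters(text):
    """Return a list of (device, counter, value) for every non-zero counter."""
    hits = []
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match.group(3)) > 0:
            hits.append((match.group(1), match.group(2), int(match.group(3))))
    return hits


SAMPLE = """\
[/dev/sdd1].write_io_errs   0
[/dev/sdd1].read_io_errs    136
[/dev/sdd1].flush_io_errs   0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
[/dev/sdd2].write_io_errs   0
[/dev/sdd2].read_io_errs    0
[/dev/sdd2].flush_io_errs   0
[/dev/sdd2].corruption_errs 0
[/dev/sdd2].generation_errs 0
"""

hits = nonzero_counters(SAMPLE)
```

For the listing above this leaves a single entry, the 136 read errors on
/dev/sdd1.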


Thanks.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: error count
  2013-08-10  9:19     ` Russell Coker
@ 2013-08-10 13:38       ` Chris Samuel
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Samuel @ 2013-08-10 13:38 UTC
  To: linux-btrfs

On Sat, 10 Aug 2013 07:19:27 PM Russell Coker wrote:

> But what does generation_errs mean?  I'm seeing some on one system.  
> Should I be concerned?  If I write a Nagios check which ones should be 
> warnings and which ones errors?

All I know is that ioctl.h says:

BTRFS_DEV_STAT_GENERATION_ERRS, /* an indication that blocks have not
                                 * been written */

Looking at the kernel code, that counter only seems to get incremented during 
a scrub.  The code that does that says:

                } else if (generation != le64_to_cpu(h->generation)) {
                        sblock->header_error = 1;
                        sblock->generation_error = 1;
                }

The generation there is from the btrfs inode structure, the header says:

        /* full 64 bit generation number, struct vfs_inode doesn't have a big
         * enough field for this.
         */
        u64 generation;

The wiki says:

https://btrfs.wiki.kernel.org/index.php/Glossary

# generation 
#   An internal counter which updates for each transaction. When a
# metadata block is written (using copy on write), current generation
# is stored in the block, so that blocks which are too new (and hence
# possibly inconsistent) can be identified.

and:

https://btrfs.wiki.kernel.org/index.php/Btrfs_design

# Everything that points to a btree block also stores the generation
# field it expects that block to have. This allows Btrfs to detect
# phantom or misplaced writes on the media.
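
To make the idea in those two wiki excerpts concrete, here is a toy model in
Python. It is not kernel code and does not mirror the real btrfs structures;
Block, Pointer and generation_matches are invented names, and the "disk" is
just a dictionary keyed by a made-up address.

```python
from dataclasses import dataclass


@dataclass
class Block:
    generation: int  # generation recorded in the block's own header


@dataclass
class Pointer:
    target: int               # toy "disk address" of the block pointed to
    expected_generation: int  # generation the referrer expects to find there


def generation_matches(disk, ptr):
    """True if the block on 'disk' carries the generation the pointer expects."""
    return disk[ptr.target].generation == ptr.expected_generation


# The block at address 100 was last written successfully in generation 7.
disk = {100: Block(generation=7)}

# A rewrite in generation 8 was acknowledged but never reached the media
# (a phantom write), so the referrer now expects generation 8.
stale = Pointer(target=100, expected_generation=8)
fresh = Pointer(target=100, expected_generation=7)

mismatch_detected = not generation_matches(disk, stale)
```

In this model the stale pointer fails the comparison, which is the kind of
situation a generation error is meant to flag: the block is readable and may
even checksum correctly, but it is not the version that was supposed to be
there.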

HTH!

> Also why does it give the following errors about trying to open /dev/sr0
> when using a BTRFS RAID-1 filesystem?  Below is for a RAID-1 over /dev/sdb
> and /dev/sdc.

I don't get that here.  I'm building btrfs-progs from git at commit 
194aa4a1bd6447bb545286d0bcb0b0be8204d79f (July 5th), aka:

btrfs-progs$ git describe --tags
v0.20-rc1-358-g194aa4a

cheers!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

