From: Daniel Pocock <daniel@pocock.com.au>
To: linux-btrfs@vger.kernel.org
Subject: Re: Nagios probe for btrfs RAID status?
Date: Sat, 23 Nov 2013 12:44:25 +0100
Message-ID: <52909519.7080508@pocock.com.au>
In-Reply-To: <pan$13621$9e5c77ca$1502a49b$4b791baa@cox.net>



On 23/11/13 11:35, Duncan wrote:
> Daniel Pocock posted on Sat, 23 Nov 2013 09:37:50 +0100 as excerpted:
> 
>> What about when btrfs detects a bad block checksum and recovers data
>> from the equivalent block on another disk?  The wiki says there will be
>> a syslog event.  Does btrfs keep any stats on the number of blocks that
>> it considers unreliable and can this be queried from user space?
> 
> The way you phrased that question is strange to me (considers unreliable?
> does that mean ones that it had to fix, or ones that it had to fix more 
> than once, or...), so I'm not sure this answers it, but from the btrfs 
> manpage...


Let me clarify: by "unreliable" I mean blocks that the block device
driver reads without reporting any error, but whose checksum btrfs
finds to be bad, so it does not use the data from that block.

Such blocks definitely exist.  Sometimes the data is corrupted at the
moment of writing, and no matter how many times you read the block, you
always get a bad checksum.


> >>>>
> 
> btrfs device stats [-z] {<path>|<device>}
> 
> Read and print the device IO stats for all devices of the filesystem 
> identified by <path> or for a single <device>.
> 
> Options
> 
> -z   Reset stats to zero after reading them.
> 
> <<<<
> 
> Here's the output for my (dual device btrfs raid1) rootfs, here:
> 
> btrfs dev stat /
> [/dev/sdc5].write_io_errs   0
> [/dev/sdc5].read_io_errs    0
> [/dev/sdc5].flush_io_errs   0
> [/dev/sdc5].corruption_errs 0
> [/dev/sdc5].generation_errs 0
> [/dev/sda5].write_io_errs   0
> [/dev/sda5].read_io_errs    0
> [/dev/sda5].flush_io_errs   0
> [/dev/sda5].corruption_errs 0
> [/dev/sda5].generation_errs 0
> 
> As you can see, for multi-device filesystems it gives the stats per 
> component device.  Any errors accumulate until a reset using -z, so you 
> can easily see if the numbers are increasing over time and by how much.
> 
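
For illustration, the comparison over time described above could be
sketched as follows.  This is only a sketch: it assumes the output
format is exactly what is quoted above, and the function names
parse_dev_stats and increased_counters are hypothetical, not part of
btrfs-progs.

```python
import re

# Matches lines such as "[/dev/sdc5].write_io_errs   0"
# (format assumed from the quoted output above).
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {(device, counter): value} from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            key = (match.group("dev"), match.group("name"))
            stats[key] = int(match.group("value"))
    return stats


def increased_counters(previous, current):
    """Return {(device, counter): delta} for counters that grew between samples."""
    return {
        key: value - previous.get(key, 0)
        for key, value in current.items()
        if value > previous.get(key, 0)
    }
```

A probe would have to keep the previous sample somewhere between runs;
where and how is left open here.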


That looks interesting - are these counters documented anywhere?

Should a Nagios plugin alert on any non-zero value, or only on some of
these counters?
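
To make the question concrete, here is a minimal sketch of the simplest
policy, where any non-zero counter is reported as critical.  That
policy is an assumption made for illustration, not a recommendation
from this thread, and check_dev_stats is a hypothetical name.  The
exit codes are the usual Nagios plugin ones (0 OK, 1 WARNING,
2 CRITICAL, 3 UNKNOWN).

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches lines such as "[/dev/sda5].corruption_errs 0"
# (format assumed from the quoted output above).
STAT_LINE = re.compile(r"^\[([^\]]+)\]\.(\w+)\s+(\d+)$")


def check_dev_stats(text):
    """Return (exit_code, message) for the text printed by `btrfs device stats`."""
    seen = 0
    nonzero = []
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a counter line
        seen += 1
        device, counter, value = match.group(1), match.group(2), int(match.group(3))
        if value:
            nonzero.append(f"{device} {counter}={value}")
    if seen == 0:
        # For example, the command failed and printed only an error.
        return UNKNOWN, "BTRFS UNKNOWN - no device stats found in input"
    if nonzero:
        # Assumed policy: every non-zero counter is treated as critical.
        return CRITICAL, "BTRFS CRITICAL - " + ", ".join(nonzero)
    return OK, f"BTRFS OK - {seen} counters, all zero"
```

A wrapper script would feed it the command output, print the message
and exit with the returned code.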

Are they runtime stats (since system boot) or are they maintained in the
filesystem on disk?

My own version of the btrfs utility doesn't have that command, though,
as I am using a Debian stable system.  I tried a newer version of the
utility, and it fails with

ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS)

so I probably need to update my kernel too.


