To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: Nagios probe for btrfs RAID status?
Date: Sat, 23 Nov 2013 10:35:08 +0000 (UTC)
References: <528F6085.4020603@pocock.com.au> <52902808.8020706@oracle.com> <5290695E.80506@pocock.com.au>

Daniel Pocock posted on Sat, 23 Nov 2013 09:37:50 +0100 as excerpted:

> What about when btrfs detects a bad block checksum and recovers data
> from the equivalent block on another disk?  The wiki says there will
> be a syslog event.  Does btrfs keep any stats on the number of blocks
> that it considers unreliable, and can this be queried from user space?

The way you phrased that question reads strangely to me ("considers
unreliable"?  does that mean blocks it had to fix, or blocks it had to
fix more than once, or...), so I'm not sure this answers it, but from
the btrfs manpage...

>>>> btrfs device stats [-z] {<path>|<device>}

     Read and print the device IO stats for all devices of the
     filesystem identified by <path> or for a single <device>.

     Options

     -z   Reset stats to zero after reading them.
<<<<

Here's the output for my (dual device btrfs raid1) rootfs, here:

btrfs dev stat /
[/dev/sdc5].write_io_errs   0
[/dev/sdc5].read_io_errs    0
[/dev/sdc5].flush_io_errs   0
[/dev/sdc5].corruption_errs 0
[/dev/sdc5].generation_errs 0
[/dev/sda5].write_io_errs   0
[/dev/sda5].read_io_errs    0
[/dev/sda5].flush_io_errs   0
[/dev/sda5].corruption_errs 0
[/dev/sda5].generation_errs 0

As you can see, for multi-device filesystems it gives the stats per
component device.  Any errors accumulate until a reset using -z, so you
can easily see whether the numbers are increasing over time and by how
much.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
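FWIW, that per-device output lends itself to a simple probe.  Here's a
minimal, untested sketch of how one might wrap it for Nagios (the
function names are mine, not from any existing plugin; a real check
would want timeouts, perfdata, etc.):

```shell
#!/bin/sh
# Sketch of a Nagios-style probe around `btrfs dev stats`.
# parse_stats reads "counter value" lines (the format btrfs dev stats
# prints, as shown above) on stdin and prints only nonzero counters.
parse_stats() {
    awk '$2 != 0 { print }'
}

# check_btrfs maps that to the standard Nagios exit codes:
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN (couldn't query the filesystem).
check_btrfs() {
    mountpoint="${1:-/}"
    stats=$(btrfs dev stats "$mountpoint") || return 3
    bad=$(printf '%s\n' "$stats" | parse_stats)
    if [ -n "$bad" ]; then
        echo "CRITICAL: btrfs device errors on $mountpoint"
        printf '%s\n' "$bad"
        return 2
    fi
    echo "OK: no btrfs device errors on $mountpoint"
    return 0
}
```

Since the counters accumulate, a probe like this stays CRITICAL until
someone resets with -z (or you'd compare against a saved baseline
instead of zero).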