* Re: [PATCH v5 0/3] Btrfs: add IO error device stats
       [not found] <1337954770-10086-1-git-send-email-sbehrens@giantdisaster.de>
@ 2012-05-25 15:18 ` Christoph Hellwig
  2012-05-25 17:49   ` Stefan Behrens
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2012-05-25 15:18 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: linux-btrfs, linux-fsdevel, linux-kernel
Can you explain why the device error counters should be in a filesystem
instead of generic block layer code?
On Fri, May 25, 2012 at 04:06:07PM +0200, Stefan Behrens wrote:
> Changes v1-v2:
> - Remove restriction that BTRFS_IOC_GET_DEVICE_STATS is a privileged
>   operation
> - Cast u64 to unsigned long long for printf()
> 
> Changes v2-v3:
> - Rebased on Chris' current master
> 
> Changes v3-v4:
> - Add padding at end of ioctl structure
> 
> Changes v4-v5:
> - The statistic members in the ioctl are now organized as an array of
>   64 bit values. Symbolic names for the array indexes are defined in
>   an enum, which also defines the max value. This change makes it
>   easier to add new statistic members in the future
> - Give ins_len = -1 to btrfs_search_slot() when an item might get
>   deleted
> - Introduce a helper function for the repeated sequence stat_int() +
>   dirty = 1 + stat_print()
> - Introduce a helper function for the code that shares the bio
>   bi_private member for two pieces of information
> 
> The goal is to detect when drives start to show an increased error rate
> and should be replaced soon. Therefore, statistics counters are added
> that count I/O errors (read, write, and flush). Additionally,
> software-detected errors such as checksum errors and corrupted blocks
> are counted.
> 
> An ioctl interface is added to get the device statistic counters.
> A second ioctl is added to atomically get and reset these counters.
> 
> The device statistics are written into the device tree with each
> transaction commit. Only modified statistics are written.
> When a filesystem is mounted, the device statistics for each involved
> device are read from the device tree and used to initialize the
> counters.
> 
> A patch for the btrfs-progs world will also be sent.
> 
> Stefan Behrens (3):
>   Btrfs: add device counters for detected IO and checksum errors
>   Btrfs: add ioctl to get and reset the device stats
>   Btrfs: read device stats on mount, write modified ones during commit
> 
>  fs/btrfs/ctree.h       |   38 ++++++
>  fs/btrfs/disk-io.c     |   20 +++-
>  fs/btrfs/extent_io.c   |   18 ++-
>  fs/btrfs/ioctl.c       |   26 +++++
>  fs/btrfs/ioctl.h       |   33 ++++++
>  fs/btrfs/print-tree.c  |    3 +
>  fs/btrfs/scrub.c       |   65 ++++++++---
>  fs/btrfs/transaction.c |    4 +
>  fs/btrfs/volumes.c     |  304 +++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/volumes.h     |   52 +++++++++
>  10 files changed, 539 insertions(+), 24 deletions(-)
> 
> -- 
> 1.7.10.2
---end quoted text---
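For readers following the v4-v5 changes in the quoted cover letter, the array-plus-enum layout and the difference between "get" and "get and reset" can be sketched in user-space C as below. This is only a minimal illustration, not the code from the patch: the names `dev_stat_index`, `dev_stats`, `dev_stats_init`, `dev_stat_inc` and `dev_stats_get`, as well as the use of C11 atomics, are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical index names. The last enumerator doubles as the array size,
 * so adding a statistic only means adding an enumerator before it. */
enum dev_stat_index {
    DEV_STAT_WRITE_ERRS,
    DEV_STAT_READ_ERRS,
    DEV_STAT_FLUSH_ERRS,
    DEV_STAT_CORRUPTION_ERRS,
    DEV_STAT_MAX
};

/* Hypothetical per-device counter block: one 64-bit slot per statistic. */
struct dev_stats {
    _Atomic uint64_t values[DEV_STAT_MAX];
};

/* Set every counter to zero. */
void dev_stats_init(struct dev_stats *s)
{
    for (int i = 0; i < DEV_STAT_MAX; i++)
        atomic_init(&s->values[i], 0);
}

/* Count one error of the given kind. */
void dev_stat_inc(struct dev_stats *s, enum dev_stat_index idx)
{
    assert(idx < DEV_STAT_MAX);
    atomic_fetch_add(&s->values[idx], 1);
}

/* Copy all counters to out[]. If reset is non-zero, each counter is read
 * and zeroed in a single atomic exchange, so an increment racing with the
 * call is either reported now or kept for the next call, never lost. */
void dev_stats_get(struct dev_stats *s, uint64_t out[DEV_STAT_MAX], int reset)
{
    for (int i = 0; i < DEV_STAT_MAX; i++)
        out[i] = reset ? atomic_exchange(&s->values[i], 0)
                       : atomic_load(&s->values[i]);
}
```

Note that in this sketch the reset is atomic per counter, not across the whole array; whether the real ioctl gives a stronger guarantee is not stated in the thread.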
* Re: [PATCH v5 0/3] Btrfs: add IO error device stats
  2012-05-25 15:18 ` [PATCH v5 0/3] Btrfs: add IO error device stats Christoph Hellwig
@ 2012-05-25 17:49   ` Stefan Behrens
  2012-05-25 20:41     ` Arne Jansen
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Behrens @ 2012-05-25 17:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, linux-fsdevel, linux-kernel
It would be helpful if the generic block layer already offered 
device error counters. Then btrfs could read them, add its own counters 
for the errors it detects via checksums, and store everything 
persistently in the filesystem.
The goal is to replace disks that show an increased error rate with 
spare disks, and to repair the resulting degraded RAID state quickly.
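The persistence mentioned here follows the scheme from the cover letter: counters are initialized from stored values at mount time, and only modified statistics are written back at transaction commit. A rough user-space sketch of that dirty tracking is shown below. The struct `stats_item` merely stands in for the item in the device tree, and every name in the block is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_STATS 4 /* assumed number of counters per device */

/* Stand-in for the stored item holding one device's statistics. */
struct stats_item {
    uint64_t values[NUM_STATS];
};

/* In-memory state for one device. */
struct device {
    uint64_t stats[NUM_STATS];
    int dirty; /* set when a counter changed since the last commit */
};

/* Mount: initialize the in-memory counters from the stored item. */
void device_load_stats(struct device *dev, const struct stats_item *item)
{
    memcpy(dev->stats, item->values, sizeof(dev->stats));
    dev->dirty = 0;
}

/* Error path: bump a counter and remember that it must be written out. */
void device_stat_inc(struct device *dev, int idx)
{
    assert(idx >= 0 && idx < NUM_STATS);
    dev->stats[idx]++;
    dev->dirty = 1;
}

/* Commit: write the item only if something changed.
 * Returns 1 if the item was written, 0 if the write was skipped. */
int device_commit_stats(struct device *dev, struct stats_item *item)
{
    if (!dev->dirty)
        return 0;
    memcpy(item->values, dev->stats, sizeof(item->values));
    dev->dirty = 0;
    return 1;
}
```

Tracking a single dirty flag per device keeps the commit path cheap when no errors occurred, which is the common case on healthy drives.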
On 05/25/2012 17:18, Christoph Hellwig wrote:
> Can you explain why the device error counters should be in a filesystem
> instead of generic block layer code?
* Re: [PATCH v5 0/3] Btrfs: add IO error device stats
  2012-05-25 17:49   ` Stefan Behrens
@ 2012-05-25 20:41     ` Arne Jansen
  0 siblings, 0 replies; 3+ messages in thread
From: Arne Jansen @ 2012-05-25 20:41 UTC (permalink / raw)
  To: Stefan Behrens
  Cc: Christoph Hellwig, linux-btrfs, linux-fsdevel, linux-kernel
On 05/25/12 19:49, Stefan Behrens wrote:
> It would be helpful if the generic block layer already offered
> device error counters. Then btrfs could read them, add its own counters
> for the errors it detects via checksums, and store everything
> persistently in the filesystem.
>
I take it that you count not only I/O errors, but also corrupted blocks
and errors caused by misdirected writes. This is information that is
not available to the block layer.
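To illustrate the point: a read can complete without any I/O error and still deliver wrong data, which only a layer holding a stored checksum can notice. The toy sketch below makes that distinction explicit. The Adler-32 checksum is used purely as a stand-in, the "device" is just a buffer, and all names (`err_counters`, `toy_csum`, `account_read`) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters: one fed by the I/O completion status, one fed by
 * the filesystem's own verification of the data. */
struct err_counters {
    uint64_t read_io_errs;
    uint64_t corruption_errs;
};

/* Adler-32, used here only as a simple stand-in for a real checksum. */
uint32_t toy_csum(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Account for one completed read. io_status is what the lower layer
 * reports (0 = success); expected_csum is the checksum the filesystem
 * stored when the block was written.
 * Returns 0 if the data is usable, -1 otherwise. */
int account_read(struct err_counters *c, int io_status,
                 const uint8_t *buf, size_t len, uint32_t expected_csum)
{
    if (io_status != 0) {
        c->read_io_errs++;    /* also visible to the block layer */
        return -1;
    }
    if (toy_csum(buf, len) != expected_csum) {
        c->corruption_errs++; /* only detectable with the stored checksum */
        return -1;
    }
    return 0;
}
```

In the second branch the lower layer has already reported success, so a counter kept purely in generic block code would never see this kind of error.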
> The goal is to replace disks that show an increased error rate with
> spare disks, and to repair the resulting degraded RAID state quickly.