From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mailout-de.gmx.net ([213.165.64.23]:54316 "HELO
    mailout-de.gmx.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with SMTP id S1753378Ab2EYUlD (ORCPT );
    Fri, 25 May 2012 16:41:03 -0400
Message-ID: <4FBFEE7E.8060008@gmx.net>
Date: Fri, 25 May 2012 22:41:34 +0200
From: Arne Jansen
MIME-Version: 1.0
To: Stefan Behrens
CC: Christoph Hellwig, linux-btrfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 0/3] Btrfs: add IO error device stats
References: <1337954770-10086-1-git-send-email-sbehrens@giantdisaster.de>
    <20120525151854.GA23362@infradead.org>
    <4FBFC63B.6050403@giantdisaster.de>
In-Reply-To: <4FBFC63B.6050403@giantdisaster.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 05/25/12 19:49, Stefan Behrens wrote:
> It would be helpful if already the generic block layer would offer
> device error counters. Then btrfs could read them, add own counters for
> its checksum detected errors, and store everything persistently in the
> filesystem.
>

I take it that you not only count I/O errors, but also corrupted blocks
and errors generated by misdirected writes. This is information that is
not available to the block layer.

> The goal is to replace disks that have an increased error rate with
> spare disks, and the goal is to repair this degenerated RAID state quickly.
>
>
> On 05/25/2012 17:18, Christoph Hellwig wrote:
>> Can you explain why the device error counters should be in a filesystem
>> instead of generic block layer code?
>>
>> On Fri, May 25, 2012 at 04:06:07PM +0200, Stefan Behrens wrote:
> [...]
>>> The goal is to detect when drives start to get an increased error rate,
>>> when drives should be replaced soon. Therefore statistic counters are
>>> added that count IO errors (read, write and flush). Additionally, the
>>> software detected errors like checksum errors and corrupted blocks are
>>> counted.
>>>
>>> An ioctl interface is added to get the device statistic counters.
>>> A second ioctl is added to atomically get and reset these counters.
>>>
>>> The device statistics are written into the device tree with each
>>> transaction commit. Only modified statistics are written.
>>> When a filesystem is mounted, the device statistics for each involved
>>> device are read from the device tree and used to initialize the
>>> counters.
>>>
>>> A patch for the btrfs-progs world will also be sent.
>>>
>>> Stefan Behrens (3):
>>>   Btrfs: add device counters for detected IO and checksum errors
>>>   Btrfs: add ioctl to get and reset the device stats
>>>   Btrfs: read device stats on mount, write modified ones during commit
>>>
>>>  fs/btrfs/ctree.h       |   38 ++++++
>>>  fs/btrfs/disk-io.c     |   20 +++-
>>>  fs/btrfs/extent_io.c   |   18 ++-
>>>  fs/btrfs/ioctl.c       |   26 +++++
>>>  fs/btrfs/ioctl.h       |   33 ++++++
>>>  fs/btrfs/print-tree.c  |    3 +
>>>  fs/btrfs/scrub.c       |   65 ++++++++---
>>>  fs/btrfs/transaction.c |    4 +
>>>  fs/btrfs/volumes.c     |  304 +++++++++++++++++++++++++++++++++++++++++++++++-
>>>  fs/btrfs/volumes.h     |   52 +++++++++
>>>  10 files changed, 539 insertions(+), 24 deletions(-)
>>>
>>> --
>>> 1.7.10.2
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html