Date: Fri, 20 Feb 2015 08:18:52 +1100
From: Dave Chinner
To: Eric Sandeen, xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
Message-ID: <20150219211852.GT12722@dastard>
References: <1424298740-25821-1-git-send-email-david@fromorbit.com>
 <54E51CC7.8040709@sandeen.net>
 <20150218235220.GQ4251@dastard>
 <20150219190419.GA8862@hades.maiolino.org>
In-Reply-To: <20150219190419.GA8862@hades.maiolino.org>
List-Id: XFS Filesystem from SGI

On Thu, Feb 19, 2015 at 05:04:19PM -0200, Carlos Maiolino wrote:
> > >
> > Well, the switch is simple characterisation. What we do with that
> > error type can be much more complex, and that's why I haven't tried
> > to address those issues here. When we've sorted out what we need
> > and how we are going to configure the error handling, then we can
> > add it.
> >
> > e.g. if we need configurable error handling, it needs to be
> > configurable for different error types, and it needs to be
> > configurable on a per-mount basis. And it needs to be configurable
> > at runtime, not just at mount time. That kind of leads to using
> > sysfs for this. e.g.
> > for each error type we need to handle different behaviour for:
> >
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/type
> > [transient] permanent
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_timeout_seconds
> > 300
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts
> > 50
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/transient_fail_at_umount
> > 1
> >
> > And then have generic infrastructure to set it up and handle the
> > buffer errors according to the config?
> >
> > > (I think that's accurately summing up irc-and-side-channel discussions) ;)
> >
> > Pretty much.
> >
>
> Talking about possible configurable error handlers, what about leaving this
> choice of failure to the sysadmin? Instead of a time- or type-based
> configuration, what about something where the administrator could just say
> "next IO to device X should fail permanently"?

How is this different to just shutting down the filesystem
immediately via 'xfs_io -x -c shutdown /path/to/mnt/pt' ?

Regardless of this, you can leave failures as transient and then,
when an error condition occurs (say thinp device ENOSPC), run:

# echo permanent > /sys/fs/xfs/vda/meta_write_errors/enospc/type
# echo 0 > /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts

That will make the next device ENOSPC IO error that is retried shut
the filesystem down.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
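
For illustration only, the per-error knobs proposed in the quoted text
could feed a decision along the following lines. This is a rough sketch
in Python, not kernel code and not anything from the patch. The class
name, the function name, and the exact semantics (a "transient" error
only becomes fatal at unmount; a "permanent" error becomes fatal once
either the retry count or the timeout is reached) are assumptions made
for the example. Only the four field names come from the sysfs files
shown in the thread.

```python
from dataclasses import dataclass


@dataclass
class ErrorPolicy:
    """Hypothetical in-memory mirror of the proposed per-error sysfs knobs."""
    type: str = "transient"             # "transient" or "permanent"
    perm_timeout_seconds: int = 300
    perm_max_retry_attempts: int = 50
    transient_fail_at_umount: bool = True


def should_shut_down(policy, retries_so_far, seconds_since_first_error,
                     unmounting=False):
    """Return True if a failed metadata write should shut the filesystem down.

    Assumed semantics (not defined in the thread):
      - transient errors are retried forever, unless we are unmounting
        and transient_fail_at_umount is set;
      - permanent errors give up once either limit has been reached.
    """
    if policy.type == "transient":
        return bool(unmounting and policy.transient_fail_at_umount)
    return (retries_so_far >= policy.perm_max_retry_attempts
            or seconds_since_first_error >= policy.perm_timeout_seconds)


# The two echo commands above correspond to this policy: the very first
# ENOSPC write error (zero retries so far) is already fatal.
fail_next = ErrorPolicy(type="permanent", perm_max_retry_attempts=0)
print(should_shut_down(fail_next, retries_so_far=0,
                       seconds_since_first_error=0))
```

Under these assumed semantics, the "fail the next IO" request needs no
dedicated knob: it is just the permanent type with a retry limit of zero.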