From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p8L0uI4r098211 for ; Tue, 20 Sep 2011 19:56:18 -0500 Received: from ipmail05.adl6.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id D97AD1607DB2 for ; Tue, 20 Sep 2011 18:01:36 -0700 (PDT) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [150.101.137.143]) by cuda.sgi.com with ESMTP id xjJ0EN3NeFbvY8Qk for ; Tue, 20 Sep 2011 18:01:36 -0700 (PDT) Date: Wed, 21 Sep 2011 10:56:12 +1000 From: Dave Chinner Subject: Re: [PATCH v2] xfs: Check the return value of xfs_trans_get_buf() Message-ID: <20110921005612.GL15688@dastard> References: <1316527015.9298.60.camel@chandra-lucid.austin.ibm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1316527015.9298.60.camel@chandra-lucid.austin.ibm.com> List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: xfs-bounces@oss.sgi.com Errors-To: xfs-bounces@oss.sgi.com To: Chandra Seetharaman Cc: Alex Elder , XFS Mailing List On Tue, Sep 20, 2011 at 08:56:55AM -0500, Chandra Seetharaman wrote: > Ran the xfstests (auto) overnight and didn't see any new issues. Sure, but xfstests won't be triggering the new failure paths. It looks to me like any failure to get a buffer will now result in a cancelled transaction and a filesystem shutdown - the new failure paths really need to be tested to ensure that failures are handled gracefully and don't result in filesystem corruption. As it is, I'm not sure we want to do this. The only reason we can fail to get a buffer is allocation failures in extremely low memory conditions. However, the last thing we want is for filesystem shutdowns to be triggered by transient low memory conditions. The current state of the code is that the xfs_buf_get() code tries really, really hard to allocate memory, and we don't have any evidence to point to the fact that is it failing to allocate memory. We'd be seeing asserts firing and/or NULL pointer deref panics if xfs_buf_get() was failing, and neither of these are happening. As it is, before we can gracefully handle memory allocation failures in the xfs_buf layer, we need to be able to roll back dirty transactions so that memory allocation failure does not result filesystem shutdowns. That's actually possible to do now with the delayed logging infrastructure (because the CIL keeps a copy of the previous in memory modifications prior to the bad transaction), so we should look towards implementing transaction rollback first before allowing memory allocation to fail inside transaction contexts.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs