From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Tue, 22 Aug 2006 21:50:08 -0700 (PDT)
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id k7N4nODW001957
	for ; Tue, 22 Aug 2006 21:49:38 -0700
Date: Wed, 23 Aug 2006 14:48:30 +1000
From: David Chinner
Subject: Re: Infinite loop in xfssyncd on full file system
Message-ID: <20060823044829.GD807872@melbourne.sgi.com>
References: <20060823040218.GC807872@melbourne.sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20060823040218.GC807872@melbourne.sgi.com>
Sender: xfs-bounce@oss.sgi.com
Errors-To: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Stephane Doyon
Cc: linux-xfs@oss.sgi.com, lnx1138@us.ibm.com

On Wed, Aug 23, 2006 at 02:02:18PM +1000, David Chinner wrote:
> On Tue, Aug 22, 2006 at 04:01:10PM -0400, Stephane Doyon wrote:
> > I'm seeing what appears to be an infinite loop in xfssyncd. It is
> > triggered when writing to a file system that is full or nearly full. I
> > have pinpointed the change that introduced this problem: it's
> >
> > "TAKE 947395 - Fixing potential deadlock in space allocation and
> > freeing due to ENOSPC"
> >
> > git commit d210a28cd851082cec9b282443f8cc0e6fc09830.
>
> Thanks for tracking that down - I've been trying to isolate a test case
> for another report of this looping in xfssyncd.
>
> [Luciano - this is the same problem we've been trying to track down.]
>
> > I hope you XFS experts see what might be wrong with that bug fix. It's
> > ironic but for me, this (apparent) infinite loop seems much easier to hit
> > than the out-of-order locking problem that the commit in question was
> > supposed to fix. Let me know if I can get you any more info.
>
> Now we know what patch introduces the problem, we know where to look.
> Stay tuned...

I've had a quick look at the above commit.
I'm not yet certain that everything is correct in terms of the semantics
laid down in the change or that enough blocks are reserved for btree
splits, but I can see a hole in the implementation on multiprocessor
machines.

Stephane/Luciano - can you test the following patch (note: compile
tested only) and see if it fixes the problem?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/xfs_mount.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c	2006-08-18 15:29:28.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c	2006-08-23 14:28:18.059385018 +1000
@@ -2108,11 +2108,11 @@ again:
 	case XFS_SBS_FDBLOCKS:
 		BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0);
 
-		lcounter = icsbp->icsb_fdblocks;
+		lcounter = icsbp->icsb_fdblocks - SET_ASIDE_BLOCKS;
 		lcounter += delta;
 		if (unlikely(lcounter < 0))
 			goto slow_path;
-		icsbp->icsb_fdblocks = lcounter;
+		icsbp->icsb_fdblocks = lcounter + SET_ASIDE_BLOCKS;
 		break;
 	default:
 		BUG();
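
For anyone who wants to see the intended effect of the two changed lines
without building a kernel, here is a minimal userspace sketch of the
per-cpu fast path before and after the patch. It is an illustration
only, not the kernel code: struct pcp_counter and the function names are
made up for this example, the value given to SET_ASIDE_BLOCKS is an
arbitrary assumption, and all locking and the slow path are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration only; the real definition lives in XFS. */
#define SET_ASIDE_BLOCKS 8

/* Hypothetical stand-in for one CPU's share of the free block count. */
struct pcp_counter {
	int64_t fdblocks;
};

/* Unpatched fast path: succeeds while the local count stays >= 0. */
static int modify_unpatched(struct pcp_counter *c, int64_t delta)
{
	int64_t lcounter = c->fdblocks;

	lcounter += delta;
	if (lcounter < 0)
		return -1;	/* the kernel would go to the slow path here */
	c->fdblocks = lcounter;
	return 0;
}

/* Patched fast path: the set-aside blocks are treated as unavailable. */
static int modify_patched(struct pcp_counter *c, int64_t delta)
{
	int64_t lcounter = c->fdblocks - SET_ASIDE_BLOCKS;

	lcounter += delta;
	if (lcounter < 0)
		return -1;	/* the kernel would go to the slow path here */
	c->fdblocks = lcounter + SET_ASIDE_BLOCKS;
	return 0;
}

/* Walk both variants through the same allocation of 5 blocks. */
static void example(void)
{
	struct pcp_counter a = { .fdblocks = 10 };
	struct pcp_counter b = { .fdblocks = 10 };

	/* Unpatched: the allocation eats into the set-aside blocks. */
	assert(modify_unpatched(&a, -5) == 0);
	assert(a.fdblocks == 5);

	/* Patched: only 2 blocks are usable, so the fast path refuses. */
	assert(modify_patched(&b, -5) == -1);
	assert(b.fdblocks == 10);

	/* Patched: taking exactly the usable 2 blocks still succeeds. */
	assert(modify_patched(&b, -2) == 0);
	assert(b.fdblocks == SET_ASIDE_BLOCKS);
}
```

With the assumed numbers, the unpatched variant lets the local counter
drop below the set-aside threshold, while the patched variant leaves it
untouched and reports that the slow path is needed.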