From: Dave Chinner
To: Peter Watkins
Cc: Juerg Haefliger, bpm@sgi.com, xfs@oss.sgi.com
Date: Wed, 6 Jun 2012 09:54:47 +1000
Subject: Re: Still seeing hangs in xlog_grant_log_space
Message-ID: <20120605235447.GF22848@dastard>

On Fri, May 25, 2012 at 01:03:04PM -0400, Peter Watkins wrote:
> On Fri, May 25, 2012 at 2:28 AM, Juerg Haefliger wrote:
> >> Does your kernel have the effect of
> >>
> >> 0bf6a5bd4b55b466964ead6fa566d8f346a828ee xfs: convert the xfsaild
> >> thread to a workqueue
> >
> > No.
> >
> >
> >> c7eead1e118fb7e34ee8f5063c3c090c054c3820 xfs: revert to using a
> >> kthread for AIL pushing
> >
> > No.
> >
> >
> >> In particular, is this code in xfs_trans_ail_push:
> >>
> >>       smp_wmb();
> >>       xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);
> >>       smp_wmb();
> >
> > No.
> > xfs_trans_ail_push looks like this:
> >
> > void
> > xfs_trans_ail_push(
> >         struct xfs_ail  *ailp,
> >         xfs_lsn_t       threshold_lsn)
> > {
> >         xfs_log_item_t  *lip;
> >
> >         lip = xfs_ail_min(ailp);
> >         if (lip && !XFS_FORCED_SHUTDOWN(ailp->xa_mount)) {
> >                 if (XFS_LSN_CMP(threshold_lsn, ailp->xa_target) > 0)
> >                         xfsaild_wakeup(ailp, threshold_lsn);
> >         }
> > }
> >
> >
> > FWIW, the XFS driver in my kernel is identical to the vanilla 2.6.38
> > driver. I'm still trying to get a XFS trace from a production hang. I
> > do have a crash dump from a production machine with /tmp hanging.
> > Would it be helpful to share that dump?
> >
> > ...Juerg
>
> It looks like the combined effect of those patches, perhaps the write
> barriers, fix one log space hang.

That problem exists in 2.6.38. There have been a huge number of fixes
for these problems since 2.6.38. Further testing on 2.6.38 doesn't
help us at all, especially as that kernel is not supported, and I'd
suggest that you migrate production off it sooner rather than later.

> Reading bug #922 I see your test case reproduces in recent kernels, so
> there must be a newer problem also.

Right, that's what we need to find - it appears to be a CIL
stall/accounting leak, completely unrelated to all the other AIL/log
space stalls that have been occurring.

The last thing outstanding is more information on the stall that
Mark T @ SGI was able to reproduce. I haven't heard anything from him
since I asked for those details on May 23....

> I find the reproducer the most useful, so no need to upload the dump.

At this point, a 3.5-rc1 kernel is what we need to get working reliably.
Once we have the problems solved there, we can work out which set of
patches needs to be backported to 3.0-stable and the other supported
kernels to fix the problems in those kernels...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
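For readers following the AIL discussion: the quoted 2.6.38 xfs_trans_ail_push() only wakes xfsaild when the AIL is non-empty, the filesystem is not shut down, and the requested threshold LSN is strictly beyond the current xa_target according to XFS_LSN_CMP(). Below is a minimal userspace sketch of that decision, not kernel code. It assumes an LSN packs the log cycle in the upper 32 bits and the block number in the lower 32 bits, and that comparison looks at the cycle before the block. The names lsn_t, make_lsn, lsn_cmp and ail_push_needed are hypothetical. The smp_wmb()/xfs_trans_ail_copy_lsn() target update from the newer code is not modelled here, because memory ordering cannot be shown in a single-threaded sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an LSN (assumption for this sketch):
 * log cycle in the high 32 bits, block number in the low 32 bits. */
typedef int64_t lsn_t;
static_assert(sizeof(lsn_t) == 8, "an LSN is assumed to be 64 bits");

static uint32_t lsn_cycle(lsn_t lsn) { return (uint32_t)((uint64_t)lsn >> 32); }
static uint32_t lsn_block(lsn_t lsn) { return (uint32_t)((uint64_t)lsn & 0xffffffffu); }

/* Build an LSN from a cycle and a block number. */
static lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Cycle-then-block comparison in the spirit of XFS_LSN_CMP():
 * returns <0, 0 or >0. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/* Hypothetical stand-in for the quoted 2.6.38 check: request an AIL push
 * only if there is something in the AIL, the fs is not shut down, and the
 * threshold moves the push target forward. */
static bool ail_push_needed(bool ail_empty, bool shut_down,
			    lsn_t target, lsn_t threshold)
{
	return !ail_empty && !shut_down && lsn_cmp(threshold, target) > 0;
}
```

For example, with a target of (cycle 5, block 1000), a threshold of (5, 2000) or (6, 0) calls for a wakeup, while (4, 9000) does not, since the cycle number dominates the block number. An equal threshold also does not wake the pusher, which matches the strict "> 0" test in the quoted code.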