Message-ID: <4FCF9655.3070300@sgi.com>
Date: Wed, 06 Jun 2012 12:41:41 -0500
From: Mark Tinguely
Subject: Re: Still seeing hangs in xlog_grant_log_space
References: <20120605235447.GF22848@dastard> <4FCF5DB9.2000808@redhat.com>
In-Reply-To: <4FCF5DB9.2000808@redhat.com>
List-Id: XFS Filesystem from SGI
To: Brian Foster
Cc: xfs@oss.sgi.com

On 06/06/12 08:40, Brian Foster wrote:
>
> Hi guys,
>
> I've been reproducing a similar stall in my testing of the 're-enable
> xfsaild idle mode' patch/thread that only occurs for me in the xfs tree.
> I was able to do a bisect from rc2 down to commit 43ff2122, though the
> history of this issue makes me wonder if this commit just makes the
> problem more reproducible as opposed to introducing it. Anyways, the
> characteristics I observe so far:
>
> - Task blocked for more than 120s message in xlog_grant_head_wait(). I
> see xfs_sync_worker() in my current bt, but I'm pretty sure I've seen
> the same issue without it involved.
> - The AIL is not empty/idle. It spins with a relatively small and
> constant number of entries (I've seen ~8-40). These items are all always
> marked as "flushing."
> - Via crash, all the inodes in the ail appear to be marked as stale
> (i.e. li_cb == xfs_istale_done). The inode flags are
> XFS_ISTALE|XFS_IRECLAIMABLE|XFS_IFLOCK.
> - The iflock in particular is why the ail marks these items 'flushing'
> and why nothing seems to proceed any further (xfsaild just waits for
> these to complete).
> I can kick the fs back into action with a 'sync.'
>
> It looks like we only mark in inode stale when an inode cluster is
> freed, so I repeated this test with 'ikeep' and cannot reproduce. I'm
> not sure if anybody is testing for this in recent kernels (Mark?), but
> if so I'd be curious if ikeep has any effect on your test (BTW, this is
> still the looping 273 xfstest).
>
> It seems like there could be some kind of race here with inodes being
> marked stale, but also appears that either completion (xfs_istale_done()
> or xfs_iflush_done()) should release the flush lock. I'll see if I can
> trace it further and get anything useful...
>
> Brian
>

I am looking at several instances of the log hang on Linux 3.4-rc2. The
problem was originally reported on Linux 2.6.38-8.

The perl script I use to recreate this problem is very similar to
xfstest 273. I use the script because it avoids the filesystem
mount/unmount cycles that happen between test 273 loops. You can build
a filesystem with the log size that you want to test, create the
directories, and let it run until it hangs.

I will look at the AIL entries in my current hangs. The problem is that
the filesystem can be made to hang with a completely empty AIL.

Sometimes the flusher is hung trying to write out pages. I will go and
see if this just happened to fire after a hang, or if the pages are
important.

--Mark.
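
For readers who do not have the perl script, the kind of workload described above (parallel copies of a directory followed by removal, repeated without any mount/unmount in between) might be sketched roughly as follows. This is not the script used in this thread. It is a hypothetical Python stand-in: the names `run_round` and `run_rounds`, the worker count, and the file size are all assumptions made for illustration. It also assumes the XFS filesystem has already been created and mounted at the target directory, for example with a log size chosen at mkfs time.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_round(root: Path, workers: int = 8, file_bytes: int = 1 << 20) -> int:
    """Copy one origin directory into `workers` siblings in parallel, then delete them.

    Returns the number of copies that completed. OSError (including ENOSPC)
    is tolerated, since running the filesystem out of space may be part of
    the stress pattern. All names and sizes here are illustrative assumptions.
    """
    origin = root / "origin"
    origin.mkdir(parents=True, exist_ok=True)
    (origin / "data").write_bytes(b"\0" * file_bytes)

    def copy_one(i: int) -> bool:
        try:
            shutil.copytree(origin, root / f"copy_{i}")
            return True
        except OSError:  # shutil.Error is a subclass of OSError
            return False

    # Run all copies concurrently and count the ones that succeeded.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = sum(pool.map(copy_one, range(workers)))

    # Remove every copy, including partial ones left behind by failures.
    for i in range(workers):
        shutil.rmtree(root / f"copy_{i}", ignore_errors=True)
    return done


def run_rounds(root: Path, rounds: int, **kwargs) -> int:
    """Repeat run_round on the same mounted filesystem, with no remount between rounds."""
    return sum(run_round(root, **kwargs) for _ in range(rounds))


if __name__ == "__main__":
    # Demonstration on a temporary directory. For a real test, point `root`
    # at a directory on the mounted XFS filesystem under test (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        print(run_rounds(Path(tmp), rounds=2, workers=4, file_bytes=4096))
```

To approximate "let it run until it hangs", the round count would simply be made very large. The log size itself is set outside the script when the filesystem is made (`mkfs.xfs -l size=...`), and the `ikeep` mount option mentioned in the quoted message can be toggled at mount time to compare behaviour.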