From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25])
 by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p8LBtWUE139501
 for ; Wed, 21 Sep 2011 06:55:33 -0500
Received: from server655-han.de-nserver.de (localhost [127.0.0.1])
 by cuda.sgi.com (Spam Firewall) with ESMTP id 0540317B7DD
 for ; Wed, 21 Sep 2011 04:55:31 -0700 (PDT)
Received: from server655-han.de-nserver.de (server655-han.de-nserver.de [85.158.177.45])
 by cuda.sgi.com with ESMTP id cxfe6rBPcEfOpwsS
 for ; Wed, 21 Sep 2011 04:55:31 -0700 (PDT)
Message-ID: <4E79D0B2.2010305@profihost.ag>
Date: Wed, 21 Sep 2011 13:55:30 +0200
From: Stefan Priebe - Profihost AG
MIME-Version: 1.0
Subject: Re: [xfs-masters] xfs deadlock in stable kernel 3.0.4
References: <4E75B660.1030502@profihost.ag> <20110918230245.GF15688@dastard>
 <4E78665E.8030409@profihost.ag> <20110920160226.GA25542@infradead.org>
 <4E78CBF4.1030505@profihost.ag> <20110920172455.GA30757@infradead.org>
 <4E78CEFD.9030603@profihost.ag> <20110920223047.GA13758@infradead.org>
 <20110921021133.GM15688@dastard> <4E7994D3.5020103@profihost.ag>
 <20110921114237.GP15688@dastard>
In-Reply-To: <20110921114237.GP15688@dastard>
List-Id: XFS Filesystem from SGI
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Sender: xfs-bounces@oss.sgi.com
Errors-To: xfs-bounces@oss.sgi.com
To: Dave Chinner
Cc: Christoph Hellwig, "xfs-masters@oss.sgi.com", "xfs@oss.sgi.com"

On 21.09.2011 13:42, Dave Chinner wrote:
> Ok, I got a hang in the random delete phase. Not sure what is wrong
> yet, but inode reclaim is trying to reclaim inodes but failing, and
> the AIL is trying to push items but failing. Hence the tail of the
> log is not being moved forward and new transactions are being
> blocked until log space becomes available.

OK, that matches my findings. It was also mostly in the random delete phase.
But I've also seen it on creates.

> Given this, just triggering a log force should get everything
> moving again. Running "echo 2 > /proc/sys/vm/drop_caches" gets inode
> reclaim running in sync mode, which causes pinned inodes to trigger
> a log force. And once I've done this, everything starts running
> again.

Oh man, I was thinking about trying this, but then I forgot the idea ;-(

> So, the log force not triggering in the AIL code looks to be the
> problem. That, I simply cannot explain right now - it makes no sense
> but that is what all the stats and trace events point to. I need to
> do more investigation.

Thanks, Dave. Great that you were able to reproduce it. It helps to build
bonnie++ yourself and remove the stat tests. I've done this too, so
bonnie++ runs a lot faster.

Stefan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs