Date: Mon, 11 Apr 2016 14:25:04 +0800
From: Eryu Guan
Subject: Re: [PATCH 0/6 v2] xfs: xfs_iflush_cluster vs xfs_reclaim_inode
Message-ID: <20160411062504.GD10345@eguan.usersys.redhat.com>
References: <1460072271-23923-1-git-send-email-david@fromorbit.com>
 <20160408032841.GW10345@eguan.usersys.redhat.com>
 <20160408113709.GA30614@bfoster.bfoster>
 <20160410092235.GZ10345@eguan.usersys.redhat.com>
In-Reply-To: <20160410092235.GZ10345@eguan.usersys.redhat.com>
To: Brian Foster
Cc: xfs@oss.sgi.com

On Sun, Apr 10, 2016 at 05:22:35PM +0800, Eryu Guan wrote:
> On Fri, Apr 08, 2016 at 07:37:09AM -0400, Brian Foster wrote:
> > On Fri, Apr 08, 2016 at 11:28:41AM +0800, Eryu Guan wrote:
> > > On Fri, Apr 08, 2016 at 09:37:45AM +1000, Dave Chinner wrote:
> > > > Hi folks,
> > > >
> > > > This is the second version of this patch set, first posted and
> > > > described here:
> > > >
> > > > http://oss.sgi.com/archives/xfs/2016-04/msg00069.html
> > >
> > > Just a quick note here, I'm testing the v1 patchset right now, v4.6-rc2
> > > kernel + v1 patch, config file is based on rhel7 debug kernel config.
> > >
> > > The test is the same as the original reproducer (long term fsstress run
> > > on XFS, exported from NFS). The test on x86_64 host has been running for
> > > two days and everything looks fine. Test on ppc64 host has been running
> > > for a few hours and I noticed a lock issue and a few warnings, not sure
> > > if it's related to the patches or even to XFS yet (I need to run test on
> > > stock -rc2 kernel to be sure), but just post the logs here for reference
> > >
> > Had the original problem ever been reproduced on an upstream kernel?
>
> No, I've never seen the original problem in my upstream kernel testings.
> Perhaps that's because I didn't run tests on debug kernels. But I didn't
> see it in RHEL7 debug kernel testings either.
>
> > FWIW, my rhel kernel based test is still running well approaching ~48
> > hours. I've seen some lockdep messages (bad unlock balance), but IIRC
> > I've been seeing those from the start so I haven't been paying much
> > attention to it while digging into the core problem.
> >
> > > [ 1911.626286] ======================================================
> > > [ 1911.626291] [ INFO: possible circular locking dependency detected ]
> > > [ 1911.626297] 4.6.0-rc2.debug+ #1 Not tainted
> > > [ 1911.626301] -------------------------------------------------------
> > > [ 1911.626306] nfsd/7402 is trying to acquire lock:
> > > [ 1911.626311]  (&s->s_sync_lock){+.+.+.}, at: [] .sync_inodes_sb+0xe0/0x230
> > > [ 1911.626327]
> > > [ 1911.626327] but task is already holding lock:
> > > [ 1911.626333]  (sb_internal){.+.+.+}, at: [] .__sb_start_write+0x90/0x130
> > > [ 1911.626346]
> > > [ 1911.626346] which lock already depends on the new lock.
> > > [ 1911.626346]
> > > [ 1911.626353]
> > > [ 1911.626353] the existing dependency chain (in reverse order) is:
> > > [ 1911.626358]
> > ...
> > > [ 1911.627134]  Possible unsafe locking scenario:
> > > [ 1911.627134]
> > > [ 1911.627139]        CPU0                    CPU1
> > > [ 1911.627143]        ----                    ----
> > > [ 1911.627147]   lock(sb_internal);
> > > [ 1911.627153]                                lock(&s->s_sync_lock);
> > > [ 1911.627160]                                lock(sb_internal);
> > > [ 1911.627166]   lock(&s->s_sync_lock);
> > > [ 1911.627172]
> > > [ 1911.627172]  *** DEADLOCK ***
> > > [ 1911.627172]
> > ...
> >
> > We actually have a report of this one on the list:
> >
> > http://oss.sgi.com/archives/xfs/2016-04/msg00001.html
> >
> > ... so I don't think it's related to this series. I believe I've seen
> > this once or twice when testing something completely unrelated, as well.
> >
> > > [ 2046.852739] kworker/dying (399) used greatest stack depth: 4352 bytes left
> > > [ 2854.687381] XFS: Assertion failed: buffer_mapped(bh), file: fs/xfs/xfs_aops.c, line: 780
> > > [ 2854.687434] ------------[ cut here ]------------
> > > [ 2854.687488] WARNING: CPU: 5 PID: 28924 at fs/xfs/xfs_message.c:105 .asswarn+0x2c/0x40 [xfs]
> > ...
> > > [ 2854.687997] ---[ end trace 872ac2709186f780 ]---
> >
> > These asserts look new to me, however.
> > It would be interesting to see if
> > these reproduce independently.
>
> I've seen just the assert failures in the same fsstress testing on ppc64
> host (no lock warnings in the beginning). Will see if it's still
> reproducible on stock kernel.

I saw the assert failures on stock kernel (v4.6-rc2) too, so at least
it's not something introduced by this patchset.

Thanks,
Eryu

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs