From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <xfs-bounces@oss.sgi.com>
Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111])
	by oss.sgi.com (Postfix) with ESMTP id A83737CA1
	for <xfs@oss.sgi.com>; Wed,  3 Aug 2016 12:11:35 -0500 (CDT)
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11])
	by relay1.corp.sgi.com (Postfix) with ESMTP id 5DCDA8F8035
	for <xfs@oss.sgi.com>; Wed,  3 Aug 2016 10:11:29 -0700 (PDT)
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by
	cuda.sgi.com with ESMTP id FiQRA4Gr73naVDtv for
	<xfs@oss.sgi.com>; Wed, 03 Aug 2016 10:11:28 -0700 (PDT)
Date: Wed, 3 Aug 2016 11:11:27 -0600
From: Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [4.8 hang] xfstests generic/361 hangs on dax enabled filesystems
Message-ID: <20160803171127.GA15876@linux.intel.com>
References: <20160803003354.GP16044@dastard>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20160803003354.GP16044@dastard>
List-Id: XFS Filesystem from SGI <xfs.oss.sgi.com>
List-Unsubscribe: <http://oss.sgi.com/mailman/options/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=unsubscribe>
List-Archive: <http://oss.sgi.com/pipermail/xfs>
List-Post: <mailto:xfs@oss.sgi.com>
List-Help: <mailto:xfs-request@oss.sgi.com?subject=help>
List-Subscribe: <http://oss.sgi.com/mailman/listinfo/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org, xfs@oss.sgi.com

On Wed, Aug 03, 2016 at 10:33:54AM +1000, Dave Chinner wrote:
> Hi folks,
> 
> Just hit a reproducible hang in generic/361. Essentially this on
> an 8GB pmem device:
> 
> mkfs.xfs -f /dev/pmem1
> mount -o dax /dev/pmem1 /mnt/scratch
> xfs_io -f -c "truncate 1g" test.img
> losetup -f --show /mnt/scratch/test.img
> mkfs.xfs -f /dev/loop0
> 
> And the mkfs.xfs command hangs with a discard that never completes:
> 
> [  243.413918] INFO: task mkfs.xfs:5708 blocked for more than 120 seconds.
> [  243.415678]       Not tainted 4.7.0-dgc+ #862
> [  243.416772] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.418769] mkfs.xfs        D ffff880835143c18 13848  5708   5441 0x00000000
> [  243.420620]  ffff880835143c18 ffff880835143c20 ffff88083a244780 ffff8808358ba3c0
> [  243.422636]  ffff88023aa20000 ffff880835144000 7fffffffffffffff 7fffffffffffffff
> [  243.424586]  ffff8808358ba3c0 00000000024000c0 ffff880835143c30 ffffffff81e5e38c
> [  243.426466] Call Trace:
> [  243.427050]  [<ffffffff81e5e38c>] schedule+0x3c/0x90
> [  243.428224]  [<ffffffff81e62be5>] schedule_timeout+0x265/0x330
> [  243.429563]  [<ffffffff8109f125>] ? kvm_clock_read+0x25/0x40
> [  243.430896]  [<ffffffff8109f149>] ? kvm_clock_get_cycles+0x9/0x10
> [  243.432360]  [<ffffffff81125edc>] ? ktime_get+0x3c/0xb0
> [  243.433556]  [<ffffffff81e5db54>] io_schedule_timeout+0xa4/0x110
> [  243.434932]  [<ffffffff81e5eed6>] wait_for_completion_io+0xd6/0x110
> [  243.436297]  [<ffffffff810decd0>] ? wake_up_q+0x70/0x70
> [  243.437436]  [<ffffffff817d6f06>] submit_bio_wait+0x56/0x70
> [  243.438671]  [<ffffffff817e851a>] blkdev_issue_discard+0x6a/0xb0
> [  243.439980]  [<ffffffff810dab69>] ? __might_sleep+0x49/0x80
> [  243.441182]  [<ffffffff817eea87>] blk_ioctl_discard+0x97/0xb0
> [  243.442370]  [<ffffffff817ef7bb>] blkdev_ioctl+0x7eb/0x9a0
> [  243.443485]  [<ffffffff81236a1d>] block_ioctl+0x3d/0x50
> [  243.444552]  [<ffffffff812100df>] do_vfs_ioctl+0x8f/0x670
> [  243.445630]  [<ffffffff81002434>] ? exit_to_usermode_loop+0x94/0xb0
> [  243.446902]  [<ffffffff81210739>] SyS_ioctl+0x79/0x90
> [  243.447927]  [<ffffffff81002bc5>] ? syscall_return_slowpath+0xf5/0x190
> [  243.449236]  [<ffffffff81e63d32>] entry_SYSCALL_64_fastpath+0x1a/0xa4
> 
> This only reproduces when the underlying filesystem is mounted with
> -o dax, so there is a bad interaction with loop devices and DAX
> occurring somewhere. generic/361 is a recent test (committed June 14)
> so this probably hasn't actually been tested until now.
> 
> I haven't got time to look at this right now, hence the report.

Cool, thanks for the report.  I've reproduced this with linux/master, and the
test passes with v4.7.

Running a bisect...
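
For anyone following along, the bisect amounts to roughly the following. This
is a minimal sketch, not the exact commands used: the endpoints v4.7 (good) and
master (bad) come from the results above, while the GIT override and the
bisect_begin/bisect_mark helper names are hypothetical, added only so the steps
can be dry-run with GIT=echo.

```shell
#!/bin/sh
# Sketch only: bisect between the two endpoints named in this thread.
# GOOD and BAD come from the thread; the helper names are hypothetical.
GOOD=${GOOD:-v4.7}      # generic/361 passes here
BAD=${BAD:-master}      # hang reproduces here (assumes master tracks mainline)
GIT=${GIT:-git}         # set GIT=echo to print the commands instead of running them

# Start the bisect and record both endpoints.
bisect_begin() {
    $GIT bisect start &&
    $GIT bisect bad "$BAD" &&
    $GIT bisect good "$GOOD"
}

# After building and booting each candidate kernel, run the reproducer
# (mkfs.xfs on the loop device backed by the DAX mount) and record the result:
#   mkfs.xfs hangs     -> bisect_mark bad
#   mkfs.xfs completes -> bisect_mark good
#   kernel won't build -> bisect_mark skip
bisect_mark() {
    case "$1" in
        good|bad|skip) $GIT bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad|skip" >&2; return 2 ;;
    esac
}
```

Each step needs a reboot into the candidate kernel, so the verdict is recorded
by hand rather than with "git bisect run".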
