From: Dave Chinner <david@fromorbit.com>
To: Ming Lei
Cc: Michal Hocko, linux-mm, Linux Kernel Mailing List, xfs@oss.sgi.com
Subject: Re: [regression 4.2-rc3] loop: xfstests xfs/073 deadlocked in low memory conditions
Date: Tue, 21 Jul 2015 15:46:12 +1000
Message-ID: <20150721054612.GZ7943@dastard>
References: <20150721015934.GY7943@dastard>

On Tue, Jul 21, 2015 at 12:05:56AM -0400, Ming Lei wrote:
> On Mon, Jul 20, 2015 at 9:59 PM, Dave Chinner wrote:
> > Hi Ming,
> >
> > With the recent merge of the loop device changes, I'm now seeing
> > XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
> >
> > The deadlock is as follows:
> >
> > kloopd1: loop_queue_read_work
> >         xfs_file_iter_read
> >         lock XFS inode XFS_IOLOCK_SHARED (on image file)
> >         page cache read (GFP_KERNEL)
> >         radix tree alloc
> >         memory reclaim
> >         reclaim XFS inodes
> >         log force to unpin inodes
> >         <waits for log IO completion>
> >
> > xfs-cil/loop1:
> >         xlog_cil_push
> >         xlog_write
> >         <submits log buffer IO to the loop device>
> >         xlog_state_get_iclog_space()
> >         <blocks waiting for that log IO to complete>
> >
> > kloopd1: loop_queue_write_work
> >         xfs_file_write_iter
> >         lock XFS inode XFS_IOLOCK_EXCL (on image file)
> >         <blocks on the XFS_IOLOCK_SHARED held by the read work above>
> >
> > [The full stack traces are below].
> >
> > i.e. the kloopd, with its split read and write work queues, has
> > introduced a dependency through memory reclaim: writes need to be
> > able to progress for reads to make progress.
>
> This kind of change just makes READ vs READ OR WRITE submitted
> to fs concurrently, and the use case should have been simulated from
> user space on one regular XFS file too?

Assuming the "regular XFS file" is on a normal block device (i.e.
not a loop device) then this will not deadlock, as there is no
dependency on VFS-level locking for log writes. i.e. the normal
userspace IO path is:

userspace read
  vfs_read
  xfs_read
  page cache alloc (GFP_KERNEL)
  direct reclaim
  xfs_inode reclaim
  log force
  CIL push
  xlog_write
  submit_bio
  -> hardware

And then the log IO completes, and everything continues onward.

What the loop device used to do:

userspace read
  vfs_read
  xfs_read
  page cache alloc (GFP_KERNEL)
  submit_bio
  loop device
  splice_read (on image file)
  xfs_splice_read
  page cache alloc (GFP_NOFS)
  direct reclaim
  submit_bio
  -> hardware

And when the read IO completes, everything moves onwards.
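[ Aside, since it matters for what follows: the GFP_NOFS in that old
path isn't XFS's doing - the loop driver masks the backing file's
mapping when the device is bound. Quoting the loop_set_fd() logic in
drivers/block/loop.c roughly from memory, so treat this as a sketch
of the mechanism rather than the exact source:

	/*
	 * Strip __GFP_IO/__GFP_FS from the image file's mapping so
	 * page cache allocations done through that mapping cannot
	 * recurse into filesystem or block layer reclaim. The old
	 * mask is saved and restored when the device is unbound.
	 */
	lo->old_gfp_mask = mapping_gfp_mask(mapping);
	mapping_set_gfp_mask(mapping,
			lo->old_gfp_mask & ~(__GFP_IO | __GFP_FS));

This only protects allocations that actually consult
mapping_gfp_mask(), though - anything that hardcodes GFP_KERNEL
walks straight past it, which is exactly the problem below. ]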
What the loop device now does:

userspace read
  vfs_read
  xfs_read
  page cache alloc (GFP_KERNEL)
  submit_bio
  loop device
  vfs_read (on image file)
  xfs_read
  page cache alloc (GFP_KERNEL)
  direct reclaim
  xfs_inode reclaim
  log force
  CIL push
  xlog_write
  submit_bio
  loop device
  vfs_write (on image file)
  xfs_write

> > The problem, fundamentally, is that mpage_readpages() does a
> > GFP_KERNEL allocation, rather than paying attention to the inode's
> > mapping gfp mask, which is set to GFP_NOFS.
>
> That looks the root cause, and I guess the issue is just triggered
> after commit aa4d86163e4 ("block: loop: switch to VFS ITER_BVEC")
> which changes splice to bvec iterator.

Yup - you are the unfortunate person who has wandered into the
minefield I'd been telling people about for quite some time. :(

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
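PS: for illustration only, an untested sketch of the direction a fix
could take - have mpage_readpages() constrain its page cache
insertion by the mapping's gfp mask instead of hardcoding
GFP_KERNEL. The function body below is based on the 4.2-rc3
fs/mpage.c structure; treat it as a sketch, not a tested patch:

	int
	mpage_readpages(struct address_space *mapping, struct list_head *pages,
			unsigned nr_pages, get_block_t get_block)
	{
		/*
		 * Start from GFP_KERNEL but drop anything the mapping's
		 * gfp mask forbids (e.g. __GFP_FS/__GFP_IO cleared by the
		 * loop driver), so the page cache/radix tree allocation
		 * below cannot recurse into filesystem reclaim.
		 */
		gfp_t gfp = GFP_KERNEL & mapping_gfp_mask(mapping);
		struct bio *bio = NULL;
		unsigned page_idx;
		sector_t last_block_in_bio = 0;
		struct buffer_head map_bh;
		unsigned long first_logical_block = 0;

		map_bh.b_state = 0;
		map_bh.b_size = 0;
		for (page_idx = 0; page_idx < nr_pages; page_idx++) {
			struct page *page = list_entry(pages->prev,
							struct page, lru);

			prefetchw(&page->flags);
			list_del(&page->lru);
			if (!add_to_page_cache_lru(page, mapping,
						page->index, gfp)) {
				bio = do_mpage_readpage(bio, page,
						nr_pages - page_idx,
						&last_block_in_bio, &map_bh,
						&first_logical_block,
						get_block);
			}
			page_cache_release(page);
		}
		BUG_ON(!list_empty(pages));
		if (bio)
			mpage_bio_submit(READ, bio);
		return 0;
	}

With loop having already cleared __GFP_FS on the image file's
mapping, the radix tree insertion then can't re-enter the filesystem
via reclaim, which would break the read-side leg of the cycle above.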