From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111])
	by oss.sgi.com (Postfix) with ESMTP id 85A2B7F37
	for ; Thu, 24 Oct 2013 03:48:09 -0500 (CDT)
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11])
	by relay1.corp.sgi.com (Postfix) with ESMTP id 584F58F8054
	for ; Thu, 24 Oct 2013 01:48:06 -0700 (PDT)
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9])
	by cuda.sgi.com with ESMTP id OhlBzeN8Nv7vFciX
	(version=TLSv1 cipher=AES256-SHA bits=256 verify=NO)
	for ; Thu, 24 Oct 2013 01:48:04 -0700 (PDT)
Date: Thu, 24 Oct 2013 01:48:03 -0700
From: Christoph Hellwig
Subject: Re: [PATCH] xfs: prevent stack overflows from page cache allocation
Message-ID: <20131024084803.GA28144@infradead.org>
References: <1382585110-1796-1-git-send-email-david@fromorbit.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <1382585110-1796-1-git-send-email-david@fromorbit.com>
List-Id: XFS Filesystem from SGI
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: Dave Chinner
Cc: linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com

On Thu, Oct 24, 2013 at 02:25:10PM +1100, Dave Chinner wrote:
> From: Dave Chinner
>
> Page cache allocation doesn't always go through ->begin_write and
> hence we don't always get the opportunity to set the allocation
> context to GFP_NOFS. Failing to do this means we open up the direct
> reclaim stack to recurse into the filesystem and consume a
> significant amount of stack.
>
> On RHEL6.4 kernels we are seeing ra_submit() and
> generic_file_splice_read() from an nfsd context recursing into the
> filesystem via the inode cache shrinker and evicting inodes. This is
> causing truncation to be run (e.g. EOF block freeing) and causing
> bmap btree block merges and free space btree block splits to occur.
> These btree manipulations are occurring with the call chain already
> 30 functions deep and hence there is not enough stack space to
> complete such operations.

It seems like we really should fix this in the VFS as it could affect
all non-trivial filesystems.

> To avoid these specific overruns, we need to prevent the page cache
> allocation from recursing via direct reclaim. We can do that because
> the allocation functions take the allocation context from that which
> is stored in the mapping for the inode. We don't set that right now,
> so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
> GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
> when we initialise an inode, set the mapping gfp mask appropriately.
>
> Signed-off-by: Dave Chinner
> ---
>  fs/xfs/xfs_iops.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index c4cd6d4..27e0e54 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1168,6 +1168,7 @@ xfs_setup_inode(
>  	struct xfs_inode	*ip)
>  {
>  	struct inode		*inode = &ip->i_vnode;
> +	gfp_t			gfp_mask;
>
>  	inode->i_ino = ip->i_ino;
>  	inode->i_state = I_NEW;
> @@ -1230,6 +1231,14 @@ xfs_setup_inode(
>  	}
>
>  	/*
> +	 * Ensure all page cache allocations are done from GFP_NOFS context to
> +	 * prevent direct reclaim recursion back into the filesystem and blowing
> +	 * stacks or deadlocking.
> +	 */
> +	gfp_mask = mapping_gfp_mask(inode->i_mapping);
> +	mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
> +
> +	/*
>  	 * If there is no attribute fork no ACL can exist on this inode,
>  	 * and it can't have any file capabilities attached to it either.
>  	 */
> --
> 1.8.4.rc3
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
---end quoted text---

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
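
For illustration only, the mask manipulation in the quoted patch can be sketched as a small userspace C model. Everything prefixed with demo_ or DEMO_ is hypothetical: the flag values are arbitrary stand-ins and not the kernel's real definitions, and struct demo_mapping only mimics the idea of a per-mapping gfp mask. The sketch shows just the bit arithmetic: starting from a GFP_KERNEL-like mask, clearing the FS bit leaves a GFP_NOFS-like mask, so an allocator honouring the mask would not recurse into the filesystem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's definitions. */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_WAIT   0x01u   /* caller may sleep and enter direct reclaim */
#define DEMO_GFP_IO     0x02u   /* reclaim may start block I/O */
#define DEMO_GFP_FS     0x04u   /* reclaim may call back into the filesystem */

#define DEMO_GFP_KERNEL (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS   (DEMO_GFP_WAIT | DEMO_GFP_IO)

/* Toy model of an address space carrying its own allocation mask. */
struct demo_mapping {
	demo_gfp_t gfp_mask;
};

demo_gfp_t demo_mapping_gfp_mask(const struct demo_mapping *m)
{
	return m->gfp_mask;
}

void demo_mapping_set_gfp_mask(struct demo_mapping *m, demo_gfp_t mask)
{
	m->gfp_mask = mask;
}

/* Mirrors the two added lines in the patch: read the mask, clear the FS bit. */
void demo_setup_inode_mapping(struct demo_mapping *m)
{
	demo_gfp_t gfp_mask = demo_mapping_gfp_mask(m);

	demo_mapping_set_gfp_mask(m, gfp_mask & ~DEMO_GFP_FS);
}

/* True if an allocation using this mapping's mask could re-enter the fs. */
bool demo_may_enter_fs(const struct demo_mapping *m)
{
	return (m->gfp_mask & DEMO_GFP_FS) != 0;
}

/* Small self-check of the model; aborts via assert() on failure. */
void demo_self_check(void)
{
	struct demo_mapping m = { DEMO_GFP_KERNEL };

	assert(demo_may_enter_fs(&m));
	demo_setup_inode_mapping(&m);
	assert(!demo_may_enter_fs(&m));
	assert(demo_mapping_gfp_mask(&m) == DEMO_GFP_NOFS);
}
```

Calling demo_self_check() from a main() exercises the model. Only the FS bit is cleared, so the other bits of the original mask are preserved, which matches the patch masking the existing value rather than overwriting it with a fixed constant.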