From: Christoph Hellwig <hch@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: prevent stack overflows from page cache allocation
Date: Thu, 24 Oct 2013 01:48:03 -0700 [thread overview]
Message-ID: <20131024084803.GA28144@infradead.org> (raw)
In-Reply-To: <1382585110-1796-1-git-send-email-david@fromorbit.com>
On Thu, Oct 24, 2013 at 02:25:10PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Page cache allocation doesn't always go through ->begin_write and
> hence we don't always get the opportunity to set the allocation
> context to GFP_NOFS. Failing to do this means we open up the direct
> relcaim stack to recurse into the filesystem and consume a
> significant amount of stack.
>
> On RHEL6.4 kernels we are seeing ra_submit() and
> generic_file_splice_read() from an nfsd context recursing into the
> filesystem via the inode cache shrinker and evicting inodes. This is
> causing truncation to be run (e.g EOF block freeing) and causing
> bmap btree block merges and free space btree block splits to occur.
> These btree manipulations are occurring with the call chain already
> 30 functions deep and hence there is not enough stack space to
> complete such operations.
It seems like we really should fix this in the VFS as it could affect
all non-trivial filesystems.
> To avoid these specific overruns, we need to prevent the page cache
> allocation from recursing via direct reclaim. We can do that because
> the allocation functions take the allocation context from that which
> is stored in the mapping for the inode. We don't set that right now,
> so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
> GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
> when we initialise an inode, set the mapping gfp mask appropriately.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_iops.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index c4cd6d4..27e0e54 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1168,6 +1168,7 @@ xfs_setup_inode(
> struct xfs_inode *ip)
> {
> struct inode *inode = &ip->i_vnode;
> + gfp_t gfp_mask;
>
> inode->i_ino = ip->i_ino;
> inode->i_state = I_NEW;
> @@ -1230,6 +1231,14 @@ xfs_setup_inode(
> }
>
> /*
> + * Ensure all page cache allocations are done from GFP_NOFS context to
> + * prevent direct reclaim recursion back into the filesystem and blowing
> + * stacks or deadlocking.
> + */
> + gfp_mask = mapping_gfp_mask(inode->i_mapping);
> + mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
> +
> + /*
> * If there is no attribute fork no ACL can exist on this inode,
> * and it can't have any file capabilities attached to it either.
> */
> --
> 1.8.4.rc3
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
---end quoted text---
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
WARNING: multiple messages have this Message-ID (diff)
From: Christoph Hellwig <hch@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] xfs: prevent stack overflows from page cache allocation
Date: Thu, 24 Oct 2013 01:48:03 -0700 [thread overview]
Message-ID: <20131024084803.GA28144@infradead.org> (raw)
In-Reply-To: <1382585110-1796-1-git-send-email-david@fromorbit.com>
On Thu, Oct 24, 2013 at 02:25:10PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Page cache allocation doesn't always go through ->begin_write and
> hence we don't always get the opportunity to set the allocation
> context to GFP_NOFS. Failing to do this means we open up the direct
> relcaim stack to recurse into the filesystem and consume a
> significant amount of stack.
>
> On RHEL6.4 kernels we are seeing ra_submit() and
> generic_file_splice_read() from an nfsd context recursing into the
> filesystem via the inode cache shrinker and evicting inodes. This is
> causing truncation to be run (e.g EOF block freeing) and causing
> bmap btree block merges and free space btree block splits to occur.
> These btree manipulations are occurring with the call chain already
> 30 functions deep and hence there is not enough stack space to
> complete such operations.
It seems like we really should fix this in the VFS as it could affect
all non-trivial filesystems.
> To avoid these specific overruns, we need to prevent the page cache
> allocation from recursing via direct reclaim. We can do that because
> the allocation functions take the allocation context from that which
> is stored in the mapping for the inode. We don't set that right now,
> so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
> GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
> when we initialise an inode, set the mapping gfp mask appropriately.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_iops.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index c4cd6d4..27e0e54 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1168,6 +1168,7 @@ xfs_setup_inode(
> struct xfs_inode *ip)
> {
> struct inode *inode = &ip->i_vnode;
> + gfp_t gfp_mask;
>
> inode->i_ino = ip->i_ino;
> inode->i_state = I_NEW;
> @@ -1230,6 +1231,14 @@ xfs_setup_inode(
> }
>
> /*
> + * Ensure all page cache allocations are done from GFP_NOFS context to
> + * prevent direct reclaim recursion back into the filesystem and blowing
> + * stacks or deadlocking.
> + */
> + gfp_mask = mapping_gfp_mask(inode->i_mapping);
> + mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
> +
> + /*
> * If there is no attribute fork no ACL can exist on this inode,
> * and it can't have any file capabilities attached to it either.
> */
> --
> 1.8.4.rc3
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
---end quoted text---
next prev parent reply other threads:[~2013-10-24 8:48 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-24 3:25 [PATCH] xfs: prevent stack overflows from page cache allocation Dave Chinner
2013-10-24 8:48 ` Christoph Hellwig [this message]
2013-10-24 8:48 ` Christoph Hellwig
2013-10-24 10:37 ` Dave Chinner
2013-10-24 10:37 ` Dave Chinner
2013-10-24 15:42 ` Christoph Hellwig
2013-10-24 15:42 ` Christoph Hellwig
2013-10-24 16:41 ` Ben Myers
2013-10-24 21:24 ` Dave Chinner
2013-10-24 21:24 ` Dave Chinner
2013-10-25 11:29 ` Christoph Hellwig
2013-10-25 11:29 ` Christoph Hellwig
2013-10-27 9:04 ` Dave Chinner
2013-10-27 9:04 ` Dave Chinner
2013-10-28 9:49 ` Christoph Hellwig
2013-10-28 9:49 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131024084803.GA28144@infradead.org \
--to=hch@infradead.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.