From: "Jiang, Dave" <dave.jiang@intel.com>
To: "ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
"david@fromorbit.com" <david@fromorbit.com>
Cc: "jack@suse.cz" <jack@suse.cz>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"tytso@mit.edu" <tytso@mit.edu>, "hch@lst.de" <hch@lst.de>
Subject: Re: [PATCH v4 1/3] dax: masking off __GFP_FS in fs DAX handlers
Date: Mon, 19 Dec 2016 17:56:57 +0000
Message-ID: <1482170215.11563.1.camel@intel.com>
In-Reply-To: <20161216220450.GZ4219@dastard>
On Sat, 2016-12-17 at 09:04 +1100, Dave Chinner wrote:
> On Fri, Dec 16, 2016 at 09:19:16AM -0700, Ross Zwisler wrote:
> >
> > On Fri, Dec 16, 2016 at 12:07:30PM +1100, Dave Chinner wrote:
> > >
> > > On Thu, Dec 15, 2016 at 04:40:41PM -0700, Dave Jiang wrote:
> > > >
> > > > The caller into dax needs to clear the __GFP_FS mask bit since it's
> > > > responsible for acquiring locks / transactions that block __GFP_FS
> > > > allocation. The caller will restore the original mask when the dax
> > > > function returns.
> > >
> > > What's the allocation problem you're working around here? Can you
> > > please describe the call chain that is the problem?
> > >
> > > >
> > > >  	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> > > >
> > > >  	if (IS_DAX(inode)) {
> > > > +		gfp_t old_gfp = vmf->gfp_mask;
> > > > +
> > > > +		vmf->gfp_mask &= ~__GFP_FS;
> > > >  		ret = dax_iomap_fault(vma, vmf, &xfs_iomap_ops);
> > > > +		vmf->gfp_mask = old_gfp;
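The save/clear/restore pattern in this hunk can be modelled in a few lines of userspace C. This is a sketch only: the `__GFP_*` values are illustrative rather than the kernel's, `struct vm_fault` is reduced to its `gfp_mask` field, and `fake_dax_iomap_fault()` and `fault_with_masked_gfp()` are hypothetical names, the former simply recording the mask it was handed.

```c
typedef unsigned int gfp_t;

/* Userspace stand-ins; bit values are illustrative, not the kernel's. */
#define __GFP_IO	0x40u
#define __GFP_FS	0x80u
#define GFP_KERNEL	(__GFP_IO | __GFP_FS)	/* reclaim bits omitted */

/* Reduced to the one field this pattern touches. */
struct vm_fault {
	gfp_t gfp_mask;
};

/* Hypothetical stand-in for dax_iomap_fault(): remembers the mask it saw. */
static gfp_t seen_mask;

static int fake_dax_iomap_fault(struct vm_fault *vmf)
{
	seen_mask = vmf->gfp_mask;
	return 0;
}

/* The v4 patch's pattern: save, clear __GFP_FS, call into DAX, restore. */
static int fault_with_masked_gfp(struct vm_fault *vmf)
{
	gfp_t old_gfp = vmf->gfp_mask;
	int ret;

	vmf->gfp_mask &= ~__GFP_FS;		/* callee must not recurse into the fs */
	ret = fake_dax_iomap_fault(vmf);
	vmf->gfp_mask = old_gfp;		/* caller's view is unchanged afterwards */
	return ret;
}
```

The callee only ever sees the narrowed mask and the caller's `vmf` is restored on return; nothing in the pattern itself records why the bit has to be cleared, which is the point of the objection that follows.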
> > >
> > > I really have to say that I hate code that clears and restores flags
> > > without any explanation of why the code needs to play flag tricks. I
> > > take one look at the XFS fault handling code and ask myself now "why
> > > the hell do we need to clear those flags?" Especially as the other
> > > paths into generic fault handlers /don't/ require us to do this.
> > > What does DAX do that requires us to treat memory allocation contexts
> > > differently to the filemap_fault() path?
> >
> > This was done in response to Jan Kara's concern:
> >
> > The gfp_mask that propagates from __do_fault() or do_page_mkwrite() is
> > fine because at that point it is correct. But once we grab filesystem
> > locks which are not reclaim safe, we should update the vmf->gfp_mask we
> > pass further down into DAX code to not contain __GFP_FS (that's a bug we
> > apparently have there). And inside DAX code, we definitely are not
> > generally safe to add __GFP_FS to mapping_gfp_mask(). Maybe we'd be
> > better off propagating struct vm_fault into this function, using the
> > passed gfp_mask there and making sure callers update gfp_mask as
> > appropriate.
> >
> > https://lkml.org/lkml/2016/10/4/37
> >
> > IIUC I think the concern is that, for example, in
> > xfs_filemap_page_mkwrite() we take a read lock on the struct
> > inode.i_rwsem before we call dax_iomap_fault().
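The deadlock being described can be reduced to a toy model: the fault path holds a filesystem lock, and an allocation that is allowed to enter filesystem reclaim (`__GFP_FS` set) needs that same lock. This is a deliberately simplified sketch, not kernel code: `fs_lock_held`, `fs_reclaim()`, `alloc_under_pressure()` and `fault_path()` are hypothetical, the lock is treated as if reclaim would always block on it (real lock semantics such as shared vs. exclusive are ignored), and instead of hanging the model reports `ALLOC_WOULD_DEADLOCK`.

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Userspace stand-ins; bit values are illustrative, not the kernel's. */
#define __GFP_IO	0x40u
#define __GFP_FS	0x80u
#define GFP_KERNEL	(__GFP_IO | __GFP_FS)	/* reclaim bits omitted */
#define GFP_NOFS	(__GFP_IO)		/* reclaim bits omitted */

enum alloc_result { ALLOC_OK, ALLOC_WOULD_DEADLOCK };

/* Hypothetical model of a filesystem lock held across the fault. */
static bool fs_lock_held;

/* Hypothetical fs reclaim hook that needs the same lock. */
static enum alloc_result fs_reclaim(void)
{
	if (fs_lock_held)
		return ALLOC_WOULD_DEADLOCK;	/* real code would block forever */
	return ALLOC_OK;
}

/* Under memory pressure, reclaim re-enters the fs only if __GFP_FS is set. */
static enum alloc_result alloc_under_pressure(gfp_t gfp)
{
	if (gfp & __GFP_FS)
		return fs_reclaim();
	return ALLOC_OK;
}

static enum alloc_result fault_path(gfp_t gfp)
{
	enum alloc_result r;

	fs_lock_held = true;		/* e.g. the lock taken before dax_iomap_fault() */
	r = alloc_under_pressure(gfp);	/* e.g. an allocation inside the DAX code */
	fs_lock_held = false;
	return r;
}
```

Outside the locked region the same `GFP_KERNEL` allocation is harmless in this model; the hazard only exists because the allocation happens while the filesystem lock is held.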
>
> That, my friends, is exactly the problem that mapping_gfp_mask() is
> meant to solve. This:
>
> > > > +	vmf.gfp_mask = mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO;
>
> Is just so wrong it's not funny.
>
> The whole point of mapping_gfp_mask() is to remove flags from the
> gfp_mask used to do mapping+page cache related allocations that the
> mapping->host considers dangerous when the host may be holding locks.
> This includes mapping tree allocations, and anything else required
> to set up a new entry in the mapping during IO path operations. That
> includes page fault operations...
>
> e.g. in xfs_setup_inode():
>
> 	/*
> 	 * Ensure all page cache allocations are done from GFP_NOFS context to
> 	 * prevent direct reclaim recursion back into the filesystem and blowing
> 	 * stacks or deadlocking.
> 	 */
> 	gfp_mask = mapping_gfp_mask(inode->i_mapping);
> 	mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
>
> i.e. XFS considers it invalid to use GFP_FS at all for mapping
> allocations in the io path, because we *know* that we hold
> filesystem locks over those allocations.
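A minimal userspace model of the `xfs_setup_inode()` excerpt, assuming simplified stand-ins for `struct address_space`, `struct inode`, `mapping_gfp_mask()` and `mapping_set_gfp_mask()` (the real accessors live in `include/linux/pagemap.h`). `setup_inode_nofs()` is a hypothetical name and the flag values are illustrative.

```c
typedef unsigned int gfp_t;

/* Userspace stand-ins; bit values are illustrative, not the kernel's. */
#define __GFP_IO	0x40u
#define __GFP_FS	0x80u
#define GFP_KERNEL	(__GFP_IO | __GFP_FS)	/* reclaim bits omitted */

/* Only the allocation mask of the mapping is modelled. */
struct address_space {
	gfp_t gfp_mask;
};

struct inode {
	struct address_space *i_mapping;
};

static gfp_t mapping_gfp_mask(const struct address_space *mapping)
{
	return mapping->gfp_mask;
}

static void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
{
	m->gfp_mask = mask;
}

/* Mirrors the excerpt: strip __GFP_FS once, when the inode is set up. */
static void setup_inode_nofs(struct inode *inode)
{
	gfp_t gfp_mask = mapping_gfp_mask(inode->i_mapping);

	mapping_set_gfp_mask(inode->i_mapping, gfp_mask & ~__GFP_FS);
}
```

Because the restriction is applied once at setup time, every later user of `mapping_gfp_mask()` on that mapping inherits it without per-call-site flag juggling.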
>
> >
> > dax_iomap_fault() then calls find_or_create_page(), etc. with the
> > vmf->gfp_mask we were given.
>
> Yup. Precisely why we should be using mapping_gfp_mask() as it was
> intended for vmf.gfp_mask....
>
> >
> > I believe the concern is that if that memory allocation tries to do FS
> > operations to free memory because __GFP_FS is part of the gfp mask,
> > then we could end up deadlocking because we are already holding FS
> > locks.
>
> Which is a problem with the filesystem mapping mask setup, not a
> reason to sprinkle random gfp_mask clear/set pairs around the code.
> i.e. for DAX inodes, the mapping mask should clear __GFP_FS as XFS
> does above, and the mapping_gfp_mask() should be used unadulterated
> by the DAX page fault code....
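The difference between the quoted `mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO` line and the approach proposed here can be shown with the same kind of stand-ins. In this sketch `init_vmf_forced()` mirrors the quoted line and `init_vmf_from_mapping()` uses the mapping mask unmodified; both function names, the reduced structs and the flag values are hypothetical.

```c
typedef unsigned int gfp_t;

/* Userspace stand-ins; bit values are illustrative, not the kernel's. */
#define __GFP_IO	0x40u
#define __GFP_FS	0x80u
#define GFP_NOFS	(__GFP_IO)		/* reclaim bits omitted */

struct address_space { gfp_t gfp_mask; };
struct vm_fault { gfp_t gfp_mask; };

static gfp_t mapping_gfp_mask(const struct address_space *mapping)
{
	return mapping->gfp_mask;
}

/* Mirrors the quoted hunk: forces __GFP_FS and __GFP_IO back on. */
static void init_vmf_forced(struct vm_fault *vmf,
			    const struct address_space *mapping)
{
	vmf->gfp_mask = mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO;
}

/* The proposed alternative: trust the filesystem's mapping mask as-is. */
static void init_vmf_from_mapping(struct vm_fault *vmf,
				  const struct address_space *mapping)
{
	vmf->gfp_mask = mapping_gfp_mask(mapping);
}
```

For a mapping that was set up as GFP_NOFS, the forced variant hands `__GFP_FS` straight back to the allocator, which is exactly what the mapping mask exists to prevent; the unmodified variant needs no clear/restore pair in the filesystem's fault handler.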
I'll drop this patch. We can address the issue separately from the
pmd_fault changes.
>
> Cheers,
>
> Dave.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm