From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@poochiereds.net>,
Andrew Morton <akpm@linux-foundation.org>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
Dave Chinner <david@fromorbit.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Christoph Hellwig <hch@lst.de>,
"J. Bruce Fields" <bfields@fieldses.org>,
Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 6/7] mm, fs: introduce file_operations->post_mmap()
Date: Wed, 27 Sep 2017 09:39:18 -0600 [thread overview]
Message-ID: <20170927153918.GA24314@linux.intel.com> (raw)
In-Reply-To: <20170927113527.GD25746@quack2.suse.cz>
On Wed, Sep 27, 2017 at 01:35:27PM +0200, Jan Kara wrote:
> On Tue 26-09-17 14:41:53, Dan Williams wrote:
> > On Tue, Sep 26, 2017 at 2:06 PM, Ross Zwisler
> > <ross.zwisler@linux.intel.com> wrote:
> > > On Tue, Sep 26, 2017 at 12:19:21PM -0700, Dan Williams wrote:
> > >> On Tue, Sep 26, 2017 at 11:57 AM, Ross Zwisler
> > > <>
> > >> > This decision can only be made (in this
> > >> > proposed scheme) *after* the inode->i_mapping->i_mmap tree has been
> > >> > populated, which means we need another call into the filesystem after this
> > >> > insertion has happened.
> > >>
> > >> I get that, but it seems over-engineered and something that can also
> > >> be safely cleaned up after the fact by the code path that is disabling
> > >> DAX.
> > >
> > > I don't think you can safely clean it up after the fact because some thread
> > > might have already called ->mmap() to set up the vma->vm_flags for their new
> > > mapping, but they haven't added it to inode->i_mapping->i_mmap.
> >
> > If madvise(MADV_NOHUGEPAGE) can dynamically change vm_flags, then the
> > DAX disable path can as well. VM_MIXEDMAP looks to be a nop for normal
> > memory mappings.
> >
> > > The inode->i_mapping->i_mmap tree is the only way (that I know of at least)
> > > that the filesystem has any idea about about the mapping. This is the method
> > > by which we would try and clean up mapping flags, if we were to do so, and
> > > it's the only way that the filesystem can know whether or not mappings exist.
> > >
> > > The only way that I could think of to make this safely work is to have the
> > > insertion into the inode->i_mapping->i_mmap tree be our sync point. After
> > > that the filesystem and the mapping code can communicate on the state of DAX,
> > > but before that I think it's basically indeterminate.
> >
> > If we lose the race and leak VM_HUGEPAGE to a non-DAX mapping what
> > breaks? I'd rather be in favor of not setting VM_HUGEPAGE at all in
> > the ->mmap() handler and let the default THP policy take over. In
> > fact, see transparent_hugepage_enabled() we already auto-enable huge
> > page support for dax mappings regardless of VM_HUGEPAGE.
>
> Hum, this is an interesting option. So do you suggest that filesystems
> supporting DAX would always setup mappings with VM_MIXEDMAP and without
> VM_HUGEPAGE and thus we'd get rid of dependency on S_DAX flag in ->mmap?
> That could actually work. The only possible issue I can see is that
> VM_MIXEDMAP is still slightly different from normal page mappings and it
> could have some performance implications - e.g. copy_page_range() does more
> work on VM_MIXEDMAP mappings but not on normal page mappings.
It looks like having VM_MIXEDMAP always set for filesystems that support DAX
might affect their memory's NUMA migration in the non-DAX case?
8e76d4e sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappings
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Christoph Hellwig <hch@lst.de>,
Dave Chinner <david@fromorbit.com>,
Jeff Layton <jlayton@poochiereds.net>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 6/7] mm, fs: introduce file_operations->post_mmap()
Date: Wed, 27 Sep 2017 09:39:18 -0600 [thread overview]
Message-ID: <20170927153918.GA24314@linux.intel.com> (raw)
In-Reply-To: <20170927113527.GD25746@quack2.suse.cz>
On Wed, Sep 27, 2017 at 01:35:27PM +0200, Jan Kara wrote:
> On Tue 26-09-17 14:41:53, Dan Williams wrote:
> > On Tue, Sep 26, 2017 at 2:06 PM, Ross Zwisler
> > <ross.zwisler@linux.intel.com> wrote:
> > > On Tue, Sep 26, 2017 at 12:19:21PM -0700, Dan Williams wrote:
> > >> On Tue, Sep 26, 2017 at 11:57 AM, Ross Zwisler
> > > <>
> > >> > This decision can only be made (in this
> > >> > proposed scheme) *after* the inode->i_mapping->i_mmap tree has been
> > >> > populated, which means we need another call into the filesystem after this
> > >> > insertion has happened.
> > >>
> > >> I get that, but it seems over-engineered and something that can also
> > >> be safely cleaned up after the fact by the code path that is disabling
> > >> DAX.
> > >
> > > I don't think you can safely clean it up after the fact because some thread
> > > might have already called ->mmap() to set up the vma->vm_flags for their new
> > > mapping, but they haven't added it to inode->i_mapping->i_mmap.
> >
> > If madvise(MADV_NOHUGEPAGE) can dynamically change vm_flags, then the
> > DAX disable path can as well. VM_MIXEDMAP looks to be a nop for normal
> > memory mappings.
> >
> > > The inode->i_mapping->i_mmap tree is the only way (that I know of at least)
> > > that the filesystem has any idea about about the mapping. This is the method
> > > by which we would try and clean up mapping flags, if we were to do so, and
> > > it's the only way that the filesystem can know whether or not mappings exist.
> > >
> > > The only way that I could think of to make this safely work is to have the
> > > insertion into the inode->i_mapping->i_mmap tree be our sync point. After
> > > that the filesystem and the mapping code can communicate on the state of DAX,
> > > but before that I think it's basically indeterminate.
> >
> > If we lose the race and leak VM_HUGEPAGE to a non-DAX mapping what
> > breaks? I'd rather be in favor of not setting VM_HUGEPAGE at all in
> > the ->mmap() handler and let the default THP policy take over. In
> > fact, see transparent_hugepage_enabled() we already auto-enable huge
> > page support for dax mappings regardless of VM_HUGEPAGE.
>
> Hum, this is an interesting option. So do you suggest that filesystems
> supporting DAX would always setup mappings with VM_MIXEDMAP and without
> VM_HUGEPAGE and thus we'd get rid of dependency on S_DAX flag in ->mmap?
> That could actually work. The only possible issue I can see is that
> VM_MIXEDMAP is still slightly different from normal page mappings and it
> could have some performance implications - e.g. copy_page_range() does more
> work on VM_MIXEDMAP mappings but not on normal page mappings.
It looks like having VM_MIXEDMAP always set for filesystems that support DAX
might affect their memory's NUMA migration in the non-DAX case?
8e76d4e sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappings
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Christoph Hellwig <hch@lst.de>,
Dave Chinner <david@fromorbit.com>,
Jeff Layton <jlayton@poochiereds.net>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 6/7] mm, fs: introduce file_operations->post_mmap()
Date: Wed, 27 Sep 2017 09:39:18 -0600 [thread overview]
Message-ID: <20170927153918.GA24314@linux.intel.com> (raw)
In-Reply-To: <20170927113527.GD25746@quack2.suse.cz>
On Wed, Sep 27, 2017 at 01:35:27PM +0200, Jan Kara wrote:
> On Tue 26-09-17 14:41:53, Dan Williams wrote:
> > On Tue, Sep 26, 2017 at 2:06 PM, Ross Zwisler
> > <ross.zwisler@linux.intel.com> wrote:
> > > On Tue, Sep 26, 2017 at 12:19:21PM -0700, Dan Williams wrote:
> > >> On Tue, Sep 26, 2017 at 11:57 AM, Ross Zwisler
> > > <>
> > >> > This decision can only be made (in this
> > >> > proposed scheme) *after* the inode->i_mapping->i_mmap tree has been
> > >> > populated, which means we need another call into the filesystem after this
> > >> > insertion has happened.
> > >>
> > >> I get that, but it seems over-engineered and something that can also
> > >> be safely cleaned up after the fact by the code path that is disabling
> > >> DAX.
> > >
> > > I don't think you can safely clean it up after the fact because some thread
> > > might have already called ->mmap() to set up the vma->vm_flags for their new
> > > mapping, but they haven't added it to inode->i_mapping->i_mmap.
> >
> > If madvise(MADV_NOHUGEPAGE) can dynamically change vm_flags, then the
> > DAX disable path can as well. VM_MIXEDMAP looks to be a nop for normal
> > memory mappings.
> >
> > > The inode->i_mapping->i_mmap tree is the only way (that I know of at least)
> > > that the filesystem has any idea about about the mapping. This is the method
> > > by which we would try and clean up mapping flags, if we were to do so, and
> > > it's the only way that the filesystem can know whether or not mappings exist.
> > >
> > > The only way that I could think of to make this safely work is to have the
> > > insertion into the inode->i_mapping->i_mmap tree be our sync point. After
> > > that the filesystem and the mapping code can communicate on the state of DAX,
> > > but before that I think it's basically indeterminate.
> >
> > If we lose the race and leak VM_HUGEPAGE to a non-DAX mapping what
> > breaks? I'd rather be in favor of not setting VM_HUGEPAGE at all in
> > the ->mmap() handler and let the default THP policy take over. In
> > fact, see transparent_hugepage_enabled() we already auto-enable huge
> > page support for dax mappings regardless of VM_HUGEPAGE.
>
> Hum, this is an interesting option. So do you suggest that filesystems
> supporting DAX would always setup mappings with VM_MIXEDMAP and without
> VM_HUGEPAGE and thus we'd get rid of dependency on S_DAX flag in ->mmap?
> That could actually work. The only possible issue I can see is that
> VM_MIXEDMAP is still slightly different from normal page mappings and it
> could have some performance implications - e.g. copy_page_range() does more
> work on VM_MIXEDMAP mappings but not on normal page mappings.
It looks like having VM_MIXEDMAP always set for filesystems that support DAX
might affect their memory's NUMA migration in the non-DAX case?
8e76d4e sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappings
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-09-27 15:36 UTC|newest]
Thread overview: 130+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-25 23:13 [PATCH 0/7] re-enable XFS per-inode DAX Ross Zwisler
2017-09-25 23:13 ` Ross Zwisler
2017-09-25 23:13 ` Ross Zwisler
2017-09-25 23:13 ` [PATCH 1/7] xfs: always use DAX if mount option is used Ross Zwisler
2017-09-25 23:13 ` Ross Zwisler
2017-09-25 23:13 ` Ross Zwisler
2017-09-25 23:38 ` Dave Chinner
2017-09-25 23:38 ` Dave Chinner
2017-09-25 23:38 ` Dave Chinner
2017-09-26 9:35 ` Jan Kara
2017-09-26 9:35 ` Jan Kara
2017-09-26 9:35 ` Jan Kara
2017-09-26 11:09 ` Dave Chinner
2017-09-26 11:09 ` Dave Chinner
2017-09-26 11:09 ` Dave Chinner
2017-09-26 14:37 ` Christoph Hellwig
2017-09-26 14:37 ` Christoph Hellwig
2017-09-26 17:30 ` Ross Zwisler
2017-09-26 17:30 ` Ross Zwisler
2017-09-26 17:30 ` Ross Zwisler
2017-09-26 19:48 ` Darrick J. Wong
2017-09-26 19:48 ` Darrick J. Wong
2017-09-26 22:00 ` Dave Chinner
2017-09-26 22:00 ` Dave Chinner
2017-09-26 22:00 ` Dave Chinner
2017-09-27 6:40 ` Christoph Hellwig
2017-09-27 6:40 ` Christoph Hellwig
2017-09-27 6:40 ` Christoph Hellwig
2017-09-27 16:15 ` Ross Zwisler
2017-09-27 16:15 ` Ross Zwisler
2017-10-01 8:17 ` Christoph Hellwig
2017-10-01 8:17 ` Christoph Hellwig
2017-10-01 8:17 ` Christoph Hellwig
2017-09-26 18:02 ` Eric Sandeen
2017-09-26 18:02 ` Eric Sandeen
2017-09-26 18:02 ` Eric Sandeen
2017-09-26 18:50 ` Ross Zwisler
2017-09-26 18:50 ` Ross Zwisler
2017-09-26 18:50 ` Ross Zwisler
2017-09-25 23:13 ` [PATCH 2/7] xfs: validate bdev support for DAX inode flag Ross Zwisler
2017-09-25 23:13 ` Ross Zwisler
2017-09-25 23:13 ` Ross Zwisler
2017-09-26 6:36 ` Christoph Hellwig
2017-09-26 6:36 ` Christoph Hellwig
2017-09-26 6:36 ` Christoph Hellwig
2017-09-26 17:16 ` Ross Zwisler
2017-09-26 17:16 ` Ross Zwisler
2017-09-26 17:16 ` Ross Zwisler
2017-09-26 17:57 ` Darrick J. Wong
2017-09-26 17:57 ` Darrick J. Wong
2017-09-25 23:14 ` [PATCH 3/7] xfs: protect S_DAX transitions in XFS read path Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:27 ` Dave Chinner
2017-09-25 23:27 ` Dave Chinner
2017-09-25 23:27 ` Dave Chinner
2017-09-26 6:32 ` Christoph Hellwig
2017-09-26 6:32 ` Christoph Hellwig
2017-09-26 6:32 ` Christoph Hellwig
2017-09-26 13:59 ` Dan Williams
2017-09-26 13:59 ` Dan Williams
2017-09-26 13:59 ` Dan Williams
2017-09-26 14:33 ` Christoph Hellwig
2017-09-26 14:33 ` Christoph Hellwig
2017-09-26 14:33 ` Christoph Hellwig
2017-09-26 18:11 ` Dan Williams
2017-09-26 18:11 ` Dan Williams
2017-10-01 8:17 ` Christoph Hellwig
2017-10-01 8:17 ` Christoph Hellwig
2017-10-01 8:17 ` Christoph Hellwig
2017-09-25 23:14 ` [PATCH 4/7] xfs: protect S_DAX transitions in XFS write path Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:29 ` Dave Chinner
2017-09-25 23:29 ` Dave Chinner
2017-09-25 23:29 ` Dave Chinner
2017-09-25 23:14 ` [PATCH 5/7] xfs: introduce xfs_is_dax_state_changing Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-26 6:33 ` Christoph Hellwig
2017-09-26 6:33 ` Christoph Hellwig
2017-09-26 6:33 ` Christoph Hellwig
2017-09-25 23:14 ` [PATCH 6/7] mm, fs: introduce file_operations->post_mmap() Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:38 ` Dan Williams
2017-09-25 23:38 ` Dan Williams
2017-09-26 18:57 ` Ross Zwisler
2017-09-26 18:57 ` Ross Zwisler
2017-09-26 18:57 ` Ross Zwisler
2017-09-26 19:19 ` Dan Williams
2017-09-26 19:19 ` Dan Williams
2017-09-26 19:19 ` Dan Williams
2017-09-26 21:06 ` Ross Zwisler
2017-09-26 21:06 ` Ross Zwisler
2017-09-26 21:06 ` Ross Zwisler
2017-09-26 21:41 ` Dan Williams
2017-09-26 21:41 ` Dan Williams
2017-09-26 21:41 ` Dan Williams
2017-09-27 11:35 ` Jan Kara
2017-09-27 11:35 ` Jan Kara
2017-09-27 11:35 ` Jan Kara
2017-09-27 14:00 ` Dan Williams
2017-09-27 14:00 ` Dan Williams
2017-09-27 14:00 ` Dan Williams
2017-09-27 15:07 ` Jan Kara
2017-09-27 15:07 ` Jan Kara
2017-09-27 15:07 ` Jan Kara
2017-09-27 15:36 ` Dan Williams
2017-09-27 15:36 ` Dan Williams
2017-09-27 15:39 ` Ross Zwisler [this message]
2017-09-27 15:39 ` Ross Zwisler
2017-09-27 15:39 ` Ross Zwisler
2017-09-27 15:54 ` Dan Williams
2017-09-27 15:54 ` Dan Williams
2017-09-27 15:54 ` Dan Williams
2017-09-26 6:34 ` Christoph Hellwig
2017-09-26 6:34 ` Christoph Hellwig
2017-09-26 6:34 ` Christoph Hellwig
2017-09-25 23:14 ` [PATCH 7/7] xfs: re-enable XFS per-inode DAX Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-25 23:14 ` Ross Zwisler
2017-09-26 0:31 ` Dave Chinner
2017-09-26 0:31 ` Dave Chinner
2017-09-26 0:31 ` Dave Chinner
2017-09-26 6:36 ` Christoph Hellwig
2017-09-26 6:36 ` Christoph Hellwig
2017-09-26 19:01 ` Ross Zwisler
2017-09-26 19:01 ` Ross Zwisler
2017-09-26 19:01 ` Ross Zwisler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170927153918.GA24314@linux.intel.com \
--to=ross.zwisler@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=bfields@fieldses.org \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jlayton@poochiereds.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvdimm@lists.01.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.