All of lore.kernel.org
 help / color / mirror / Atom feed
From: Toshi Kani <toshi.kani@hpe.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	mauricio.porto@hpe.com, Linux MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping
Date: Wed, 02 Dec 2015 13:12:05 -0700	[thread overview]
Message-ID: <1449087125.31589.45.camel@hpe.com> (raw)
In-Reply-To: <1449086521.31589.39.camel@hpe.com>

On Wed, 2015-12-02 at 13:02 -0700, Toshi Kani wrote:
> On Wed, 2015-12-02 at 11:00 -0800, Dan Williams wrote:
> > On Wed, Dec 2, 2015 at 11:26 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > On Wed, 2015-12-02 at 10:06 -0800, Dan Williams wrote:
> > > > On Wed, Dec 2, 2015 at 9:01 AM, Dan Williams <dan.j.williams@intel.com>
> > > > wrote:
> > > > > On Wed, Dec 2, 2015 at 9:43 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > > > > Oh, I see.  I will setup the memmap array and run the tests again.
> > > > > > 
> > > > > > But, why does the PMD mapping depend on the memmap array?  We have
> > > > > > observed major performance improvement with PMD.  This feature 
> > > > > > should always be enabled with DAX regardless of the option to 
> > > > > > allocate the memmap array.
> > > > > > 
> > > > > 
> > > > > Several factors drove this decision, I'm open to considering
> > > > > alternatives but here's the reasoning:
> > > > > 
> > > > > 1/ DAX pmd mappings caused crashes in the get_user_pages path leading
> > > > > to commit e82c9ed41e8 "dax: disable pmd mappings".  The reason pte
> > > > > mappings don't crash and instead trigger -EFAULT is due to the
> > > > > _PAGE_SPECIAL pte bit.
> > > > > 
> > > > > 2/ To enable get_user_pages for DAX, in both the page and huge-page
> > > > > case, we need a new pte bit _PAGE_DEVMAP.
> > > > > 
> > > > > 3/ Given the pte bits are hard to come I'm assuming we won't get two,
> > > > > i.e. both _PAGE_DEVMAP and a new _PAGE_SPECIAL for pmds.  Even if we
> > > > > could get a _PAGE_SPECIAL for pmds I'm not in favor of pursuing it.
> > > > 
> > > > Actually, Dave says they aren't that hard to come by for pmds, so we
> > > > could go add _PMD_SPECIAL if we really wanted to support the limited
> > > > page-less DAX-pmd case.
> > > > 
> > > > But I'm still of the opinion that we run away from the page-less case
> > > > until it can be made a full class citizen with O_DIRECT for pfn
> > > > support.
> > > 
> > > I may be missing something, but per vm_normal_page(), I think 
> > > _PAGE_SPECIAL can be substituted by the following check when we do not
> > > have the memmap.
> > > 
> > >         if ((vma->vm_flags & VM_PFNMAP) ||
> > >             ((vma->vm_flags & VM_MIXEDMAP) && (!pfn_valid(pfn)))) {
> > > 
> > > This is what I did in this patch for follow_trans_huge_pmd(), although I
> > > missed the pfn_valid() check.
> > 
> > That works for __get_user_pages but not __get_user_pages_fast where we
> > don't have access to the vma.
> 
> __get_user_page_fast already refers current->mm, so we should be able to get 
> the vma, and pass it down to gup_pud_range().

Alternatively, we can obtain the vma from current->mm in gup_huge_pmd() when the
!pfn_valid() condition is met, so that we do not add the code to the main path
of __get_user_pages_fast.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Toshi Kani <toshi.kani@hpe.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	mauricio.porto@hpe.com, Linux MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping
Date: Wed, 02 Dec 2015 13:12:05 -0700	[thread overview]
Message-ID: <1449087125.31589.45.camel@hpe.com> (raw)
In-Reply-To: <1449086521.31589.39.camel@hpe.com>

On Wed, 2015-12-02 at 13:02 -0700, Toshi Kani wrote:
> On Wed, 2015-12-02 at 11:00 -0800, Dan Williams wrote:
> > On Wed, Dec 2, 2015 at 11:26 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > On Wed, 2015-12-02 at 10:06 -0800, Dan Williams wrote:
> > > > On Wed, Dec 2, 2015 at 9:01 AM, Dan Williams <dan.j.williams@intel.com>
> > > > wrote:
> > > > > On Wed, Dec 2, 2015 at 9:43 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > > > > Oh, I see.  I will setup the memmap array and run the tests again.
> > > > > > 
> > > > > > But, why does the PMD mapping depend on the memmap array?  We have
> > > > > > observed major performance improvement with PMD.  This feature 
> > > > > > should always be enabled with DAX regardless of the option to 
> > > > > > allocate the memmap array.
> > > > > > 
> > > > > 
> > > > > Several factors drove this decision, I'm open to considering
> > > > > alternatives but here's the reasoning:
> > > > > 
> > > > > 1/ DAX pmd mappings caused crashes in the get_user_pages path leading
> > > > > to commit e82c9ed41e8 "dax: disable pmd mappings".  The reason pte
> > > > > mappings don't crash and instead trigger -EFAULT is due to the
> > > > > _PAGE_SPECIAL pte bit.
> > > > > 
> > > > > 2/ To enable get_user_pages for DAX, in both the page and huge-page
> > > > > case, we need a new pte bit _PAGE_DEVMAP.
> > > > > 
> > > > > 3/ Given the pte bits are hard to come I'm assuming we won't get two,
> > > > > i.e. both _PAGE_DEVMAP and a new _PAGE_SPECIAL for pmds.  Even if we
> > > > > could get a _PAGE_SPECIAL for pmds I'm not in favor of pursuing it.
> > > > 
> > > > Actually, Dave says they aren't that hard to come by for pmds, so we
> > > > could go add _PMD_SPECIAL if we really wanted to support the limited
> > > > page-less DAX-pmd case.
> > > > 
> > > > But I'm still of the opinion that we run away from the page-less case
> > > > until it can be made a full class citizen with O_DIRECT for pfn
> > > > support.
> > > 
> > > I may be missing something, but per vm_normal_page(), I think 
> > > _PAGE_SPECIAL can be substituted by the following check when we do not
> > > have the memmap.
> > > 
> > >         if ((vma->vm_flags & VM_PFNMAP) ||
> > >             ((vma->vm_flags & VM_MIXEDMAP) && (!pfn_valid(pfn)))) {
> > > 
> > > This is what I did in this patch for follow_trans_huge_pmd(), although I
> > > missed the pfn_valid() check.
> > 
> > That works for __get_user_pages but not __get_user_pages_fast where we
> > don't have access to the vma.
> 
> __get_user_page_fast already refers current->mm, so we should be able to get 
> the vma, and pass it down to gup_pud_range().

Alternatively, we can obtain the vma from current->mm in gup_huge_pmd() when the
!pfn_valid() condition is met, so that we do not add the code to the main path
of __get_user_pages_fast.

Thanks,
-Toshi

  reply	other threads:[~2015-12-02 20:12 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-23 20:04 [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping Toshi Kani
2015-11-23 20:04 ` Toshi Kani
2015-11-23 20:53 ` Dan Williams
2015-11-23 20:53   ` Dan Williams
2015-11-23 22:15   ` Toshi Kani
2015-11-23 22:15     ` Toshi Kani
2015-11-30 22:08 ` Dan Williams
2015-11-30 22:08   ` Dan Williams
2015-12-02  2:19   ` Toshi Kani
2015-12-02  2:19     ` Toshi Kani
2015-12-02  3:45     ` Dan Williams
2015-12-02  3:45       ` Dan Williams
2015-12-02 17:43       ` Toshi Kani
2015-12-02 17:43         ` Toshi Kani
2015-12-02 17:01         ` Dan Williams
2015-12-02 17:01           ` Dan Williams
2015-12-02 18:06           ` Dan Williams
2015-12-02 18:06             ` Dan Williams
2015-12-02 19:26             ` Toshi Kani
2015-12-02 19:26               ` Toshi Kani
2015-12-02 19:00               ` Dan Williams
2015-12-02 19:00                 ` Dan Williams
2015-12-02 20:02                 ` Toshi Kani
2015-12-02 20:02                   ` Toshi Kani
2015-12-02 20:12                   ` Toshi Kani [this message]
2015-12-02 20:12                     ` Toshi Kani
2015-12-02 19:57                     ` Dan Williams
2015-12-02 19:57                       ` Dan Williams
2015-12-02 21:37                       ` Toshi Kani
2015-12-02 21:37                         ` Toshi Kani
2015-12-02 20:54                         ` Dan Williams
2015-12-02 20:54                           ` Dan Williams
2015-12-02 21:55                           ` Toshi Kani
2015-12-02 21:55                             ` Toshi Kani
2015-12-03 23:43                             ` Dan Williams
2015-12-03 23:43                               ` Dan Williams
2015-12-04 16:55                               ` Toshi Kani
2015-12-04 16:55                                 ` Toshi Kani
2015-12-02 22:00                           ` Dave Hansen
2015-12-02 22:00                             ` Dave Hansen
2015-12-02 22:03                             ` Dan Williams
2015-12-02 22:03                               ` Dan Williams
2015-12-02 22:09                               ` Dave Hansen
2015-12-02 22:09                                 ` Dave Hansen
2015-12-03  0:21         ` Toshi Kani
2015-12-03  0:21           ` Toshi Kani
2015-12-02 23:33           ` Dan Williams
2015-12-02 23:33             ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1449087125.31589.45.camel@hpe.com \
    --to=toshi.kani@hpe.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mauricio.porto@hpe.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.