From: Toshi Kani <toshi.kani@hpe.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@linux.intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
mauricio.porto@hpe.com, Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping
Date: Wed, 02 Dec 2015 14:37:06 -0700 [thread overview]
Message-ID: <1449092226.31589.50.camel@hpe.com> (raw)
In-Reply-To: <CAPcyv4hvX_s3xN9UZ69v7npOhWVFehfGDPZG1MsDmKWBk4Gq1A@mail.gmail.com>
On Wed, 2015-12-02 at 11:57 -0800, Dan Williams wrote:
> On Wed, Dec 2, 2015 at 12:12 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Wed, 2015-12-02 at 13:02 -0700, Toshi Kani wrote:
> > > On Wed, 2015-12-02 at 11:00 -0800, Dan Williams wrote:
> > > > On Wed, Dec 2, 2015 at 11:26 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > > > On Wed, 2015-12-02 at 10:06 -0800, Dan Williams wrote:
> > > > > > On Wed, Dec 2, 2015 at 9:01 AM, Dan Williams <
> > > > > > dan.j.williams@intel.com>
> > > > > > wrote:
> > > > > > > On Wed, Dec 2, 2015 at 9:43 AM, Toshi Kani <toshi.kani@hpe.com>
> > > > > > > wrote:
> > > > > > > > Oh, I see. I will setup the memmap array and run the tests
> > > > > > > > again.
> > > > > > > >
> > > > > > > > But, why does the PMD mapping depend on the memmap array? We
> > > > > > > > have observed major performance improvement with PMD. This
> > > > > > > > feature should always be enabled with DAX regardless of the
> > > > > > > > option to allocate the memmap array.
> > > > > > > >
> > > > > > >
> > > > > > > Several factors drove this decision, I'm open to considering
> > > > > > > alternatives but here's the reasoning:
> > > > > > >
> > > > > > > 1/ DAX pmd mappings caused crashes in the get_user_pages path
> > > > > > > leading to commit e82c9ed41e8 "dax: disable pmd mappings". The
> > > > > > > reason pte mappings don't crash and instead trigger -EFAULT is due
> > > > > > > to the _PAGE_SPECIAL pte bit.
> > > > > > >
> > > > > > > 2/ To enable get_user_pages for DAX, in both the page and huge
> > > > > > > -page case, we need a new pte bit _PAGE_DEVMAP.
> > > > > > >
> > > > > > > 3/ Given the pte bits are hard to come I'm assuming we won't get
> > > > > > > two, i.e. both _PAGE_DEVMAP and a new _PAGE_SPECIAL for pmds.
> > > > > > > Even if we could get a _PAGE_SPECIAL for pmds I'm not in favor of
> > > > > > > pursuing it.
> > > > > >
> > > > > > Actually, Dave says they aren't that hard to come by for pmds, so we
> > > > > > could go add _PMD_SPECIAL if we really wanted to support the limited
> > > > > > page-less DAX-pmd case.
> > > > > >
> > > > > > But I'm still of the opinion that we run away from the page-less
> > > > > > case until it can be made a full class citizen with O_DIRECT for pfn
> > > > > > support.
> > > > >
> > > > > I may be missing something, but per vm_normal_page(), I think
> > > > > _PAGE_SPECIAL can be substituted by the following check when we do not
> > > > > have the memmap.
> > > > >
> > > > > if ((vma->vm_flags & VM_PFNMAP) ||
> > > > > ((vma->vm_flags & VM_MIXEDMAP) && (!pfn_valid(pfn)))) {
> > > > >
> > > > > This is what I did in this patch for follow_trans_huge_pmd(), although
> > > > > I missed the pfn_valid() check.
> > > >
> > > > That works for __get_user_pages but not __get_user_pages_fast where we
> > > > don't have access to the vma.
> > >
> > > __get_user_page_fast already refers current->mm, so we should be able to
> > > get the vma, and pass it down to gup_pud_range().
> >
> > Alternatively, we can obtain the vma from current->mm in gup_huge_pmd() when
> > the !pfn_valid() condition is met, so that we do not add the code to the
> > main path of __get_user_pages_fast.
>
> The whole point of __get_user_page_fast() is to avoid the overhead of
> taking the mm semaphore to access the vma. _PAGE_SPECIAL simply tells
> __get_user_pages_fast that it needs to fallback to the
> __get_user_pages slow path.
I see. Then, I think gup_huge_pmd() can simply return 0 when !pfn_valid(),
instead of VM_BUG_ON.
Thanks,
-Toshi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Toshi Kani <toshi.kani@hpe.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@linux.intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
mauricio.porto@hpe.com, Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping
Date: Wed, 02 Dec 2015 14:37:06 -0700 [thread overview]
Message-ID: <1449092226.31589.50.camel@hpe.com> (raw)
In-Reply-To: <CAPcyv4hvX_s3xN9UZ69v7npOhWVFehfGDPZG1MsDmKWBk4Gq1A@mail.gmail.com>
On Wed, 2015-12-02 at 11:57 -0800, Dan Williams wrote:
> On Wed, Dec 2, 2015 at 12:12 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Wed, 2015-12-02 at 13:02 -0700, Toshi Kani wrote:
> > > On Wed, 2015-12-02 at 11:00 -0800, Dan Williams wrote:
> > > > On Wed, Dec 2, 2015 at 11:26 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > > > > On Wed, 2015-12-02 at 10:06 -0800, Dan Williams wrote:
> > > > > > On Wed, Dec 2, 2015 at 9:01 AM, Dan Williams <
> > > > > > dan.j.williams@intel.com>
> > > > > > wrote:
> > > > > > > On Wed, Dec 2, 2015 at 9:43 AM, Toshi Kani <toshi.kani@hpe.com>
> > > > > > > wrote:
> > > > > > > > Oh, I see. I will setup the memmap array and run the tests
> > > > > > > > again.
> > > > > > > >
> > > > > > > > But, why does the PMD mapping depend on the memmap array? We
> > > > > > > > have observed major performance improvement with PMD. This
> > > > > > > > feature should always be enabled with DAX regardless of the
> > > > > > > > option to allocate the memmap array.
> > > > > > > >
> > > > > > >
> > > > > > > Several factors drove this decision, I'm open to considering
> > > > > > > alternatives but here's the reasoning:
> > > > > > >
> > > > > > > 1/ DAX pmd mappings caused crashes in the get_user_pages path
> > > > > > > leading to commit e82c9ed41e8 "dax: disable pmd mappings". The
> > > > > > > reason pte mappings don't crash and instead trigger -EFAULT is due
> > > > > > > to the _PAGE_SPECIAL pte bit.
> > > > > > >
> > > > > > > 2/ To enable get_user_pages for DAX, in both the page and huge
> > > > > > > -page case, we need a new pte bit _PAGE_DEVMAP.
> > > > > > >
> > > > > > > 3/ Given the pte bits are hard to come I'm assuming we won't get
> > > > > > > two, i.e. both _PAGE_DEVMAP and a new _PAGE_SPECIAL for pmds.
> > > > > > > Even if we could get a _PAGE_SPECIAL for pmds I'm not in favor of
> > > > > > > pursuing it.
> > > > > >
> > > > > > Actually, Dave says they aren't that hard to come by for pmds, so we
> > > > > > could go add _PMD_SPECIAL if we really wanted to support the limited
> > > > > > page-less DAX-pmd case.
> > > > > >
> > > > > > But I'm still of the opinion that we run away from the page-less
> > > > > > case until it can be made a full class citizen with O_DIRECT for pfn
> > > > > > support.
> > > > >
> > > > > I may be missing something, but per vm_normal_page(), I think
> > > > > _PAGE_SPECIAL can be substituted by the following check when we do not
> > > > > have the memmap.
> > > > >
> > > > > if ((vma->vm_flags & VM_PFNMAP) ||
> > > > > ((vma->vm_flags & VM_MIXEDMAP) && (!pfn_valid(pfn)))) {
> > > > >
> > > > > This is what I did in this patch for follow_trans_huge_pmd(), although
> > > > > I missed the pfn_valid() check.
> > > >
> > > > That works for __get_user_pages but not __get_user_pages_fast where we
> > > > don't have access to the vma.
> > >
> > > __get_user_page_fast already refers current->mm, so we should be able to
> > > get the vma, and pass it down to gup_pud_range().
> >
> > Alternatively, we can obtain the vma from current->mm in gup_huge_pmd() when
> > the !pfn_valid() condition is met, so that we do not add the code to the
> > main path of __get_user_pages_fast.
>
> The whole point of __get_user_page_fast() is to avoid the overhead of
> taking the mm semaphore to access the vma. _PAGE_SPECIAL simply tells
> __get_user_pages_fast that it needs to fallback to the
> __get_user_pages slow path.
I see. Then, I think gup_huge_pmd() can simply return 0 when !pfn_valid(),
instead of VM_BUG_ON.
Thanks,
-Toshi
next prev parent reply other threads:[~2015-12-02 21:37 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-23 20:04 [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping Toshi Kani
2015-11-23 20:04 ` Toshi Kani
2015-11-23 20:53 ` Dan Williams
2015-11-23 20:53 ` Dan Williams
2015-11-23 22:15 ` Toshi Kani
2015-11-23 22:15 ` Toshi Kani
2015-11-30 22:08 ` Dan Williams
2015-11-30 22:08 ` Dan Williams
2015-12-02 2:19 ` Toshi Kani
2015-12-02 2:19 ` Toshi Kani
2015-12-02 3:45 ` Dan Williams
2015-12-02 3:45 ` Dan Williams
2015-12-02 17:43 ` Toshi Kani
2015-12-02 17:43 ` Toshi Kani
2015-12-02 17:01 ` Dan Williams
2015-12-02 17:01 ` Dan Williams
2015-12-02 18:06 ` Dan Williams
2015-12-02 18:06 ` Dan Williams
2015-12-02 19:26 ` Toshi Kani
2015-12-02 19:26 ` Toshi Kani
2015-12-02 19:00 ` Dan Williams
2015-12-02 19:00 ` Dan Williams
2015-12-02 20:02 ` Toshi Kani
2015-12-02 20:02 ` Toshi Kani
2015-12-02 20:12 ` Toshi Kani
2015-12-02 20:12 ` Toshi Kani
2015-12-02 19:57 ` Dan Williams
2015-12-02 19:57 ` Dan Williams
2015-12-02 21:37 ` Toshi Kani [this message]
2015-12-02 21:37 ` Toshi Kani
2015-12-02 20:54 ` Dan Williams
2015-12-02 20:54 ` Dan Williams
2015-12-02 21:55 ` Toshi Kani
2015-12-02 21:55 ` Toshi Kani
2015-12-03 23:43 ` Dan Williams
2015-12-03 23:43 ` Dan Williams
2015-12-04 16:55 ` Toshi Kani
2015-12-04 16:55 ` Toshi Kani
2015-12-02 22:00 ` Dave Hansen
2015-12-02 22:00 ` Dave Hansen
2015-12-02 22:03 ` Dan Williams
2015-12-02 22:03 ` Dan Williams
2015-12-02 22:09 ` Dave Hansen
2015-12-02 22:09 ` Dave Hansen
2015-12-03 0:21 ` Toshi Kani
2015-12-03 0:21 ` Toshi Kani
2015-12-02 23:33 ` Dan Williams
2015-12-02 23:33 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1449092226.31589.50.camel@hpe.com \
--to=toshi.kani@hpe.com \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvdimm@lists.01.org \
--cc=mauricio.porto@hpe.com \
--cc=ross.zwisler@linux.intel.com \
--cc=willy@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.