From: Bob Liu <lliubbo@gmail.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave@sr71.net>, David Airlie <airlied@linux.ie>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dave Chinner <david@fromorbit.com>, Linux-MM <linux-mm@kvack.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Christoph Hellwig <hch@lst.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	kbuild test robot <lkp@intel.com>,
	linux-nvdimm@lists.01.org, x86@kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoffer Dall <christoffer.dall@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [-mm PATCH v4 00/18] get_user_pages() for dax pte and pmd mappings
Date: Sun, 27 Dec 2015 16:33:58 +0800
Message-ID: <CAA_GA1f44ADq7dw7LUM=rEex8m0vMXvGeOdW1YKkisbv51iuKw@mail.gmail.com>
In-Reply-To: <20151221054406.34542.64393.stgit@dwillia2-desk3.jf.intel.com>

Hey Dan,

On Mon, Dec 21, 2015 at 1:44 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> Changes since v3 [1]:
>
> 1/ Minimize the impact of the modifications to get_page() by moving
>    zone_device manipulations out of line and marking them unlikely().  In
>    v3 a simple function like:
>
>                 get_page(page);
>                 do_something_with_page(page);
>                 put_page(page);
>
>    ...had a text size of 672 bytes.  That is now down to 289 bytes,
>    compared to the pre-patch baseline size of 267 bytes.  Disassembly shows
>    that aside from conditional branch on the page zone number, data which
>    should already be dcache hot, there is no icache impact in the typical
>    path.  (Andrew, Dave Hansen)
>
> 2/ Minimize the impact to mm.h by moving ~200 lines of definitions to
>    pfn_t.h and memremap.h.  (Andrew)
>
> 3/ Move struct vmem_altmap helper routines to the only C file that
>    consumes them. (Andrew)
>
> 4/ Clean up definitions of pfn_pte, pfn_pmd, pte_devmap, and pmd_devmap
>    to have proper dependencies on CONFIG_MMU and
>    CONFIG_TRANSPARENT_HUGEPAGE to avoid the need to touch arch headers
>    outside of x86.
>
> 5/ Skip registering 'memory block' sysfs devices for zone_device ranges
>    since they are not normal memory and are not eligible to be 'onlined'.
>
> 6/ Improve the diagnostic debug messages in fs/dax.c to include
>    buffer_head details.  (Willy)
>
> These replace the following 18 patches:
>
>     kvm-rename-pfn_t-to-kvm_pfn_t.patch..dax-re-enable-dax-pmd-mappings.patch
>
> ...in the current -mm series, the other 7 patches from v3 are
> unmodified.  They have received a build success notification from the
> kbuild robot over 108 configs.
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-December/003370.html
>
> ---
> Original summary:
>
> To date, we have implemented two I/O usage models for persistent memory,
> PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
> userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
> to be the target of direct-i/o.  It allows userspace to coordinate
> DMA/RDMA from/to persistent memory.
>
> The implementation leverages the ZONE_DEVICE mm-zone that went into
> 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
> and dynamically mapped by a device driver.  The pmem driver, after
> mapping a persistent memory range into the system memmap via
> devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
> page-backed pmem-pfns via flags in the new pfn_t type.
>
> The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
> resulting pte(s) inserted into the process page tables with a new
> _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
> off _PAGE_DEVMAP to pin the device hosting the page range active.
> Finally, get_page() and put_page() are modified to take references
> against the device driver established page mapping.
>
> Finally, this need for "struct page" for persistent memory requires
> memory capacity to store the memmap array.  Given the memmap array for a
> large pool of persistent memory may exhaust available DRAM, introduce a
> mechanism to allocate the memmap from persistent memory.  The new


What about space for page tables?
Page tables (which map all of the memory in PMEM into a virtual address
space) may also consume a significant amount of DRAM if huge pages are not
enabled or get split.
Should we also consider allocating pte page tables from PMEM in the future?

Thanks,
Bob

> "struct vmem_altmap *"  parameter to devm_memremap_pages() enables
> arch_add_memory() to use reserved pmem capacity rather than the page
> allocator.
>
> ---
>
> Dan Williams (18):
>       kvm: rename pfn_t to kvm_pfn_t
>       mm, dax, pmem: introduce pfn_t
>       mm: skip memory block registration for ZONE_DEVICE
>       mm: introduce find_dev_pagemap()
>       x86, mm: introduce vmem_altmap to augment vmemmap_populate()
>       libnvdimm, pfn, pmem: allocate memmap array in persistent memory
>       avr32: convert to asm-generic/memory_model.h
>       hugetlb: fix compile error on tile
>       frv: fix compiler warning from definition of __pmd()
>       x86, mm: introduce _PAGE_DEVMAP
>       mm, dax, gpu: convert vm_insert_mixed to pfn_t
>       mm, dax: convert vmf_insert_pfn_pmd() to pfn_t
>       libnvdimm, pmem: move request_queue allocation earlier in probe
>       mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup
>       mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd
>       mm, x86: get_user_pages() for dax mappings
>       dax: provide diagnostics for pmd mapping failures
>       dax: re-enable dax pmd mappings
>
>
>  arch/arm/include/asm/kvm_mmu.h          |    5 -
>  arch/arm/kvm/mmu.c                      |   10 +
>  arch/arm64/include/asm/kvm_mmu.h        |    3
>  arch/avr32/include/asm/page.h           |    8 +
>  arch/frv/include/asm/page.h             |    2
>  arch/ia64/include/asm/page.h            |    1
>  arch/mips/include/asm/kvm_host.h        |    6 -
>  arch/mips/kvm/emulate.c                 |    2
>  arch/mips/kvm/tlb.c                     |   14 +-
>  arch/powerpc/include/asm/kvm_book3s.h   |    4 -
>  arch/powerpc/include/asm/kvm_ppc.h      |    2
>  arch/powerpc/kvm/book3s.c               |    6 -
>  arch/powerpc/kvm/book3s_32_mmu_host.c   |    2
>  arch/powerpc/kvm/book3s_64_mmu_host.c   |    2
>  arch/powerpc/kvm/e500.h                 |    2
>  arch/powerpc/kvm/e500_mmu_host.c        |    8 +
>  arch/powerpc/kvm/trace_pr.h             |    2
>  arch/powerpc/sysdev/axonram.c           |    9 +
>  arch/x86/include/asm/pgtable.h          |   26 +++-
>  arch/x86/include/asm/pgtable_types.h    |    7 +
>  arch/x86/kvm/iommu.c                    |   11 +-
>  arch/x86/kvm/mmu.c                      |   37 +++--
>  arch/x86/kvm/mmu_audit.c                |    2
>  arch/x86/kvm/paging_tmpl.h              |    6 -
>  arch/x86/kvm/vmx.c                      |    2
>  arch/x86/kvm/x86.c                      |    2
>  arch/x86/mm/gup.c                       |   57 +++++++-
>  arch/x86/mm/init_64.c                   |   33 ++++-
>  arch/x86/mm/pat.c                       |    5 -
>  drivers/base/memory.c                   |   13 ++
>  drivers/block/brd.c                     |    7 +
>  drivers/gpu/drm/exynos/exynos_drm_gem.c |    4 -
>  drivers/gpu/drm/gma500/framebuffer.c    |    4 -
>  drivers/gpu/drm/msm/msm_gem.c           |    4 -
>  drivers/gpu/drm/omapdrm/omap_gem.c      |    7 +
>  drivers/gpu/drm/ttm/ttm_bo_vm.c         |    4 -
>  drivers/nvdimm/pfn_devs.c               |    3
>  drivers/nvdimm/pmem.c                   |   73 +++++++---
>  drivers/s390/block/dcssblk.c            |   11 +-
>  fs/Kconfig                              |    3
>  fs/dax.c                                |   76 ++++++++--
>  include/asm-generic/pgtable.h           |    6 +
>  include/linux/blkdev.h                  |    5 -
>  include/linux/huge_mm.h                 |   15 ++
>  include/linux/hugetlb.h                 |    1
>  include/linux/io.h                      |   15 --
>  include/linux/kvm_host.h                |   37 +++--
>  include/linux/kvm_types.h               |    2
>  include/linux/list.h                    |   12 ++
>  include/linux/memory_hotplug.h          |    3
>  include/linux/memremap.h                |  114 ++++++++++++++++
>  include/linux/mm.h                      |   72 ++++++++--
>  include/linux/mm_types.h                |    5 +
>  include/linux/pfn.h                     |    9 +
>  include/linux/pfn_t.h                   |  102 ++++++++++++++
>  kernel/memremap.c                       |  227 ++++++++++++++++++++++++++++++-
>  lib/list_debug.c                        |    9 +
>  mm/gup.c                                |   19 ++-
>  mm/huge_memory.c                        |  119 ++++++++++++----
>  mm/memory.c                             |   26 ++--
>  mm/memory_hotplug.c                     |   67 +++++++--
>  mm/mprotect.c                           |    5 -
>  mm/page_alloc.c                         |   11 +-
>  mm/pgtable-generic.c                    |    2
>  mm/sparse-vmemmap.c                     |   76 ++++++++++
>  mm/sparse.c                             |    8 +
>  mm/swap.c                               |    3
>  virt/kvm/kvm_main.c                     |   47 +++---
>  68 files changed, 1204 insertions(+), 298 deletions(-)
>  create mode 100644 include/linux/memremap.h
>  create mode 100644 include/linux/pfn_t.h
>

