* [PATCH 0/7] mm: faster get user pages
@ 2018-09-19 21:02 Keith Busch
2018-09-19 21:02 ` [PATCH 1/7] mm/gup_benchmark: Time put_page Keith Busch
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch
Pinning user pages out of nvdimm dax memory is significantly slower
than pinning system RAM. Analysis points to software overhead incurred
from a radix tree lookup: the relatively costly dev_pagemap lookup was
repeated for every page. This patch series fixes that by caching the
dev_pagemap while pinning, significantly reducing gup time.
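To sketch the idea (a simplified illustration only, not a hunk from the
series -- the real change lives in patch 7), the per-page loop hangs on
to the pgmap returned for the previous page and drops it once at the
end, so the radix tree is only consulted when crossing into a different
pagemap:

	/*
	 * Illustrative sketch; pfns[] is a hypothetical stand-in for
	 * the per-page loop in __get_user_pages().
	 */
	struct dev_pagemap *pgmap = NULL;
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		/*
		 * get_dev_pagemap() reuses the pgmap passed in when the
		 * pfn still falls inside it, skipping the radix tree
		 * lookup for runs of pages from the same device.
		 */
		pgmap = get_dev_pagemap(pfns[i], pgmap);
		if (!pgmap)
			break;	/* sketch only: real gup also handles non-devmap pages */
		get_page(pfn_to_page(pfns[i]));
	}
	if (pgmap)
		put_dev_pagemap(pgmap);	/* drop the one cached reference */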
The first 5 patches just update the benchmark to help test and
demonstrate the value of the last 2 patches.
The results were compared using the following benchmark command for
device DAX memory:
# gup_benchmark -m $((12*1024)) -n 512 -L -f /dev/dax0.0
Before: 1037581 usec
After: 375786 usec
Not bad: with this patch set applied, device DAX takes the same time as
the baseline anonymous system RAM case, where before it was nearly 3x
longer.
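For completeness, a rough recipe for reproducing the numbers (this
assumes CONFIG_GUP_BENCHMARK=y and a device DAX namespace at
/dev/dax0.0; adjust for your setup):

  # build the vm selftests; gup_benchmark is built along with the rest
  make -C tools/testing/selftests/vm

  # 12GB mapping, 512 pages per call, longterm pinning (-L), backed by
  # the device DAX file given with -f
  ./tools/testing/selftests/vm/gup_benchmark -m $((12*1024)) -n 512 -L -f /dev/dax0.0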
Keith Busch (7):
mm/gup_benchmark: Time put_page
mm/gup_benchmark: Add additional pinning methods
tools/gup_benchmark: Fix 'write' flag usage
tools/gup_benchmark: Allow user specified file
tools/gup_benchmark: Add parameter for hugetlb
mm/gup: Combine parameters into struct
mm/gup: Cache dev_pagemap while pinning pages
include/linux/huge_mm.h | 12 +-
include/linux/hugetlb.h | 2 +-
include/linux/mm.h | 27 ++-
mm/gup.c | 279 ++++++++++++++---------------
mm/gup_benchmark.c | 36 +++-
mm/huge_memory.c | 67 ++++---
mm/nommu.c | 6 +-
tools/testing/selftests/vm/gup_benchmark.c | 40 ++++-
8 files changed, 262 insertions(+), 207 deletions(-)
--
2.14.4
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 1/7] mm/gup_benchmark: Time put_page 2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch @ 2018-09-19 21:02 ` Keith Busch 2018-09-19 21:02 ` [PATCH 2/7] mm/gup_benchmark: Add additional pinning methods Keith Busch ` (6 subsequent siblings) 7 siblings, 0 replies; 14+ messages in thread From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch We'd like to measure time to unpin user pages, so this adds a second benchmark timer on put_page, separate from get_page. This will break ABI on this ioctl, but being an in-kernel benchmark may be acceptable. Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> --- mm/gup_benchmark.c | 8 ++++++-- tools/testing/selftests/vm/gup_benchmark.c | 6 ++++-- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 6a473709e9b6..76cd35e477af 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -8,7 +8,8 @@ #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) struct gup_benchmark { - __u64 delta_usec; + __u64 get_delta_usec; + __u64 put_delta_usec; __u64 addr; __u64 size; __u32 nr_pages_per_call; @@ -47,14 +48,17 @@ static int __gup_benchmark_ioctl(unsigned int cmd, } end_time = ktime_get(); - gup->delta_usec = ktime_us_delta(end_time, start_time); + gup->get_delta_usec = ktime_us_delta(end_time, start_time); gup->size = addr - gup->addr; + start_time = ktime_get(); for (i = 0; i < nr_pages; i++) { if (!pages[i]) break; put_page(pages[i]); } + end_time = ktime_get(); + gup->put_delta_usec = ktime_us_delta(end_time, start_time); kvfree(pages); return 0; diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index 36df55132036..bdcb97acd0ac 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -17,7 +17,8 @@ #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) struct gup_benchmark { - __u64 delta_usec; + __u64 get_delta_usec; + __u64 put_delta_usec; __u64 addr; __u64 size; __u32 nr_pages_per_call; @@ -81,7 +82,8 @@ int main(int argc, char **argv) if (ioctl(fd, GUP_FAST_BENCHMARK, &gup)) perror("ioctl"), exit(1); - printf("Time: %lld us", gup.delta_usec); + printf("Time: get:%lld put:%lld us", gup.get_delta_usec, + gup.put_delta_usec); if (gup.size != size) printf(", truncated (size: %lld)", gup.size); printf("\n"); -- 2.14.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/7] mm/gup_benchmark: Add additional pinning methods 2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch 2018-09-19 21:02 ` [PATCH 1/7] mm/gup_benchmark: Time put_page Keith Busch @ 2018-09-19 21:02 ` Keith Busch 2018-09-19 21:02 ` [PATCH 3/7] tools/gup_benchmark: Fix 'write' flag usage Keith Busch ` (5 subsequent siblings) 7 siblings, 0 replies; 14+ messages in thread From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch This patch provides new gup benchmark ioctl commands to run different user page pinning methods, get_user_pages_longterm and get_user_pages, in addition to the existing get_user_pages_fast. Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> --- mm/gup_benchmark.c | 28 ++++++++++++++++++++++++++-- tools/testing/selftests/vm/gup_benchmark.c | 13 +++++++++++-- 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 76cd35e477af..e6d9ce001ffa 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -6,6 +6,8 @@ #include <linux/debugfs.h> #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) +#define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) +#define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) struct gup_benchmark { __u64 get_delta_usec; @@ -41,7 +43,23 @@ static int __gup_benchmark_ioctl(unsigned int cmd, nr = (next - addr) / PAGE_SIZE; } - nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i); + switch (cmd) { + case GUP_FAST_BENCHMARK: + nr = get_user_pages_fast(addr, nr, gup->flags & 1, + pages + i); + break; + case GUP_LONGTERM_BENCHMARK: + nr = get_user_pages_longterm(addr, nr, gup->flags & 1, + pages + i, NULL); + break; + case GUP_BENCHMARK: + nr = get_user_pages(addr, nr, gup->flags & 1, pages + i, + NULL); + break; + default: + return -1; + } + if (nr <= 0) break; i += nr; @@ -70,8 +88,14 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd, struct gup_benchmark gup; int ret; - if (cmd != GUP_FAST_BENCHMARK) + switch (cmd) { + case GUP_FAST_BENCHMARK: + case GUP_LONGTERM_BENCHMARK: + case GUP_BENCHMARK: + break; + default: return -EINVAL; + } if (copy_from_user(&gup, (void __user *)arg, sizeof(gup))) return -EFAULT; diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index bdcb97acd0ac..c2f785ded9b9 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -15,6 +15,8 @@ #define PAGE_SIZE sysconf(_SC_PAGESIZE) #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) +#define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) +#define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) struct gup_benchmark { __u64 get_delta_usec; @@ -30,9 +32,10 @@ int main(int argc, char **argv) struct gup_benchmark gup; unsigned long size = 128 * MB; int i, fd, opt, nr_pages = 1, thp = -1, repeats = 1, write = 0; + int cmd = GUP_FAST_BENCHMARK; char *p; - while ((opt = getopt(argc, argv, "m:r:n:tT")) != -1) { + while ((opt = getopt(argc, argv, "m:r:n:tTLU")) != -1) { switch (opt) { case 'm': size = atoi(optarg) * MB; @@ -49,6 +52,12 @@ int main(int argc, char **argv) case 'T': thp = 0; break; + case 'L': + cmd = GUP_LONGTERM_BENCHMARK; + break; + case 'U': + cmd = GUP_BENCHMARK; + break; case 'w': write = 1; 
default: @@ -79,7 +88,7 @@ int main(int argc, char **argv) for (i = 0; i < repeats; i++) { gup.size = size; - if (ioctl(fd, GUP_FAST_BENCHMARK, &gup)) + if (ioctl(fd, cmd, &gup)) perror("ioctl"), exit(1); printf("Time: get:%lld put:%lld us", gup.get_delta_usec, -- 2.14.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/7] tools/gup_benchmark: Fix 'write' flag usage 2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch 2018-09-19 21:02 ` [PATCH 1/7] mm/gup_benchmark: Time put_page Keith Busch 2018-09-19 21:02 ` [PATCH 2/7] mm/gup_benchmark: Add additional pinning methods Keith Busch @ 2018-09-19 21:02 ` Keith Busch 2018-09-19 21:02 ` [PATCH 4/7] tools/gup_benchmark: Allow user specified file Keith Busch ` (4 subsequent siblings) 7 siblings, 0 replies; 14+ messages in thread From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch If the '-w' parameter was provided, the benchmark would exit due to a mssing 'break'. Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> --- tools/testing/selftests/vm/gup_benchmark.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index c2f785ded9b9..b2082df8beb4 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -60,6 +60,7 @@ int main(int argc, char **argv) break; case 'w': write = 1; + break; default: return -1; } -- 2.14.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/7] tools/gup_benchmark: Allow user specified file 2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch ` (2 preceding siblings ...) 2018-09-19 21:02 ` [PATCH 3/7] tools/gup_benchmark: Fix 'write' flag usage Keith Busch @ 2018-09-19 21:02 ` Keith Busch 2018-09-19 21:02 ` [PATCH 5/7] tools/gup_benchmark: Add parameter for hugetlb Keith Busch ` (3 subsequent siblings) 7 siblings, 0 replies; 14+ messages in thread From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch The gup benchmark by default maps anonymous memory. This patch allows a user to specify a file to map, providing a means to test various file backings, like device and filesystem DAX. Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> --- tools/testing/selftests/vm/gup_benchmark.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index b2082df8beb4..f2c99e2436f8 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -33,9 +33,12 @@ int main(int argc, char **argv) unsigned long size = 128 * MB; int i, fd, opt, nr_pages = 1, thp = -1, repeats = 1, write = 0; int cmd = GUP_FAST_BENCHMARK; + int file_map = -1; + int flags = MAP_ANONYMOUS | MAP_PRIVATE; + char *file = NULL; char *p; - while ((opt = getopt(argc, argv, "m:r:n:tTLU")) != -1) { + while ((opt = getopt(argc, argv, "m:r:n:f:tTLU")) != -1) { switch (opt) { case 'm': size = atoi(optarg) * MB; @@ -61,11 +64,22 @@ int main(int argc, char **argv) case 'w': write = 1; break; + case 'f': + file = optarg; + flags &= ~(MAP_PRIVATE | MAP_ANONYMOUS); + flags |= MAP_SHARED; + break; default: return -1; } } + if (file) { + file_map = open(file, O_RDWR|O_CREAT); + if (file_map < 0) + perror("open"), exit(file_map); + } + gup.nr_pages_per_call = nr_pages; gup.flags = write; @@ -73,8 +87,7 @@ int main(int argc, char **argv) if (fd == -1) perror("open"), exit(1); - p = mmap(NULL, size, PROT_READ | PROT_WRITE, - MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, file_map, 0); if (p == MAP_FAILED) perror("mmap"), exit(1); gup.addr = (unsigned long)p; -- 2.14.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/7] tools/gup_benchmark: Add parameter for hugetlb 2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch ` (3 preceding siblings ...) 2018-09-19 21:02 ` [PATCH 4/7] tools/gup_benchmark: Allow user specified file Keith Busch @ 2018-09-19 21:02 ` Keith Busch 2018-09-19 21:02 ` [PATCH 6/7] mm/gup: Combine parameters into struct Keith Busch ` (2 subsequent siblings) 7 siblings, 0 replies; 14+ messages in thread From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> --- tools/testing/selftests/vm/gup_benchmark.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index f2c99e2436f8..5d96e2b3d2f1 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -38,7 +38,7 @@ int main(int argc, char **argv) char *file = NULL; char *p; - while ((opt = getopt(argc, argv, "m:r:n:f:tTLU")) != -1) { + while ((opt = getopt(argc, argv, "m:r:n:f:tTLUH")) != -1) { switch (opt) { case 'm': size = atoi(optarg) * MB; @@ -64,6 +64,9 @@ int main(int argc, char **argv) case 'w': write = 1; break; + case 'H': + flags |= MAP_HUGETLB; + break; case 'f': file = optarg; flags &= ~(MAP_PRIVATE | MAP_ANONYMOUS); -- 2.14.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 6/7] mm/gup: Combine parameters into struct 2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch ` (4 preceding siblings ...) 2018-09-19 21:02 ` [PATCH 5/7] tools/gup_benchmark: Add parameter for hugetlb Keith Busch @ 2018-09-19 21:02 ` Keith Busch 2018-09-19 22:40 ` Keith Busch ` (2 more replies) 2018-09-19 21:02 ` [PATCH 7/7] mm/gup: Cache dev_pagemap while pinning pages Keith Busch 2018-09-19 21:15 ` [PATCH 0/7] mm: faster get user pages Dave Hansen 7 siblings, 3 replies; 14+ messages in thread From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch This will make it easier to add new parameters that we may wish to thread through these function calls. Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> --- include/linux/huge_mm.h | 12 +-- include/linux/hugetlb.h | 2 +- include/linux/mm.h | 21 ++++- mm/gup.c | 238 +++++++++++++++++++++++------------------------- mm/huge_memory.c | 32 +++---- mm/nommu.c | 6 +- 6 files changed, 151 insertions(+), 160 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 99c19b06d9a4..7d22e2c7f154 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -212,10 +212,8 @@ static inline int hpage_nr_pages(struct page *page) return 1; } -struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, int flags); -struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, int flags); +struct page *follow_devmap_pmd(struct follow_page_context *ctx, pmd_t *pmd); +struct page *follow_devmap_pud(struct follow_page_context *ctx, pud_t *pud); extern vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -343,14 +341,12 @@ static inline void mm_put_huge_zero_page(struct mm_struct *mm) return; } -static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmd, int flags) +static inline struct page *follow_devmap_pmd(struct gup_context *ctx, pmd_t *pmd) { return NULL; } -static inline struct page *follow_devmap_pud(struct vm_area_struct *vma, - unsigned long addr, pud_t *pud, int flags) +static inline struct page *follow_devmap_pud(struct gup_context *ctx, pud_t *pud) { return NULL; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6b68e345f0ca..64b675863793 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -180,7 +180,7 @@ static inline void hugetlb_report_meminfo(struct seq_file *m) static inline void hugetlb_show_meminfo(void) { } -#define follow_huge_pd(vma, addr, hpd, flags, pdshift) NULL +#define follow_huge_pd(ctx, hpd, pdshift) NULL #define follow_huge_pmd(mm, addr, pmd, flags) NULL #define follow_huge_pud(mm, addr, pud, flags) NULL #define follow_huge_pgd(mm, addr, pgd, flags) NULL diff --git a/include/linux/mm.h b/include/linux/mm.h index a61ebe8ad4ca..f1fd241c9071 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -378,6 +378,13 @@ struct vm_fault { */ }; +struct follow_page_context { + struct vm_area_struct *vma; + unsigned long address; + unsigned int page_mask; + unsigned int flags; +}; + /* page entry size for vm->huge_fault() */ enum page_entry_size { PE_SIZE_PTE = 0, @@ -2534,15 +2541,19 @@ static inline vm_fault_t vmf_error(int err) return VM_FAULT_SIGBUS; } -struct page *follow_page_mask(struct 
vm_area_struct *vma, - unsigned long address, unsigned int foll_flags, - unsigned int *page_mask); +struct page *follow_page_mask(struct follow_page_context *ctx); static inline struct page *follow_page(struct vm_area_struct *vma, unsigned long address, unsigned int foll_flags) { - unsigned int unused_page_mask; - return follow_page_mask(vma, address, foll_flags, &unused_page_mask); + struct follow_page_context ctx = { + .vma = vma, + .address = address, + .page_mask = 0, + .flags = foll_flags, + }; + + return follow_page_mask(&ctx); } #define FOLL_WRITE 0x01 /* check pte is writable */ diff --git a/mm/gup.c b/mm/gup.c index 1abc8b4afff6..a61a6874c80c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -20,8 +20,7 @@ #include "internal.h" -static struct page *no_page_table(struct vm_area_struct *vma, - unsigned int flags) +static struct page *no_page_table(struct follow_page_context *ctx) { /* * When core dumping an enormous anonymous area that nobody @@ -31,28 +30,28 @@ static struct page *no_page_table(struct vm_area_struct *vma, * But we can only make this optimization where a hole would surely * be zero-filled if handle_mm_fault() actually did handle it. */ - if ((flags & FOLL_DUMP) && (!vma->vm_ops || !vma->vm_ops->fault)) + if ((ctx->flags & FOLL_DUMP) && (!ctx->vma->vm_ops || + !ctx->vma->vm_ops->fault)) return ERR_PTR(-EFAULT); return NULL; } -static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, - pte_t *pte, unsigned int flags) +static int follow_pfn_pte(struct follow_page_context *ctx, pte_t *pte) { /* No page to get reference */ - if (flags & FOLL_GET) + if (ctx->flags & FOLL_GET) return -EFAULT; - if (flags & FOLL_TOUCH) { + if (ctx->flags & FOLL_TOUCH) { pte_t entry = *pte; - if (flags & FOLL_WRITE) + if (ctx->flags & FOLL_WRITE) entry = pte_mkdirty(entry); entry = pte_mkyoung(entry); if (!pte_same(*pte, entry)) { - set_pte_at(vma->vm_mm, address, pte, entry); - update_mmu_cache(vma, address, pte); + set_pte_at(ctx->vma->vm_mm, ctx->address, pte, entry); + update_mmu_cache(ctx->vma, ctx->address, pte); } } @@ -70,10 +69,9 @@ static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)); } -static struct page *follow_page_pte(struct vm_area_struct *vma, - unsigned long address, pmd_t *pmd, unsigned int flags) +static struct page *follow_page_pte(struct follow_page_context *ctx, pmd_t *pmd) { - struct mm_struct *mm = vma->vm_mm; + struct mm_struct *mm = ctx->vma->vm_mm; struct dev_pagemap *pgmap = NULL; struct page *page; spinlock_t *ptl; @@ -81,9 +79,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, retry: if (unlikely(pmd_bad(*pmd))) - return no_page_table(vma, flags); + return no_page_table(ctx); - ptep = pte_offset_map_lock(mm, pmd, address, &ptl); + ptep = pte_offset_map_lock(mm, pmd, ctx->address, &ptl); pte = *ptep; if (!pte_present(pte)) { swp_entry_t entry; @@ -92,7 +90,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, * even while it is being migrated, so for that case we * need migration_entry_wait(). 
*/ - if (likely(!(flags & FOLL_MIGRATION))) + if (likely(!(ctx->flags & FOLL_MIGRATION))) goto no_page; if (pte_none(pte)) goto no_page; @@ -100,18 +98,18 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, if (!is_migration_entry(entry)) goto no_page; pte_unmap_unlock(ptep, ptl); - migration_entry_wait(mm, pmd, address); + migration_entry_wait(mm, pmd, ctx->address); goto retry; } - if ((flags & FOLL_NUMA) && pte_protnone(pte)) + if ((ctx->flags & FOLL_NUMA) && pte_protnone(pte)) goto no_page; - if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) { + if ((ctx->flags & FOLL_WRITE) && !can_follow_write_pte(pte, ctx->flags)) { pte_unmap_unlock(ptep, ptl); return NULL; } - page = vm_normal_page(vma, address, pte); - if (!page && pte_devmap(pte) && (flags & FOLL_GET)) { + page = vm_normal_page(ctx->vma, ctx->address, pte); + if (!page && pte_devmap(pte) && (ctx->flags & FOLL_GET)) { /* * Only return device mapping pages in the FOLL_GET case since * they are only valid while holding the pgmap reference. @@ -122,7 +120,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, else goto no_page; } else if (unlikely(!page)) { - if (flags & FOLL_DUMP) { + if (ctx->flags & FOLL_DUMP) { /* Avoid special (like zero) pages in core dumps */ page = ERR_PTR(-EFAULT); goto out; @@ -133,13 +131,13 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } else { int ret; - ret = follow_pfn_pte(vma, address, ptep, flags); + ret = follow_pfn_pte(ctx, ptep); page = ERR_PTR(ret); goto out; } } - if (flags & FOLL_SPLIT && PageTransCompound(page)) { + if (ctx->flags & FOLL_SPLIT && PageTransCompound(page)) { int ret; get_page(page); pte_unmap_unlock(ptep, ptl); @@ -152,7 +150,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, goto retry; } - if (flags & FOLL_GET) { + if (ctx->flags & FOLL_GET) { get_page(page); /* drop the pgmap reference now that we hold the page */ @@ -161,8 +159,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, pgmap = NULL; } } - if (flags & FOLL_TOUCH) { - if ((flags & FOLL_WRITE) && + if (ctx->flags & FOLL_TOUCH) { + if ((ctx->flags & FOLL_WRITE) && !pte_dirty(pte) && !PageDirty(page)) set_page_dirty(page); /* @@ -172,7 +170,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, */ mark_page_accessed(page); } - if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) { + if ((ctx->flags & FOLL_MLOCK) && (ctx->vma->vm_flags & VM_LOCKED)) { /* Do not mlock pte-mapped THP */ if (PageTransCompound(page)) goto out; @@ -205,44 +203,42 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, pte_unmap_unlock(ptep, ptl); if (!pte_none(pte)) return NULL; - return no_page_table(vma, flags); + return no_page_table(ctx); } -static struct page *follow_pmd_mask(struct vm_area_struct *vma, - unsigned long address, pud_t *pudp, - unsigned int flags, unsigned int *page_mask) +static struct page *follow_pmd_mask(struct follow_page_context *ctx, pud_t *pudp) { pmd_t *pmd, pmdval; spinlock_t *ptl; struct page *page; - struct mm_struct *mm = vma->vm_mm; + struct mm_struct *mm = ctx->vma->vm_mm; - pmd = pmd_offset(pudp, address); + pmd = pmd_offset(pudp, ctx->address); /* * The READ_ONCE() will stabilize the pmdval in a register or * on the stack so that it will stop changing under the code. 
*/ pmdval = READ_ONCE(*pmd); if (pmd_none(pmdval)) - return no_page_table(vma, flags); - if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) { - page = follow_huge_pmd(mm, address, pmd, flags); + return no_page_table(ctx); + if (pmd_huge(pmdval) && ctx->vma->vm_flags & VM_HUGETLB) { + page = follow_huge_pmd(mm, ctx->address, pmd, ctx->flags); if (page) return page; - return no_page_table(vma, flags); + return no_page_table(ctx); } if (is_hugepd(__hugepd(pmd_val(pmdval)))) { - page = follow_huge_pd(vma, address, - __hugepd(pmd_val(pmdval)), flags, - PMD_SHIFT); + page = follow_huge_pd(ctx->vma, ctx->address, + __hugepd(pmd_val(pmdval)), + ctx->flags, PGDIR_SHIFT); if (page) return page; - return no_page_table(vma, flags); + return no_page_table(ctx); } retry: if (!pmd_present(pmdval)) { - if (likely(!(flags & FOLL_MIGRATION))) - return no_page_table(vma, flags); + if (likely(!(ctx->flags & FOLL_MIGRATION))) + return no_page_table(ctx); VM_BUG_ON(thp_migration_supported() && !is_pmd_migration_entry(pmdval)); if (is_pmd_migration_entry(pmdval)) @@ -253,46 +249,46 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, * mmap_sem is held in read mode */ if (pmd_none(pmdval)) - return no_page_table(vma, flags); + return no_page_table(ctx); goto retry; } if (pmd_devmap(pmdval)) { ptl = pmd_lock(mm, pmd); - page = follow_devmap_pmd(vma, address, pmd, flags); + page = follow_devmap_pmd(ctx, pmd); spin_unlock(ptl); if (page) return page; } if (likely(!pmd_trans_huge(pmdval))) - return follow_page_pte(vma, address, pmd, flags); + return follow_page_pte(ctx, pmd); - if ((flags & FOLL_NUMA) && pmd_protnone(pmdval)) - return no_page_table(vma, flags); + if ((ctx->flags & FOLL_NUMA) && pmd_protnone(pmdval)) + return no_page_table(ctx); retry_locked: ptl = pmd_lock(mm, pmd); if (unlikely(pmd_none(*pmd))) { spin_unlock(ptl); - return no_page_table(vma, flags); + return no_page_table(ctx); } if (unlikely(!pmd_present(*pmd))) { spin_unlock(ptl); - if (likely(!(flags & FOLL_MIGRATION))) - return no_page_table(vma, flags); + if (likely(!(ctx->flags & FOLL_MIGRATION))) + return no_page_table(ctx); pmd_migration_entry_wait(mm, pmd); goto retry_locked; } if (unlikely(!pmd_trans_huge(*pmd))) { spin_unlock(ptl); - return follow_page_pte(vma, address, pmd, flags); + return follow_page_pte(ctx, pmd); } - if (flags & FOLL_SPLIT) { + if (ctx->flags & FOLL_SPLIT) { int ret; page = pmd_page(*pmd); if (is_huge_zero_page(page)) { spin_unlock(ptl); ret = 0; - split_huge_pmd(vma, pmd, address); + split_huge_pmd(ctx->vma, pmd, ctx->address); if (pmd_trans_unstable(pmd)) ret = -EBUSY; } else { @@ -303,82 +299,76 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, unlock_page(page); put_page(page); if (pmd_none(*pmd)) - return no_page_table(vma, flags); + return no_page_table(ctx); } return ret ? 
ERR_PTR(ret) : - follow_page_pte(vma, address, pmd, flags); + follow_page_pte(ctx, pmd); } - page = follow_trans_huge_pmd(vma, address, pmd, flags); + page = follow_trans_huge_pmd(ctx->vma, ctx->address, pmd, ctx->flags); spin_unlock(ptl); - *page_mask = HPAGE_PMD_NR - 1; + ctx->page_mask = HPAGE_PMD_NR - 1; return page; } - -static struct page *follow_pud_mask(struct vm_area_struct *vma, - unsigned long address, p4d_t *p4dp, - unsigned int flags, unsigned int *page_mask) +static struct page *follow_pud_mask(struct follow_page_context *ctx, p4d_t *p4dp) { pud_t *pud; spinlock_t *ptl; struct page *page; - struct mm_struct *mm = vma->vm_mm; + struct mm_struct *mm = ctx->vma->vm_mm; - pud = pud_offset(p4dp, address); + pud = pud_offset(p4dp, ctx->address); if (pud_none(*pud)) - return no_page_table(vma, flags); - if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) { - page = follow_huge_pud(mm, address, pud, flags); + return no_page_table(ctx); + if (pud_huge(*pud) && ctx->vma->vm_flags & VM_HUGETLB) { + page = follow_huge_pud(mm, ctx->address, pud, ctx->flags); if (page) return page; - return no_page_table(vma, flags); + return no_page_table(ctx); } if (is_hugepd(__hugepd(pud_val(*pud)))) { - page = follow_huge_pd(vma, address, - __hugepd(pud_val(*pud)), flags, - PUD_SHIFT); + page = follow_huge_pd(ctx->vma, ctx->address, + __hugepd(pud_val(*pud)), + ctx->flags, PUD_SHIFT); if (page) return page; - return no_page_table(vma, flags); + return no_page_table(ctx); } if (pud_devmap(*pud)) { ptl = pud_lock(mm, pud); - page = follow_devmap_pud(vma, address, pud, flags); + page = follow_devmap_pud(ctx, pud); spin_unlock(ptl); if (page) return page; } if (unlikely(pud_bad(*pud))) - return no_page_table(vma, flags); + return no_page_table(ctx); - return follow_pmd_mask(vma, address, pud, flags, page_mask); + return follow_pmd_mask(ctx, pud); } - -static struct page *follow_p4d_mask(struct vm_area_struct *vma, - unsigned long address, pgd_t *pgdp, - unsigned int flags, unsigned int *page_mask) +static struct page *follow_p4d_mask(struct follow_page_context *ctx, pgd_t *pgdp) { p4d_t *p4d; struct page *page; - p4d = p4d_offset(pgdp, address); + p4d = p4d_offset(pgdp, ctx->address); if (p4d_none(*p4d)) - return no_page_table(vma, flags); + return no_page_table(ctx); BUILD_BUG_ON(p4d_huge(*p4d)); if (unlikely(p4d_bad(*p4d))) - return no_page_table(vma, flags); + return no_page_table(ctx); if (is_hugepd(__hugepd(p4d_val(*p4d)))) { - page = follow_huge_pd(vma, address, - __hugepd(p4d_val(*p4d)), flags, - P4D_SHIFT); + page = follow_huge_pd(ctx->vma, ctx->address, + __hugepd(p4d_val(*p4d)), + ctx->flags, P4D_SHIFT); if (page) return page; - return no_page_table(vma, flags); + return no_page_table(ctx); } - return follow_pud_mask(vma, address, p4d, flags, page_mask); + return follow_pud_mask(ctx, p4d); } /** @@ -394,44 +384,40 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma, * an error pointer if there is a mapping to something not represented * by a page descriptor (see also vm_normal_page()). 
*/ -struct page *follow_page_mask(struct vm_area_struct *vma, - unsigned long address, unsigned int flags, - unsigned int *page_mask) +struct page *follow_page_mask(struct follow_page_context *ctx) { pgd_t *pgd; struct page *page; - struct mm_struct *mm = vma->vm_mm; - - *page_mask = 0; + struct mm_struct *mm = ctx->vma->vm_mm; /* make this handle hugepd */ - page = follow_huge_addr(mm, address, flags & FOLL_WRITE); + page = follow_huge_addr(mm, ctx->address, ctx->flags & FOLL_WRITE); if (!IS_ERR(page)) { - BUG_ON(flags & FOLL_GET); + BUG_ON(ctx->flags & FOLL_GET); return page; } - pgd = pgd_offset(mm, address); + pgd = pgd_offset(mm, ctx->address); if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd))) - return no_page_table(vma, flags); + return no_page_table(ctx); if (pgd_huge(*pgd)) { - page = follow_huge_pgd(mm, address, pgd, flags); + page = follow_huge_pgd(mm, ctx->address, pgd, ctx->flags); if (page) return page; - return no_page_table(vma, flags); + return no_page_table(ctx); } if (is_hugepd(__hugepd(pgd_val(*pgd)))) { - page = follow_huge_pd(vma, address, - __hugepd(pgd_val(*pgd)), flags, - PGDIR_SHIFT); + page = follow_huge_pd(ctx->vma, ctx->address, + __hugepd(pgd_val(*pgd)), + ctx->flags, PGDIR_SHIFT); if (page) return page; - return no_page_table(vma, flags); + return no_page_table(ctx); } - return follow_p4d_mask(vma, address, pgd, flags, page_mask); + return follow_p4d_mask(ctx, pgd); } static int get_gate_page(struct mm_struct *mm, unsigned long address, @@ -493,31 +479,31 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, * *@flags does not include FOLL_NOWAIT, the mmap_sem may be released. * If it is, *@nonblocking will be set to 0 and -EBUSY returned. */ -static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, - unsigned long address, unsigned int *flags, int *nonblocking) +static int faultin_page(struct task_struct *tsk, struct follow_page_context *ctx, + int *nonblocking) { unsigned int fault_flags = 0; vm_fault_t ret; /* mlock all present pages, but do not fault in new pages */ - if ((*flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK) + if ((ctx->flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK) return -ENOENT; - if (*flags & FOLL_WRITE) + if (ctx->flags & FOLL_WRITE) fault_flags |= FAULT_FLAG_WRITE; - if (*flags & FOLL_REMOTE) + if (ctx->flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; if (nonblocking) fault_flags |= FAULT_FLAG_ALLOW_RETRY; - if (*flags & FOLL_NOWAIT) + if (ctx->flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; - if (*flags & FOLL_TRIED) { + if (ctx->flags & FOLL_TRIED) { VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY); fault_flags |= FAULT_FLAG_TRIED; } - ret = handle_mm_fault(vma, address, fault_flags); + ret = handle_mm_fault(ctx->vma, ctx->address, fault_flags); if (ret & VM_FAULT_ERROR) { - int err = vm_fault_to_errno(ret, *flags); + int err = vm_fault_to_errno(ret, ctx->flags); if (err) return err; @@ -546,8 +532,8 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, * which a read fault here might prevent (a readonly page might get * reCOWed by userspace write). 
*/ - if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE)) - *flags |= FOLL_COW; + if ((ret & VM_FAULT_WRITE) && !(ctx->vma->vm_flags & VM_WRITE)) + ctx->flags |= FOLL_COW; return 0; } @@ -660,8 +646,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, struct vm_area_struct **vmas, int *nonblocking) { long i = 0; - unsigned int page_mask; struct vm_area_struct *vma = NULL; + struct follow_page_context ctx = {}; if (!nr_pages) return 0; @@ -676,9 +662,9 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, if (!(gup_flags & FOLL_FORCE)) gup_flags |= FOLL_NUMA; + ctx.flags = gup_flags; do { struct page *page; - unsigned int foll_flags = gup_flags; unsigned int page_increm; /* first iteration or cross vma bound */ @@ -691,7 +677,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, pages ? &pages[i] : NULL); if (ret) return i ? : ret; - page_mask = 0; + ctx.page_mask = 0; goto next_page; } @@ -704,6 +690,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, continue; } } + ctx.vma = vma; + ctx.address = start; retry: /* * If we have a pending SIGKILL, don't keep faulting pages and @@ -712,11 +700,11 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, if (unlikely(fatal_signal_pending(current))) return i ? i : -ERESTARTSYS; cond_resched(); - page = follow_page_mask(vma, start, foll_flags, &page_mask); + + page = follow_page_mask(&ctx); if (!page) { int ret; - ret = faultin_page(tsk, vma, start, &foll_flags, - nonblocking); + ret = faultin_page(tsk, &ctx, nonblocking); switch (ret) { case 0: goto retry; @@ -743,14 +731,14 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, pages[i] = page; flush_anon_page(vma, page, start); flush_dcache_page(page); - page_mask = 0; + ctx.page_mask = 0; } next_page: if (vmas) { vmas[i] = vma; - page_mask = 0; + ctx.page_mask = 0; } - page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask); + page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask); if (page_increm > nr_pages) page_increm = nr_pages; i += page_increm; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 533f9b00147d..abd36e6afe2c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -851,11 +851,10 @@ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr, update_mmu_cache_pmd(vma, addr, pmd); } -struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, int flags) +struct page *follow_devmap_pmd(struct follow_page_context *ctx, pmd_t *pmd) { unsigned long pfn = pmd_pfn(*pmd); - struct mm_struct *mm = vma->vm_mm; + struct mm_struct *mm = ctx->vma->vm_mm; struct dev_pagemap *pgmap; struct page *page; @@ -865,9 +864,9 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, * When we COW a devmap PMD entry, we split it into PTEs, so we should * not be in this function with `flags & FOLL_COW` set. 
*/ - WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set"); + WARN_ONCE(ctx->flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set"); - if (flags & FOLL_WRITE && !pmd_write(*pmd)) + if (ctx->flags & FOLL_WRITE && !pmd_write(*pmd)) return NULL; if (pmd_present(*pmd) && pmd_devmap(*pmd)) @@ -875,17 +874,17 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, else return NULL; - if (flags & FOLL_TOUCH) - touch_pmd(vma, addr, pmd, flags); + if (ctx->flags & FOLL_TOUCH) + touch_pmd(ctx->vma, ctx->address, pmd, ctx->flags); /* * device mapped pages can only be returned if the * caller will manage the page reference count. */ - if (!(flags & FOLL_GET)) + if (!(ctx->flags & FOLL_GET)) return ERR_PTR(-EEXIST); - pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT; + pfn += (ctx->address & ~PMD_MASK) >> PAGE_SHIFT; pgmap = get_dev_pagemap(pfn, NULL); if (!pgmap) return ERR_PTR(-EFAULT); @@ -999,17 +998,16 @@ static void touch_pud(struct vm_area_struct *vma, unsigned long addr, update_mmu_cache_pud(vma, addr, pud); } -struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, int flags) +struct page *follow_devmap_pud(struct follow_page_context *ctx, pud_t *pud) { unsigned long pfn = pud_pfn(*pud); - struct mm_struct *mm = vma->vm_mm; + struct mm_struct *mm = ctx->vma->vm_mm; struct dev_pagemap *pgmap; struct page *page; assert_spin_locked(pud_lockptr(mm, pud)); - if (flags & FOLL_WRITE && !pud_write(*pud)) + if (ctx->flags & FOLL_WRITE && !pud_write(*pud)) return NULL; if (pud_present(*pud) && pud_devmap(*pud)) @@ -1017,17 +1015,17 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, else return NULL; - if (flags & FOLL_TOUCH) - touch_pud(vma, addr, pud, flags); + if (ctx->flags & FOLL_TOUCH) + touch_pud(ctx->vma, ctx->address, pud, ctx->flags); /* * device mapped pages can only be returned if the * caller will manage the page reference count. */ - if (!(flags & FOLL_GET)) + if (!(ctx->flags & FOLL_GET)) return ERR_PTR(-EEXIST); - pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT; + pfn += (ctx->address & ~PUD_MASK) >> PAGE_SHIFT; pgmap = get_dev_pagemap(pfn, NULL); if (!pgmap) return ERR_PTR(-EFAULT); diff --git a/mm/nommu.c b/mm/nommu.c index e4aac33216ae..59db9f5dbb4e 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1709,11 +1709,9 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, return ret; } -struct page *follow_page_mask(struct vm_area_struct *vma, - unsigned long address, unsigned int flags, - unsigned int *page_mask) +struct page *follow_page_mask(struct follow_page_context *ctx) { - *page_mask = 0; + ctx->page_mask = 0; return NULL; } -- 2.14.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 6/7] mm/gup: Combine parameters into struct
  2018-09-19 21:02 ` [PATCH 6/7] mm/gup: Combine parameters into struct Keith Busch
@ 2018-09-19 22:40   ` Keith Busch
  2018-09-21  2:34   ` kbuild test robot
  2018-09-21  2:58   ` kbuild test robot
  2 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2018-09-19 22:40 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Kirill Shutemov, Dave Hansen, Dan Williams

On Wed, Sep 19, 2018 at 03:02:49PM -0600, Keith Busch wrote:
> 	if (is_hugepd(__hugepd(pmd_val(pmdval)))) {
> -		page = follow_huge_pd(vma, address,
> -				      __hugepd(pmd_val(pmdval)), flags,
> -				      PMD_SHIFT);
> +		page = follow_huge_pd(ctx->vma, ctx->address,
> +				      __hugepd(pmd_val(pmdval)),
> +				      ctx->flags, PGDIR_SHIFT);

Shoot, that should have been PMD_SHIFT. I'll let this current set sit a
little longer before considering v2.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 6/7] mm/gup: Combine parameters into struct 2018-09-19 21:02 ` [PATCH 6/7] mm/gup: Combine parameters into struct Keith Busch 2018-09-19 22:40 ` Keith Busch @ 2018-09-21 2:34 ` kbuild test robot 2018-09-21 2:58 ` kbuild test robot 2 siblings, 0 replies; 14+ messages in thread From: kbuild test robot @ 2018-09-21 2:34 UTC (permalink / raw) To: Keith Busch Cc: kbuild-all, linux-mm, linux-kernel, Kirill Shutemov, Dave Hansen, Dan Williams [-- Attachment #1: Type: text/plain, Size: 3039 bytes --] Hi Keith, I love your patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.19-rc4 next-20180920] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Keith-Busch/mm-faster-get-user-pages/20180920-184931 config: sh-rsk7201_defconfig (attached as .config) compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=sh All warnings (new ones prefixed by >>): In file included from include/linux/mm.h:506:0, from arch/sh/kernel/asm-offsets.c:14: >> include/linux/huge_mm.h:344:53: warning: 'struct gup_context' declared inside parameter list will not be visible outside of this definition or declaration static inline struct page *follow_devmap_pmd(struct gup_context *ctx, pmd_t *pmd) ^~~~~~~~~~~ include/linux/huge_mm.h:349:53: warning: 'struct gup_context' declared inside parameter list will not be visible outside of this definition or declaration static inline struct page *follow_devmap_pud(struct gup_context *ctx, pud_t *pud) ^~~~~~~~~~~ -- In file included from include/linux/mm.h:506:0, from arch/sh/kernel/asm-offsets.c:14: >> include/linux/huge_mm.h:344:53: warning: 'struct gup_context' declared inside parameter list will not be visible outside of this definition or declaration static inline struct page *follow_devmap_pmd(struct gup_context *ctx, pmd_t *pmd) ^~~~~~~~~~~ include/linux/huge_mm.h:349:53: warning: 'struct gup_context' declared inside parameter list will not be visible outside of this definition or declaration static inline struct page *follow_devmap_pud(struct gup_context *ctx, pud_t *pud) ^~~~~~~~~~~ <stdin>:1317:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp] <stdin>:1320:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp] <stdin>:1323:2: warning: #warning syscall pkey_free not implemented [-Wcpp] <stdin>:1326:2: warning: #warning syscall statx not implemented [-Wcpp] <stdin>:1332:2: warning: #warning syscall io_pgetevents not implemented [-Wcpp] <stdin>:1335:2: warning: #warning syscall rseq not implemented [-Wcpp] vim +344 include/linux/huge_mm.h 343 > 344 static inline struct page *follow_devmap_pmd(struct gup_context *ctx, pmd_t *pmd) 345 { 346 return NULL; 347 } 348 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 7824 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 6/7] mm/gup: Combine parameters into struct 2018-09-19 21:02 ` [PATCH 6/7] mm/gup: Combine parameters into struct Keith Busch 2018-09-19 22:40 ` Keith Busch 2018-09-21 2:34 ` kbuild test robot @ 2018-09-21 2:58 ` kbuild test robot 2 siblings, 0 replies; 14+ messages in thread From: kbuild test robot @ 2018-09-21 2:58 UTC (permalink / raw) To: Keith Busch Cc: kbuild-all, linux-mm, linux-kernel, Kirill Shutemov, Dave Hansen, Dan Williams [-- Attachment #1: Type: text/plain, Size: 9266 bytes --] Hi Keith, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.19-rc4 next-20180919] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Keith-Busch/mm-faster-get-user-pages/20180920-184931 config: arm-oxnas_v6_defconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=arm All errors (new ones prefixed by >>): In file included from include/linux/mm.h:506:0, from mm/gup.c:6: include/linux/huge_mm.h:344:53: warning: 'struct gup_context' declared inside parameter list will not be visible outside of this definition or declaration static inline struct page *follow_devmap_pmd(struct gup_context *ctx, pmd_t *pmd) ^~~~~~~~~~~ include/linux/huge_mm.h:349:53: warning: 'struct gup_context' declared inside parameter list will not be visible outside of this definition or declaration static inline struct page *follow_devmap_pud(struct gup_context *ctx, pud_t *pud) ^~~~~~~~~~~ mm/gup.c: In function 'follow_pmd_mask': >> mm/gup.c:233:34: error: macro "follow_huge_pd" passed 5 arguments, but takes just 3 ctx->flags, PGDIR_SHIFT); ^ >> mm/gup.c:231:10: error: 'follow_huge_pd' undeclared (first use in this function); did you mean 'follow_page_pte'? page = follow_huge_pd(ctx->vma, ctx->address, ^~~~~~~~~~~~~~ follow_page_pte mm/gup.c:231:10: note: each undeclared identifier is reported only once for each function it appears in >> mm/gup.c:257:28: error: passing argument 1 of 'follow_devmap_pmd' from incompatible pointer type [-Werror=incompatible-pointer-types] page = follow_devmap_pmd(ctx, pmd); ^~~ In file included from include/linux/mm.h:506:0, from mm/gup.c:6: include/linux/huge_mm.h:344:28: note: expected 'struct gup_context *' but argument is of type 'struct follow_page_context *' static inline struct page *follow_devmap_pmd(struct gup_context *ctx, pmd_t *pmd) ^~~~~~~~~~~~~~~~~ mm/gup.c: In function 'follow_pud_mask': mm/gup.c:333:32: error: macro "follow_huge_pd" passed 5 arguments, but takes just 3 ctx->flags, PUD_SHIFT); ^ mm/gup.c:331:10: error: 'follow_huge_pd' undeclared (first use in this function); did you mean 'follow_page_pte'? 
page = follow_huge_pd(ctx->vma, ctx->address, ^~~~~~~~~~~~~~ follow_page_pte >> mm/gup.c:340:28: error: passing argument 1 of 'follow_devmap_pud' from incompatible pointer type [-Werror=incompatible-pointer-types] page = follow_devmap_pud(ctx, pud); ^~~ In file included from include/linux/mm.h:506:0, from mm/gup.c:6: include/linux/huge_mm.h:349:28: note: expected 'struct gup_context *' but argument is of type 'struct follow_page_context *' static inline struct page *follow_devmap_pud(struct gup_context *ctx, pud_t *pud) ^~~~~~~~~~~~~~~~~ mm/gup.c: In function 'follow_p4d_mask': mm/gup.c:366:32: error: macro "follow_huge_pd" passed 5 arguments, but takes just 3 ctx->flags, P4D_SHIFT); ^ mm/gup.c:364:10: error: 'follow_huge_pd' undeclared (first use in this function); did you mean 'follow_page_pte'? page = follow_huge_pd(ctx->vma, ctx->address, ^~~~~~~~~~~~~~ follow_page_pte mm/gup.c: In function 'follow_page_mask': mm/gup.c:414:34: error: macro "follow_huge_pd" passed 5 arguments, but takes just 3 ctx->flags, PGDIR_SHIFT); ^ mm/gup.c:412:10: error: 'follow_huge_pd' undeclared (first use in this function); did you mean 'follow_page_pte'? page = follow_huge_pd(ctx->vma, ctx->address, ^~~~~~~~~~~~~~ follow_page_pte cc1: some warnings being treated as errors vim +/follow_huge_pd +233 mm/gup.c 208 209 static struct page *follow_pmd_mask(struct follow_page_context *ctx, pud_t *pudp) 210 { 211 pmd_t *pmd, pmdval; 212 spinlock_t *ptl; 213 struct page *page; 214 struct mm_struct *mm = ctx->vma->vm_mm; 215 216 pmd = pmd_offset(pudp, ctx->address); 217 /* 218 * The READ_ONCE() will stabilize the pmdval in a register or 219 * on the stack so that it will stop changing under the code. 220 */ 221 pmdval = READ_ONCE(*pmd); 222 if (pmd_none(pmdval)) 223 return no_page_table(ctx); 224 if (pmd_huge(pmdval) && ctx->vma->vm_flags & VM_HUGETLB) { 225 page = follow_huge_pmd(mm, ctx->address, pmd, ctx->flags); 226 if (page) 227 return page; 228 return no_page_table(ctx); 229 } 230 if (is_hugepd(__hugepd(pmd_val(pmdval)))) { > 231 page = follow_huge_pd(ctx->vma, ctx->address, 232 __hugepd(pmd_val(pmdval)), > 233 ctx->flags, PGDIR_SHIFT); 234 if (page) 235 return page; 236 return no_page_table(ctx); 237 } 238 retry: 239 if (!pmd_present(pmdval)) { 240 if (likely(!(ctx->flags & FOLL_MIGRATION))) 241 return no_page_table(ctx); 242 VM_BUG_ON(thp_migration_supported() && 243 !is_pmd_migration_entry(pmdval)); 244 if (is_pmd_migration_entry(pmdval)) 245 pmd_migration_entry_wait(mm, pmd); 246 pmdval = READ_ONCE(*pmd); 247 /* 248 * MADV_DONTNEED may convert the pmd to null because 249 * mmap_sem is held in read mode 250 */ 251 if (pmd_none(pmdval)) 252 return no_page_table(ctx); 253 goto retry; 254 } 255 if (pmd_devmap(pmdval)) { 256 ptl = pmd_lock(mm, pmd); > 257 page = follow_devmap_pmd(ctx, pmd); 258 spin_unlock(ptl); 259 if (page) 260 return page; 261 } 262 if (likely(!pmd_trans_huge(pmdval))) 263 return follow_page_pte(ctx, pmd); 264 265 if ((ctx->flags & FOLL_NUMA) && pmd_protnone(pmdval)) 266 return no_page_table(ctx); 267 268 retry_locked: 269 ptl = pmd_lock(mm, pmd); 270 if (unlikely(pmd_none(*pmd))) { 271 spin_unlock(ptl); 272 return no_page_table(ctx); 273 } 274 if (unlikely(!pmd_present(*pmd))) { 275 spin_unlock(ptl); 276 if (likely(!(ctx->flags & FOLL_MIGRATION))) 277 return no_page_table(ctx); 278 pmd_migration_entry_wait(mm, pmd); 279 goto retry_locked; 280 } 281 if (unlikely(!pmd_trans_huge(*pmd))) { 282 spin_unlock(ptl); 283 return follow_page_pte(ctx, pmd); 284 } 285 if (ctx->flags & FOLL_SPLIT) { 286 int 
ret; 287 page = pmd_page(*pmd); 288 if (is_huge_zero_page(page)) { 289 spin_unlock(ptl); 290 ret = 0; 291 split_huge_pmd(ctx->vma, pmd, ctx->address); 292 if (pmd_trans_unstable(pmd)) 293 ret = -EBUSY; 294 } else { 295 get_page(page); 296 spin_unlock(ptl); 297 lock_page(page); 298 ret = split_huge_page(page); 299 unlock_page(page); 300 put_page(page); 301 if (pmd_none(*pmd)) 302 return no_page_table(ctx); 303 } 304 305 return ret ? ERR_PTR(ret) : 306 follow_page_pte(ctx, pmd); 307 } 308 page = follow_trans_huge_pmd(ctx->vma, ctx->address, pmd, ctx->flags); 309 spin_unlock(ptl); 310 ctx->page_mask = HPAGE_PMD_NR - 1; 311 return page; 312 } 313 314 static struct page *follow_pud_mask(struct follow_page_context *ctx, p4d_t *p4dp) 315 { 316 pud_t *pud; 317 spinlock_t *ptl; 318 struct page *page; 319 struct mm_struct *mm = ctx->vma->vm_mm; 320 321 pud = pud_offset(p4dp, ctx->address); 322 if (pud_none(*pud)) 323 return no_page_table(ctx); 324 if (pud_huge(*pud) && ctx->vma->vm_flags & VM_HUGETLB) { 325 page = follow_huge_pud(mm, ctx->address, pud, ctx->flags); 326 if (page) 327 return page; 328 return no_page_table(ctx); 329 } 330 if (is_hugepd(__hugepd(pud_val(*pud)))) { 331 page = follow_huge_pd(ctx->vma, ctx->address, 332 __hugepd(pud_val(*pud)), > 333 ctx->flags, PUD_SHIFT); 334 if (page) 335 return page; 336 return no_page_table(ctx); 337 } 338 if (pud_devmap(*pud)) { 339 ptl = pud_lock(mm, pud); > 340 page = follow_devmap_pud(ctx, pud); 341 spin_unlock(ptl); 342 if (page) 343 return page; 344 } 345 if (unlikely(pud_bad(*pud))) 346 return no_page_table(ctx); 347 348 return follow_pmd_mask(ctx, pud); 349 } 350 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 15051 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 7/7] mm/gup: Cache dev_pagemap while pinning pages 2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch ` (5 preceding siblings ...) 2018-09-19 21:02 ` [PATCH 6/7] mm/gup: Combine parameters into struct Keith Busch @ 2018-09-19 21:02 ` Keith Busch 2018-09-19 21:15 ` [PATCH 0/7] mm: faster get user pages Dave Hansen 7 siblings, 0 replies; 14+ messages in thread From: Keith Busch @ 2018-09-19 21:02 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Kirill Shutemov, Dave Hansen, Dan Williams, Keith Busch This avoids a repeated costly radix tree lookup when dev_pagemap is used. Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> --- include/linux/mm.h | 8 +++++++- mm/gup.c | 41 ++++++++++++++++++++++++----------------- mm/huge_memory.c | 35 +++++++++++++++-------------------- 3 files changed, 46 insertions(+), 38 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index f1fd241c9071..d688e18a19c4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -380,6 +380,7 @@ struct vm_fault { struct follow_page_context { struct vm_area_struct *vma; + struct dev_pagemap *pgmap; unsigned long address; unsigned int page_mask; unsigned int flags; @@ -2546,14 +2547,19 @@ struct page *follow_page_mask(struct follow_page_context *ctx); static inline struct page *follow_page(struct vm_area_struct *vma, unsigned long address, unsigned int foll_flags) { + struct page *page; struct follow_page_context ctx = { .vma = vma, + .pgmap = NULL, .address = address, .page_mask = 0, .flags = foll_flags, }; - return follow_page_mask(&ctx); + page = follow_page_mask(&ctx); + if (ctx.pgmap) + put_dev_pagemap(ctx.pgmap); + return page; } #define FOLL_WRITE 0x01 /* check pte is writable */ diff --git a/mm/gup.c b/mm/gup.c index a61a6874c80c..1565b79d883b 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -72,7 +72,6 @@ static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) static struct page *follow_page_pte(struct follow_page_context *ctx, pmd_t *pmd) { struct mm_struct *mm = ctx->vma->vm_mm; - struct dev_pagemap *pgmap = NULL; struct page *page; spinlock_t *ptl; pte_t *ptep, pte; @@ -114,8 +113,8 @@ static struct page *follow_page_pte(struct follow_page_context *ctx, pmd_t *pmd) * Only return device mapping pages in the FOLL_GET case since * they are only valid while holding the pgmap reference. */ - pgmap = get_dev_pagemap(pte_pfn(pte), NULL); - if (pgmap) + ctx->pgmap = get_dev_pagemap(pte_pfn(pte), ctx->pgmap); + if (ctx->pgmap) page = pte_page(pte); else goto no_page; @@ -154,9 +153,9 @@ static struct page *follow_page_pte(struct follow_page_context *ctx, pmd_t *pmd) get_page(page); /* drop the pgmap reference now that we hold the page */ - if (pgmap) { - put_dev_pagemap(pgmap); - pgmap = NULL; + if (ctx->pgmap) { + put_dev_pagemap(ctx->pgmap); + ctx->pgmap = NULL; } } if (ctx->flags & FOLL_TOUCH) { @@ -645,7 +644,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *nonblocking) { - long i = 0; + long ret = 0, i = 0; struct vm_area_struct *vma = NULL; struct follow_page_context ctx = {}; @@ -681,8 +680,10 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, goto next_page; } - if (!vma || check_vma_flags(vma, gup_flags)) - return i ? 
: -EFAULT; + if (!vma || check_vma_flags(vma, gup_flags)) { + ret = -EFAULT; + goto out; + } if (is_vm_hugetlb_page(vma)) { i = follow_hugetlb_page(mm, vma, pages, vmas, &start, &nr_pages, i, @@ -697,23 +698,25 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, * If we have a pending SIGKILL, don't keep faulting pages and * potentially allocating memory. */ - if (unlikely(fatal_signal_pending(current))) - return i ? i : -ERESTARTSYS; + if (unlikely(fatal_signal_pending(current))) { + ret = -ERESTARTSYS; + goto out; + } cond_resched(); page = follow_page_mask(&ctx); if (!page) { - int ret; ret = faultin_page(tsk, &ctx, nonblocking); switch (ret) { case 0: goto retry; + case -EBUSY: + ret = 0; + /* FALLTHRU */ case -EFAULT: case -ENOMEM: case -EHWPOISON: - return i ? i : ret; - case -EBUSY: - return i; + goto out; case -ENOENT: goto next_page; } @@ -725,7 +728,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, */ goto next_page; } else if (IS_ERR(page)) { - return i ? i : PTR_ERR(page); + ret = PTR_ERR(page); + goto out; } if (pages) { pages[i] = page; @@ -745,7 +749,10 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, start += page_increm * PAGE_SIZE; nr_pages -= page_increm; } while (nr_pages); - return i; +out: + if (ctx.pgmap) + put_dev_pagemap(ctx.pgmap); + return i ? i : ret; } static bool vma_permits_fault(struct vm_area_struct *vma, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index abd36e6afe2c..6787011385ce 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -851,12 +851,23 @@ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr, update_mmu_cache_pmd(vma, addr, pmd); } +static struct page *pagemap_page(struct follow_page_context *ctx, + unsigned long pfn) +{ + struct page *page; + + ctx->pgmap = get_dev_pagemap(pfn, ctx->pgmap); + if (!ctx->pgmap) + return ERR_PTR(-EFAULT); + page = pfn_to_page(pfn); + get_page(page); + return page; +} + struct page *follow_devmap_pmd(struct follow_page_context *ctx, pmd_t *pmd) { unsigned long pfn = pmd_pfn(*pmd); struct mm_struct *mm = ctx->vma->vm_mm; - struct dev_pagemap *pgmap; - struct page *page; assert_spin_locked(pmd_lockptr(mm, pmd)); @@ -885,14 +896,7 @@ struct page *follow_devmap_pmd(struct follow_page_context *ctx, pmd_t *pmd) return ERR_PTR(-EEXIST); pfn += (ctx->address & ~PMD_MASK) >> PAGE_SHIFT; - pgmap = get_dev_pagemap(pfn, NULL); - if (!pgmap) - return ERR_PTR(-EFAULT); - page = pfn_to_page(pfn); - get_page(page); - put_dev_pagemap(pgmap); - - return page; + return pagemap_page(ctx, pfn); } int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, @@ -1002,8 +1006,6 @@ struct page *follow_devmap_pud(struct follow_page_context *ctx, pud_t *pud) { unsigned long pfn = pud_pfn(*pud); struct mm_struct *mm = ctx->vma->vm_mm; - struct dev_pagemap *pgmap; - struct page *page; assert_spin_locked(pud_lockptr(mm, pud)); @@ -1026,14 +1028,7 @@ struct page *follow_devmap_pud(struct follow_page_context *ctx, pud_t *pud) return ERR_PTR(-EEXIST); pfn += (ctx->address & ~PUD_MASK) >> PAGE_SHIFT; - pgmap = get_dev_pagemap(pfn, NULL); - if (!pgmap) - return ERR_PTR(-EFAULT); - page = pfn_to_page(pfn); - get_page(page); - put_dev_pagemap(pgmap); - - return page; + return pagemap_page(ctx, pfn); } int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, -- 2.14.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 0/7] mm: faster get user pages
  2018-09-19 21:02 [PATCH 0/7] mm: faster get user pages Keith Busch
                   ` (6 preceding siblings ...)
  2018-09-19 21:02 ` [PATCH 7/7] mm/gup: Cache dev_pagemap while pinning pages Keith Busch
@ 2018-09-19 21:15 ` Dave Hansen
  2018-09-19 22:26   ` Keith Busch
  2018-09-19 22:27   ` Dan Williams
  7 siblings, 2 replies; 14+ messages in thread
From: Dave Hansen @ 2018-09-19 21:15 UTC (permalink / raw)
  To: Keith Busch, linux-mm, linux-kernel; +Cc: Kirill Shutemov, Dan Williams

On 09/19/2018 02:02 PM, Keith Busch wrote:
> Pinning user pages out of nvdimm dax memory is significantly slower
> than pinning system RAM. Analysis points to software overhead incurred
> from a radix tree lookup: the relatively costly dev_pagemap lookup was
> repeated for every page. This patch series fixes that by caching the
> dev_pagemap while pinning, significantly reducing gup time.

Could you also remind us why DAX pages are such special snowflakes and
*require* radix tree lookups in the first place?

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 0/7] mm: faster get user pages
  2018-09-19 21:15 ` [PATCH 0/7] mm: faster get user pages Dave Hansen
@ 2018-09-19 22:26   ` Keith Busch
  2018-09-19 22:27   ` Dan Williams
  1 sibling, 0 replies; 14+ messages in thread
From: Keith Busch @ 2018-09-19 22:26 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, Kirill Shutemov, Dan Williams

On Wed, Sep 19, 2018 at 02:15:28PM -0700, Dave Hansen wrote:
> On 09/19/2018 02:02 PM, Keith Busch wrote:
> > Pinning user pages out of nvdimm dax memory is significantly slower
> > than pinning system RAM. Analysis points to software overhead incurred
> > from a radix tree lookup: the relatively costly dev_pagemap lookup was
> > repeated for every page. This patch series fixes that by caching the
> > dev_pagemap while pinning, significantly reducing gup time.
>
> Could you also remind us why DAX pages are such special snowflakes and
> *require* radix tree lookups in the first place?

Yeah, ZONE_DEVICE memory is special. It has struct page mappings, but
not for online general use. The dev_pagemap is the metadata for the
zone device memory, and that metadata is stashed in a radix tree. We're
looking up the dev_pagemap in this path to take a reference so the zone
device can't be unmapped.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 0/7] mm: faster get user pages
  2018-09-19 21:15 ` [PATCH 0/7] mm: faster get user pages Dave Hansen
  2018-09-19 22:26   ` Keith Busch
@ 2018-09-19 22:27   ` Dan Williams
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Williams @ 2018-09-19 22:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Keith Busch, Linux MM, Linux Kernel Mailing List, Kirill A. Shutemov

On Wed, Sep 19, 2018 at 2:15 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 09/19/2018 02:02 PM, Keith Busch wrote:
> > Pinning user pages out of nvdimm dax memory is significantly slower
> > than pinning system RAM. Analysis points to software overhead incurred
> > from a radix tree lookup: the relatively costly dev_pagemap lookup was
> > repeated for every page. This patch series fixes that by caching the
> > dev_pagemap while pinning, significantly reducing gup time.
>
> Could you also remind us why DAX pages are such special snowflakes and
> *require* radix tree lookups in the first place?

They are special because they need to check backing device liveness
when taking new references. We manage a percpu-ref for each device that
registers physical memory with devm_memremap_pages(). When that device
is disabled we kill the percpu-ref to block new references being taken,
and then wait for existing references to drain. This allows for
disabling persistent-memory namespace devices at will relative to new
get_user_pages() requests.

^ permalink raw reply	[flat|nested] 14+ messages in thread
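To make the liveness point above concrete, here is a rough sketch of
the check every new ZONE_DEVICE pin has to perform (hand-written
illustration, not the exact kernel/memremap.c code; pgmap->ref is the
percpu-ref that gets killed when the namespace device is disabled):

	/*
	 * Rough illustration only: a pin may proceed only while the
	 * pagemap's percpu_ref is still live, i.e. the backing device
	 * has not been disabled.
	 */
	static struct dev_pagemap *pin_pagemap_if_live(struct dev_pagemap *pgmap)
	{
		if (!pgmap || !percpu_ref_tryget_live(pgmap->ref))
			return NULL;	/* device gone or being torn down */
		return pgmap;		/* caller later does put_dev_pagemap() */
	}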