* [PATCH 01/10] mm: pagewalk: consolidate vma->vm_start checks
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 02/10] mm: pagewalk: always skip hugetlbfs except when explicitly handled Dave Hansen
` (9 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
We check vma->vm_start against the address being walked twice.
Consolidate the locations down to a single one.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/mm/pagewalk.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff -puN mm/pagewalk.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled mm/pagewalk.c
--- a/mm/pagewalk.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled 2014-06-02 14:20:18.938791410 -0700
+++ b/mm/pagewalk.c 2014-06-02 14:20:18.941791545 -0700
@@ -192,13 +192,12 @@ int walk_page_range(unsigned long addr,
* - VM_PFNMAP vma's
*/
vma = find_vma(walk->mm, addr);
- if (vma) {
+ if (vma && (vma->vm_start <= addr)) {
/*
* There are no page structures backing a VM_PFNMAP
* range, so do not allow split_huge_page_pmd().
*/
- if ((vma->vm_start <= addr) &&
- (vma->vm_flags & VM_PFNMAP)) {
+ if (vma->vm_flags & VM_PFNMAP) {
next = vma->vm_end;
pgd = pgd_offset(walk->mm, next);
continue;
@@ -209,8 +208,7 @@ int walk_page_range(unsigned long addr,
* architecture and we can't handled it in the same
* manner as non-huge pages.
*/
- if (walk->hugetlb_entry && (vma->vm_start <= addr) &&
- is_vm_hugetlb_page(vma)) {
+ if (walk->hugetlb_entry && is_vm_hugetlb_page(vma)) {
if (vma->vm_end < next)
next = vma->vm_end;
/*
_
* [PATCH 02/10] mm: pagewalk: always skip hugetlbfs except when explicitly handled
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
2014-06-02 21:36 ` [PATCH 01/10] mm: pagewalk: consolidate vma->vm_start checks Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 03/10] mm: pagewalk: have generic code keep track of VMA Dave Hansen
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
The short story:
The walk_page_range() code is fragile for hugetlbfs VMAs. Each
walker instance must either exclude hugetlbfs from being walked,
or add a ->hugetlb_entry handler. If this is not done, the code
will go off the rails and start clearing huge page table entries.
This patch removes that requirement on the walkers. They can
merrily call walk_page_range() on hugetlbfs areas, and those
areas will simply be skipped inside the page walker code if they
have not set up a handler.
This makes the code more robust, shorter, and makes it more
intuitive to write a page table walker. Yay.
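The new behavior can be modeled in plain userspace C. This is a sketch with made-up names (struct model_walk and friends are not the kernel API): a walker that early-returns when no hugetlb handler is registered, instead of falling through into code that mangles the page tables.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model (not kernel code): the walk skips hugetlb "VMAs"
 * when no hugetlb_entry handler is registered, instead of
 * corrupting state.  Names are illustrative only. */
struct model_walk {
	int (*hugetlb_entry)(unsigned long addr, void *private);
	void *private;
};

/* Mirrors the fix: bail out early when no handler is set. */
static int model_walk_hugetlb_range(struct model_walk *walk,
				    unsigned long addr)
{
	if (!walk->hugetlb_entry)
		return 0;	/* silently skip, as the patch does */
	return walk->hugetlb_entry(addr, walk->private);
}

/* A trivial handler that just counts how often it was invoked. */
static int count_entry(unsigned long addr, void *private)
{
	(void)addr;
	++*(int *)private;
	return 0;
}
```

With no handler set, the range is skipped without error; once a handler is installed, it is called as before.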
Long story:
I was looking at the page walker code and thought I found a bug.
If the walker hits a hugetlbfs VMA where walk->hugetlb_entry was
not set, it would fall through the if(), and then clear out the
pgd thinking it was bad.
This essentially means that *EVERY* page walker has to *KNOW* to
either exclude hugetlbfs VMAs, or set a ->hugetlb_entry handler.
The good news is that all 9 users of walk_page_range() do this
implicitly or explicitly. The bad news is that it took me an
hour to convince myself of this, and future walk_page_range()
instances are vulnerable to making this mistake. I think the
madvise() use was probably just lucky (details below).
Here's the code trimmed down.  Note what happens if we have an
is_vm_hugetlb_page() VMA, !walk->hugetlb_entry, and a huge page
entry in 'pgd' (or in any of the lower levels).
int walk_page_range(unsigned long addr, unsigned long end, ...
{
...
vma = find_vma(walk->mm, addr);
if (vma) {
if (walk->hugetlb_entry && is_vm_hugetlb_page(vma)) {
walk_hugetlb_range(vma, addr, next, walk);
...
continue;
}
}
if (pgd_none_or_clear_bad(pgd)) {
There are currently 9 users of walk_page_range(). They handle
hugetlbfs pages in 5 ways:
/proc/$pid/smaps:
/proc/$pid/clear_refs:
cgroup precharge:
cgroup move charge:
checks the VMA explicitly for hugetlbfs and skips it; does not set
->hugetlb_entry (this patch removes the now-unnecessary
hugetlbfs checks for these)
openrisc dma alloc:
works on kernel memory, so no hugetlbfs; the arch does not
even support hugetlbfs anyway
powerpc subpage protection:
uses arch-specific is_hugepage_only_range() check
/proc/$pid/pagemap:
/proc/$pid/numa_map:
sets ->hugetlb_entry
(these are unaffected by this patch)
MADV_WILLNEED:
does not set ->hugetlb_entry
only called via:
madvise_willneed() {
if (!vma->vm_file)
force_swapin_readahead(...) {
walk_page_range(...)
}
}
That !vma->vm_file check just _happens_ to cover hugetlbfs
vmas since they are always file-backed (or at least have
vma->vm_file set as far as I can tell)
(this case is unaffected by this patch)
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/fs/proc/task_mmu.c | 4 +---
b/mm/memcontrol.c | 4 ----
b/mm/pagewalk.c | 5 ++++-
3 files changed, 5 insertions(+), 8 deletions(-)
diff -puN fs/proc/task_mmu.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled-1 fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled-1 2014-06-02 14:20:19.210803615 -0700
+++ b/fs/proc/task_mmu.c 2014-06-02 14:20:19.218803974 -0700
@@ -590,7 +590,7 @@ static int show_smap(struct seq_file *m,
memset(&mss, 0, sizeof mss);
mss.vma = vma;
/* mmap_sem is held in m_start */
- if (vma->vm_mm && !is_vm_hugetlb_page(vma))
+ if (vma->vm_mm)
walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
show_map_vma(m, vma, is_pid);
@@ -829,8 +829,6 @@ static ssize_t clear_refs_write(struct f
mmu_notifier_invalidate_range_start(mm, 0, -1);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cp.vma = vma;
- if (is_vm_hugetlb_page(vma))
- continue;
/*
* Writing 1 to /proc/pid/clear_refs affects all pages.
*
diff -puN mm/memcontrol.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled-1 mm/memcontrol.c
--- a/mm/memcontrol.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled-1 2014-06-02 14:20:19.212803706 -0700
+++ b/mm/memcontrol.c 2014-06-02 14:20:19.220804064 -0700
@@ -6821,8 +6821,6 @@ static unsigned long mem_cgroup_count_pr
.mm = mm,
.private = vma,
};
- if (is_vm_hugetlb_page(vma))
- continue;
walk_page_range(vma->vm_start, vma->vm_end,
&mem_cgroup_count_precharge_walk);
}
@@ -7087,8 +7085,6 @@ retry:
.mm = mm,
.private = vma,
};
- if (is_vm_hugetlb_page(vma))
- continue;
ret = walk_page_range(vma->vm_start, vma->vm_end,
&mem_cgroup_move_charge_walk);
if (ret)
diff -puN mm/pagewalk.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled-1 mm/pagewalk.c
--- a/mm/pagewalk.c~pagewalk-always-skip-hugetlbfs-except-when-explicitly-handled-1 2014-06-02 14:20:19.214803794 -0700
+++ b/mm/pagewalk.c 2014-06-02 14:20:19.220804064 -0700
@@ -115,6 +115,9 @@ static int walk_hugetlb_range(struct vm_
pte_t *pte;
int err = 0;
+ if (!walk->hugetlb_entry)
+ return 0;
+
do {
next = hugetlb_entry_end(h, addr, end);
pte = huge_pte_offset(walk->mm, addr & hmask);
@@ -208,7 +211,7 @@ int walk_page_range(unsigned long addr,
* architecture and we can't handled it in the same
* manner as non-huge pages.
*/
- if (walk->hugetlb_entry && is_vm_hugetlb_page(vma)) {
+ if (is_vm_hugetlb_page(vma)) {
if (vma->vm_end < next)
next = vma->vm_end;
/*
_
* [PATCH 03/10] mm: pagewalk: have generic code keep track of VMA
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
2014-06-02 21:36 ` [PATCH 01/10] mm: pagewalk: consolidate vma->vm_start checks Dave Hansen
2014-06-02 21:36 ` [PATCH 02/10] mm: pagewalk: always skip hugetlbfs except when explicitly handled Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 04/10] mm: pagewalk: add page walker for mincore() Dave Hansen
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
7 of the 9 page walkers need the VMA and pass it in some
way through mm_walk->private.  Let's add it to the page walker
infrastructure.
This will increase the number of find_vma() calls, but the VMA
cache should help us out pretty nicely here. This is also quite
easy to optimize if this turns out to be an issue by skipping the
find_vma() call if 'addr' is still within our current
mm_walk->vma.
/proc/$pid/numa_map:
/proc/$pid/smaps:
lots of stuff including vma (vma is a drop in the bucket)
in a struct
/proc/$pid/clear_refs:
passes vma plus an enum in a struct
/proc/$pid/pagemap:
openrisc:
no VMA
MADV_WILLNEED:
walk->private is set to vma
cgroup precharge:
walk->private is set to vma
cgroup move charge:
walk->private is set to vma
powerpc subpages:
walk->private is set to vma
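The tracking semantics described above can be sketched as a small userspace model (hypothetical names, not the kernel's types), including the find_vma() inexactness the walker has to compensate for:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of walk->vma tracking.  model_find_vma() mimics
 * find_vma(): it returns the first VMA with vm_end > addr, which may
 * lie entirely past the range being walked. */
struct model_vma {
	unsigned long vm_start, vm_end;
	struct model_vma *vm_next;	/* sorted, ascending */
};

struct model_walk {
	struct model_vma *mm;	/* VMA list standing in for the mm */
	struct model_vma *vma;	/* tracked like walk->vma */
};

static struct model_vma *model_find_vma(struct model_vma *list,
					unsigned long addr)
{
	for (; list; list = list->vm_next)
		if (addr < list->vm_end)
			return list;
	return NULL;
}

static void model_walk_update_vma(unsigned long addr, unsigned long end,
				  struct model_walk *walk)
{
	struct model_vma *new_vma = model_find_vma(walk->mm, addr);

	/* find_vma() is inexact: drop a VMA entirely past this range */
	if (new_vma && new_vma->vm_start >= end)
		new_vma = NULL;
	walk->vma = new_vma;
}
```

Walking a hole between two VMAs leaves the tracked VMA NULL; a range only partially covered yields the first covering VMA.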
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/arch/powerpc/mm/subpage-prot.c | 3 --
b/fs/proc/task_mmu.c | 25 ++++++---------------
b/include/linux/mm.h | 1
b/mm/madvise.c | 3 --
b/mm/memcontrol.c | 4 +--
b/mm/pagewalk.c | 45 ++++++++++++++++++++++++++++++++++-----
6 files changed, 52 insertions(+), 29 deletions(-)
diff -puN arch/powerpc/mm/subpage-prot.c~page-walker-pass-vma arch/powerpc/mm/subpage-prot.c
--- a/arch/powerpc/mm/subpage-prot.c~page-walker-pass-vma 2014-06-02 14:20:19.524817706 -0700
+++ b/arch/powerpc/mm/subpage-prot.c 2014-06-02 14:20:19.536818243 -0700
@@ -134,7 +134,7 @@ static void subpage_prot_clear(unsigned
static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
unsigned long end, struct mm_walk *walk)
{
- struct vm_area_struct *vma = walk->private;
+ struct vm_area_struct *vma = walk->vma;
split_huge_page_pmd(vma, addr, pmd);
return 0;
}
@@ -163,7 +163,6 @@ static void subpage_mark_vma_nohuge(stru
if (vma->vm_start >= (addr + len))
break;
vma->vm_flags |= VM_NOHUGEPAGE;
- subpage_proto_walk.private = vma;
walk_page_range(vma->vm_start, vma->vm_end,
&subpage_proto_walk);
vma = vma->vm_next;
diff -puN fs/proc/task_mmu.c~page-walker-pass-vma fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~page-walker-pass-vma 2014-06-02 14:20:19.526817794 -0700
+++ b/fs/proc/task_mmu.c 2014-06-02 14:20:19.537818287 -0700
@@ -424,7 +424,6 @@ const struct file_operations proc_tid_ma
#ifdef CONFIG_PROC_PAGE_MONITOR
struct mem_size_stats {
- struct vm_area_struct *vma;
unsigned long resident;
unsigned long shared_clean;
unsigned long shared_dirty;
@@ -443,7 +442,7 @@ static void smaps_pte_entry(pte_t ptent,
unsigned long ptent_size, struct mm_walk *walk)
{
struct mem_size_stats *mss = walk->private;
- struct vm_area_struct *vma = mss->vma;
+ struct vm_area_struct *vma = walk->vma;
pgoff_t pgoff = linear_page_index(vma, addr);
struct page *page = NULL;
int mapcount;
@@ -495,7 +494,7 @@ static int smaps_pte_range(pmd_t *pmd, u
struct mm_walk *walk)
{
struct mem_size_stats *mss = walk->private;
- struct vm_area_struct *vma = mss->vma;
+ struct vm_area_struct *vma = walk->vma;
pte_t *pte;
spinlock_t *ptl;
@@ -588,7 +587,6 @@ static int show_smap(struct seq_file *m,
};
memset(&mss, 0, sizeof mss);
- mss.vma = vma;
/* mmap_sem is held in m_start */
if (vma->vm_mm)
walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
@@ -712,7 +710,6 @@ enum clear_refs_types {
};
struct clear_refs_private {
- struct vm_area_struct *vma;
enum clear_refs_types type;
};
@@ -748,7 +745,7 @@ static int clear_refs_pte_range(pmd_t *p
unsigned long end, struct mm_walk *walk)
{
struct clear_refs_private *cp = walk->private;
- struct vm_area_struct *vma = cp->vma;
+ struct vm_area_struct *vma = walk->vma;
pte_t *pte, ptent;
spinlock_t *ptl;
struct page *page;
@@ -828,7 +825,6 @@ static ssize_t clear_refs_write(struct f
if (type == CLEAR_REFS_SOFT_DIRTY)
mmu_notifier_invalidate_range_start(mm, 0, -1);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
- cp.vma = vma;
/*
* Writing 1 to /proc/pid/clear_refs affects all pages.
*
@@ -1073,15 +1069,11 @@ static int pagemap_hugetlb_range(pte_t *
struct mm_walk *walk)
{
struct pagemapread *pm = walk->private;
- struct vm_area_struct *vma;
int err = 0;
int flags2;
pagemap_entry_t pme;
- vma = find_vma(walk->mm, addr);
- WARN_ON_ONCE(!vma);
-
- if (vma && (vma->vm_flags & VM_SOFTDIRTY))
+ if (walk->vma && (walk->vma->vm_flags & VM_SOFTDIRTY))
flags2 = __PM_SOFT_DIRTY;
else
flags2 = 0;
@@ -1241,7 +1233,6 @@ const struct file_operations proc_pagema
#ifdef CONFIG_NUMA
struct numa_maps {
- struct vm_area_struct *vma;
unsigned long pages;
unsigned long anon;
unsigned long active;
@@ -1317,11 +1308,11 @@ static int gather_pte_stats(pmd_t *pmd,
md = walk->private;
- if (pmd_trans_huge_lock(pmd, md->vma, &ptl) == 1) {
+ if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
pte_t huge_pte = *(pte_t *)pmd;
struct page *page;
- page = can_gather_numa_stats(huge_pte, md->vma, addr);
+ page = can_gather_numa_stats(huge_pte, walk->vma, addr);
if (page)
gather_stats(page, md, pte_dirty(huge_pte),
HPAGE_PMD_SIZE/PAGE_SIZE);
@@ -1333,7 +1324,7 @@ static int gather_pte_stats(pmd_t *pmd,
return 0;
orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
do {
- struct page *page = can_gather_numa_stats(*pte, md->vma, addr);
+ struct page *page = can_gather_numa_stats(*pte, walk->vma, addr);
if (!page)
continue;
gather_stats(page, md, pte_dirty(*pte), 1);
@@ -1392,8 +1383,6 @@ static int show_numa_map(struct seq_file
/* Ensure we start with an empty set of numa_maps statistics. */
memset(md, 0, sizeof(*md));
- md->vma = vma;
-
walk.hugetlb_entry = gather_hugetbl_stats;
walk.pmd_entry = gather_pte_stats;
walk.private = md;
diff -puN include/linux/mm.h~page-walker-pass-vma include/linux/mm.h
--- a/include/linux/mm.h~page-walker-pass-vma 2014-06-02 14:20:19.528817884 -0700
+++ b/include/linux/mm.h 2014-06-02 14:20:19.538818332 -0700
@@ -1118,6 +1118,7 @@ struct mm_walk {
unsigned long addr, unsigned long next,
struct mm_walk *walk);
struct mm_struct *mm;
+ struct vm_area_struct *vma;
void *private;
};
diff -puN mm/madvise.c~page-walker-pass-vma mm/madvise.c
--- a/mm/madvise.c~page-walker-pass-vma 2014-06-02 14:20:19.529817929 -0700
+++ b/mm/madvise.c 2014-06-02 14:20:19.539818378 -0700
@@ -139,7 +139,7 @@ static int swapin_walk_pmd_entry(pmd_t *
unsigned long end, struct mm_walk *walk)
{
pte_t *orig_pte;
- struct vm_area_struct *vma = walk->private;
+ struct vm_area_struct *vma = walk->vma;
unsigned long index;
if (pmd_none_or_trans_huge_or_clear_bad(pmd))
@@ -176,7 +176,6 @@ static void force_swapin_readahead(struc
struct mm_walk walk = {
.mm = vma->vm_mm,
.pmd_entry = swapin_walk_pmd_entry,
- .private = vma,
};
walk_page_range(start, end, &walk);
diff -puN mm/memcontrol.c~page-walker-pass-vma mm/memcontrol.c
--- a/mm/memcontrol.c~page-walker-pass-vma 2014-06-02 14:20:19.532818064 -0700
+++ b/mm/memcontrol.c 2014-06-02 14:20:19.541818468 -0700
@@ -6786,7 +6786,7 @@ static int mem_cgroup_count_precharge_pt
unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
- struct vm_area_struct *vma = walk->private;
+ struct vm_area_struct *vma = walk->vma;
pte_t *pte;
spinlock_t *ptl;
@@ -6962,7 +6962,7 @@ static int mem_cgroup_move_charge_pte_ra
struct mm_walk *walk)
{
int ret = 0;
- struct vm_area_struct *vma = walk->private;
+ struct vm_area_struct *vma = walk->vma;
pte_t *pte;
spinlock_t *ptl;
enum mc_target_type target_type;
diff -puN mm/pagewalk.c~page-walker-pass-vma mm/pagewalk.c
--- a/mm/pagewalk.c~page-walker-pass-vma 2014-06-02 14:20:19.533818109 -0700
+++ b/mm/pagewalk.c 2014-06-02 14:20:19.542818513 -0700
@@ -3,6 +3,38 @@
#include <linux/sched.h>
#include <linux/hugetlb.h>
+
+/*
+ * The VMA which applies to the current place in the
+ * page walk is tracked in walk->vma. If there is
+ * no VMA covering the current area (when in a pte_hole)
+ * walk->vma will be NULL.
+ *
+ * If the area being walked is covered by more than one
+ * VMA, then the first one will be set in walk->vma.
+ * Additional VMAs can be found by walking the VMA sibling
+ * list, or by calling this function or find_vma() directly.
+ *
+ * In a situation where the area being walked is not
+ * entirely covered by a VMA, the _first_ VMA which covers
+ * part of the area will be set in walk->vma.
+ */
+static void walk_update_vma(unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
+{
+ struct vm_area_struct *new_vma = find_vma(walk->mm, addr);
+
+ /*
+ * find_vma() is not exact and returns the next VMA
+ * ending after addr. The vma we found may be outside
+ * the range which we are walking, so clear it if so.
+ */
+ if (new_vma && new_vma->vm_start >= end)
+ new_vma = NULL;
+
+ walk->vma = new_vma;
+}
+
static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
@@ -15,6 +47,7 @@ static int walk_pte_range(pmd_t *pmd, un
if (err)
break;
addr += PAGE_SIZE;
+ walk_update_vma(addr, addr + PAGE_SIZE, walk);
if (addr == end)
break;
pte++;
@@ -35,6 +68,7 @@ static int walk_pmd_range(pud_t *pud, un
do {
again:
next = pmd_addr_end(addr, end);
+ walk_update_vma(addr, next, walk);
if (pmd_none(*pmd)) {
if (walk->pte_hole)
err = walk->pte_hole(addr, next, walk);
@@ -79,6 +113,7 @@ static int walk_pud_range(pgd_t *pgd, un
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
+ walk_update_vma(addr, next, walk);
if (pud_none_or_clear_bad(pud)) {
if (walk->pte_hole)
err = walk->pte_hole(addr, next, walk);
@@ -105,10 +140,10 @@ static unsigned long hugetlb_entry_end(s
return boundary < end ? boundary : end;
}
-static int walk_hugetlb_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long end,
+static int walk_hugetlb_range(unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
+ struct vm_area_struct *vma = walk->vma;
struct hstate *h = hstate_vma(vma);
unsigned long next;
unsigned long hmask = huge_page_mask(h);
@@ -187,14 +222,14 @@ int walk_page_range(unsigned long addr,
struct vm_area_struct *vma = NULL;
next = pgd_addr_end(addr, end);
-
+ walk_update_vma(addr, next, walk);
/*
* This function was not intended to be vma based.
* But there are vma special cases to be handled:
* - hugetlb vma's
* - VM_PFNMAP vma's
*/
- vma = find_vma(walk->mm, addr);
+ vma = walk->vma;
if (vma && (vma->vm_start <= addr)) {
/*
* There are no page structures backing a VM_PFNMAP
@@ -219,7 +254,7 @@ int walk_page_range(unsigned long addr,
* so walk through hugetlb entries within a
* given vma.
*/
- err = walk_hugetlb_range(vma, addr, next, walk);
+ err = walk_hugetlb_range(addr, next, walk);
if (err)
break;
pgd = pgd_offset(walk->mm, next);
_
* [PATCH 04/10] mm: pagewalk: add page walker for mincore()
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
2014-06-02 21:36 ` [PATCH 03/10] mm: pagewalk: have generic code keep track of VMA Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 05/10] mm: mincore: clean up hugetlbfs handling (part 1) Dave Hansen
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
This converts the sys_mincore() code over to use the
walk_page_range() infrastructure. This provides some pretty
nice code savings.
Note that the (now removed) comment:
/*
* Huge pages are always in RAM for now, but
* theoretically it needs to be checked.
*/
is bogus and has been for years. We started demand-faulting them
long ago. Thank goodness theory matters!
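The shape of the conversion can be sketched with a userspace model (all names illustrative, not the mincore implementation itself): the hand-rolled pgd/pud/pmd loops collapse into two callbacks that a generic walker dispatches to, one for present entries and one for holes.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGES 8

/* Userspace model of the mincore() conversion: the generic walker
 * drives the loop and sends each page either to a pte handler or to
 * a hole handler, which fill the output vector the way
 * mincore_pte_range()/mincore_unmapped_range() do. */
struct model_walk {
	const unsigned char *mapped;	/* 1 if a "pte" exists */
	const unsigned char *present;	/* 1 if that pte is resident */
	unsigned char *vec;		/* output, one byte per page */
};

static int model_pte_entry(unsigned long pgidx, struct model_walk *walk)
{
	walk->vec[pgidx] = walk->present[pgidx];
	return 0;
}

static int model_pte_hole(unsigned long pgidx, struct model_walk *walk)
{
	walk->vec[pgidx] = 0;	/* unmapped pages are "not in core" */
	return 0;
}

/* Stand-in for walk_page_range(): dispatch, don't open-code levels. */
static void model_walk_range(struct model_walk *walk)
{
	for (unsigned long i = 0; i < MODEL_PAGES; i++) {
		if (walk->mapped[i])
			model_pte_entry(i, walk);
		else
			model_pte_hole(i, walk);
	}
}
```

The caller only supplies the two callbacks and the result buffer; the four nested page-table loops of the old code become the walker's problem.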
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/mm/mincore.c | 123 +++++++++++++++++----------------------------------------
1 file changed, 37 insertions(+), 86 deletions(-)
diff -puN mm/mincore.c~mincore-page-walker-0 mm/mincore.c
--- a/mm/mincore.c~mincore-page-walker-0 2014-06-02 14:20:19.879833634 -0700
+++ b/mm/mincore.c 2014-06-02 14:20:19.883833814 -0700
@@ -19,38 +19,29 @@
#include <asm/uaccess.h>
#include <asm/pgtable.h>
-static void mincore_hugetlb_page_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
+static int mincore_hugetlb_page_range(pte_t *ptep, unsigned long hmask,
+ unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
{
#ifdef CONFIG_HUGETLB_PAGE
- struct hstate *h;
-
- h = hstate_vma(vma);
+ unsigned char *vec = walk->private;
while (1) {
- unsigned char present;
- pte_t *ptep;
- /*
- * Huge pages are always in RAM for now, but
- * theoretically it needs to be checked.
- */
- ptep = huge_pte_offset(current->mm,
- addr & huge_page_mask(h));
- present = ptep && !huge_pte_none(huge_ptep_get(ptep));
+ int present = !huge_pte_none(huge_ptep_get(ptep));
while (1) {
*vec = present;
vec++;
addr += PAGE_SIZE;
if (addr == end)
- return;
+ return 0;
/* check hugepage border */
- if (!(addr & ~huge_page_mask(h)))
+ if (!(addr & hmask))
break;
}
}
#else
BUG();
#endif
+ return 0;
}
/*
@@ -94,10 +85,11 @@ static unsigned char mincore_page(struct
return present;
}
-static void mincore_unmapped_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
+static int mincore_unmapped_range(unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
{
+ struct vm_area_struct *vma = walk->vma;
+ unsigned char *vec = walk->private;
unsigned long nr = (end - addr) >> PAGE_SHIFT;
int i;
@@ -111,27 +103,35 @@ static void mincore_unmapped_range(struc
for (i = 0; i < nr; i++)
vec[i] = 0;
}
+ return 0;
}
-static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
+static int mincore_pte_range(pmd_t *pmd,
+ unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
{
+ struct vm_area_struct *vma = walk->vma;
+ unsigned char *vec = walk->private;
unsigned long next;
spinlock_t *ptl;
pte_t *ptep;
+ if (pmd_trans_huge(*pmd)) {
+ int success = mincore_huge_pmd(vma, pmd, addr, end, vec);
+ if (success)
+ return 0;
+ /* the trans huge pmd just split, handle as small */
+ }
+
ptep = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
do {
pte_t pte = *ptep;
pgoff_t pgoff;
next = addr + PAGE_SIZE;
- if (pte_none(pte))
- mincore_unmapped_range(vma, addr, next, vec);
- else if (pte_present(pte))
+ if (pte_present(pte)) {
*vec = 1;
- else if (pte_file(pte)) {
+ } else if (pte_file(pte)) {
pgoff = pte_to_pgoff(pte);
*vec = mincore_page(vma->vm_file->f_mapping, pgoff);
} else { /* pte is a swap entry */
@@ -154,67 +154,21 @@ static void mincore_pte_range(struct vm_
vec++;
} while (ptep++, addr = next, addr != end);
pte_unmap_unlock(ptep - 1, ptl);
-}
-
-static void mincore_pmd_range(struct vm_area_struct *vma, pud_t *pud,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
-{
- unsigned long next;
- pmd_t *pmd;
-
- pmd = pmd_offset(pud, addr);
- do {
- next = pmd_addr_end(addr, end);
- if (pmd_trans_huge(*pmd)) {
- if (mincore_huge_pmd(vma, pmd, addr, next, vec)) {
- vec += (next - addr) >> PAGE_SHIFT;
- continue;
- }
- /* fall through */
- }
- if (pmd_none_or_trans_huge_or_clear_bad(pmd))
- mincore_unmapped_range(vma, addr, next, vec);
- else
- mincore_pte_range(vma, pmd, addr, next, vec);
- vec += (next - addr) >> PAGE_SHIFT;
- } while (pmd++, addr = next, addr != end);
-}
-
-static void mincore_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
- unsigned long addr, unsigned long end,
- unsigned char *vec)
-{
- unsigned long next;
- pud_t *pud;
-
- pud = pud_offset(pgd, addr);
- do {
- next = pud_addr_end(addr, end);
- if (pud_none_or_clear_bad(pud))
- mincore_unmapped_range(vma, addr, next, vec);
- else
- mincore_pmd_range(vma, pud, addr, next, vec);
- vec += (next - addr) >> PAGE_SHIFT;
- } while (pud++, addr = next, addr != end);
+ return 0;
}
static void mincore_page_range(struct vm_area_struct *vma,
unsigned long addr, unsigned long end,
unsigned char *vec)
{
- unsigned long next;
- pgd_t *pgd;
-
- pgd = pgd_offset(vma->vm_mm, addr);
- do {
- next = pgd_addr_end(addr, end);
- if (pgd_none_or_clear_bad(pgd))
- mincore_unmapped_range(vma, addr, next, vec);
- else
- mincore_pud_range(vma, pgd, addr, next, vec);
- vec += (next - addr) >> PAGE_SHIFT;
- } while (pgd++, addr = next, addr != end);
+ struct mm_walk mincore_walk = {
+ .pte_hole = mincore_unmapped_range,
+ .pmd_entry = mincore_pte_range,
+ .hugetlb_entry = mincore_hugetlb_page_range,
+ .private = vec,
+ .mm = vma->vm_mm,
+ };
+ walk_page_range(vma->vm_start, vma->vm_end, &mincore_walk);
}
/*
@@ -233,10 +187,7 @@ static long do_mincore(unsigned long add
end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
- if (is_vm_hugetlb_page(vma))
- mincore_hugetlb_page_range(vma, addr, end, vec);
- else
- mincore_page_range(vma, addr, end, vec);
+ mincore_page_range(vma, addr, end, vec);
return (end - addr) >> PAGE_SHIFT;
}
_
* [PATCH 05/10] mm: mincore: clean up hugetlbfs handling (part 1)
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
2014-06-02 21:36 ` [PATCH 04/10] mm: pagewalk: add page walker for mincore() Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 06/10] mm: mincore: clean up hugetlbfs handler (part 2) Dave Hansen
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
The ->hugetlb_entry functions are only called _via_ the page
walker.  I don't see this changing any time soon.  The
page walker only calls walk->hugetlb_entry() under an
#ifdef CONFIG_HUGETLB_PAGE.
With this in place, I think putting BUG()s in the
->hugetlb_entry handlers is a bit like wearing a belt and
suspenders.
This axes the BUG() from the mincore ->hugetlb_entry along
with the #ifdef. The compiler is more than smart enough
to do the right thing when it sees:
if (1)
return;
// unreachable
The only downside here is that we now need some header stubs
for huge_pte_none() / huge_ptep_get().
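The compiler argument can be demonstrated with a userspace analogue (names are made up; the kernel equivalents are the is_vm_hugetlb_page() check and the WARN_ONCE() stubs): a constant-false predicate in front of the stub calls makes them unreachable, so the stubs only need to exist to keep the build happy.

```c
#include <assert.h>

/* Userspace model of the stub pattern (illustrative names).  When
 * the feature is compiled out, is_model_huge_page() is a constant 0,
 * so the stub call below is unreachable and the compiler can discard
 * it, which is "as good as an explicit ifdef". */
#ifndef MODEL_HUGETLB_PAGE
static inline int is_model_huge_page(void)
{
	return 0;	/* feature compiled out */
}
static inline int model_huge_pte_none(int *ran_stub)
{
	/* would WARN_ONCE() in the kernel; must never actually run */
	*ran_stub = 1;
	return 1;
}
#endif

static int model_hugetlb_handler(int *ran_stub)
{
	/* constant-false when the feature is disabled */
	if (!is_model_huge_page())
		return 0;
	return model_huge_pte_none(ran_stub);
}
```

The handler compiles and links against the stub, but the stub never executes when the feature is off.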
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/include/linux/hugetlb.h | 10 ++++++++++
b/mm/mincore.c | 9 +++++----
2 files changed, 15 insertions(+), 4 deletions(-)
diff -puN include/linux/hugetlb.h~cleanup-hugetlbfs-mincore-1 include/linux/hugetlb.h
--- a/include/linux/hugetlb.h~cleanup-hugetlbfs-mincore-1 2014-06-02 14:20:20.144845525 -0700
+++ b/include/linux/hugetlb.h 2014-06-02 14:20:20.149845750 -0700
@@ -458,6 +458,16 @@ static inline spinlock_t *huge_pte_lockp
{
return &mm->page_table_lock;
}
+static inline int huge_pte_none(pte_t pte)
+{
+ WARN_ONCE(1, "%s() called when hugetlbfs disabled", __func__);
+ return 1;
+}
+static inline pte_t huge_ptep_get(pte_t *pte)
+{
+ WARN_ONCE(1, "%s() called when hugetlbfs disabled", __func__);
+ return __pte(0);
+}
#endif /* CONFIG_HUGETLB_PAGE */
static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff -puN mm/mincore.c~cleanup-hugetlbfs-mincore-1 mm/mincore.c
--- a/mm/mincore.c~cleanup-hugetlbfs-mincore-1 2014-06-02 14:20:20.146845615 -0700
+++ b/mm/mincore.c 2014-06-02 14:20:20.149845750 -0700
@@ -23,8 +23,12 @@ static int mincore_hugetlb_page_range(pt
unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
-#ifdef CONFIG_HUGETLB_PAGE
unsigned char *vec = walk->private;
+
+ /* This is as good as an explicit ifdef */
+ if (!is_vm_hugetlb_page(walk->vma))
+ return 0;
+
while (1) {
int present = !huge_pte_none(huge_ptep_get(ptep));
while (1) {
@@ -38,9 +42,6 @@ static int mincore_hugetlb_page_range(pt
break;
}
}
-#else
- BUG();
-#endif
return 0;
}
_
* [PATCH 06/10] mm: mincore: clean up hugetlbfs handler (part 2)
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
2014-06-02 21:36 ` [PATCH 05/10] mm: mincore: clean up hugetlbfs handling (part 1) Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 07/10] mm: pagewalk: kill check for hugetlbfs inside /proc pagemap code Dave Hansen
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
The walk_page_range() code calls into the ->hugetlb_entry
handler once for each huge page table entry.  This means that
addr and end are always within the same huge page.  (Well, end is
not technically _within_ it, because it is exclusive.)
The outer while() loop in mincore_hugetlb_page_range() appears to
be designed to work if we crossed a huge page boundary to a new
huge pte and 'present' changed. However, that is impossible for
two reasons:
1. The above-mentioned walk_page_range() restriction
2. We never move ptep
So the outer while() along with the check for crossing the end of
the huge page boundary (which is impossible) make no sense. Once
we peel it off, it's clear that we can just fold the 'return'
into the loop condition.
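A userspace model of the simplified handler (illustrative names) shows why a single lookup suffices: with addr and end confined to one huge page, one present/absent value applies to every small page in the range.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 0x1000UL

/* Userspace model of the simplified mincore hugetlb handler: a
 * single present/absent lookup covers the whole [addr, end) range,
 * because the walker never hands us a range crossing a huge-page
 * boundary.  Returns the number of vector bytes filled. */
static unsigned long model_mincore_hugetlb(int present, unsigned long addr,
					   unsigned long end,
					   unsigned char *vec)
{
	unsigned long filled = 0;

	while (addr < end) {	/* the old outer loop folded away */
		*vec++ = (unsigned char)present;
		addr += MODEL_PAGE_SIZE;
		filled++;
	}
	return filled;
}
```

One lookup, one flat loop, no huge-page-border check needed.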
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/mm/mincore.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff -puN mm/mincore.c~cleanup-hugetlbfs-mincore-2 mm/mincore.c
--- a/mm/mincore.c~cleanup-hugetlbfs-mincore-2 2014-06-02 14:20:20.426858178 -0700
+++ b/mm/mincore.c 2014-06-02 14:20:20.430858359 -0700
@@ -24,23 +24,17 @@ static int mincore_hugetlb_page_range(pt
struct mm_walk *walk)
{
unsigned char *vec = walk->private;
+ int present;
/* This is as good as an explicit ifdef */
if (!is_vm_hugetlb_page(walk->vma))
return 0;
- while (1) {
- int present = !huge_pte_none(huge_ptep_get(ptep));
- while (1) {
- *vec = present;
- vec++;
- addr += PAGE_SIZE;
- if (addr == end)
- return 0;
- /* check hugepage border */
- if (!(addr & hmask))
- break;
- }
+ present = !huge_pte_none(huge_ptep_get(ptep));
+ while (addr < end) {
+ *vec = present;
+ vec++;
+ addr += PAGE_SIZE;
}
return 0;
}
_
* [PATCH 07/10] mm: pagewalk: kill check for hugetlbfs inside /proc pagemap code
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
2014-06-02 21:36 ` [PATCH 06/10] mm: mincore: clean up hugetlbfs handler (part 2) Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 08/10] mm: pagewalk: add locked pte walker Dave Hansen
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
The pagemap code does not call the normal pte handlers for
hugetlbfs areas; they are handled exclusively by
->hugetlb_entry, so remove the redundant check for it.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/fs/proc/task_mmu.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff -puN fs/proc/task_mmu.c~do-not-check-for-hugetlbfs-inside-pagemap-walker fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~do-not-check-for-hugetlbfs-inside-pagemap-walker 2014-06-02 14:20:20.693870160 -0700
+++ b/fs/proc/task_mmu.c 2014-06-02 14:20:20.697870340 -0700
@@ -1033,8 +1033,7 @@ static int pagemap_pte_range(pmd_t *pmd,
/* check that 'vma' actually covers this address,
* and that it isn't a huge page vma */
- if (vma && (vma->vm_start <= addr) &&
- !is_vm_hugetlb_page(vma)) {
+ if (vma && (vma->vm_start <= addr)) {
pte = pte_offset_map(pmd, addr);
pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
/* unmap before userspace copy */
_
^ permalink raw reply [flat|nested] 15+ messages in thread

* [PATCH 08/10] mm: pagewalk: add locked pte walker
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
` (6 preceding siblings ...)
2014-06-02 21:36 ` [PATCH 07/10] mm: pagewalk: kill check for hugetlbfs inside /proc pagemap code Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 09/10] mm: pagewalk: use new locked walker for /proc/pid/smaps Dave Hansen
` (2 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
Neither the locking nor the splitting logic needed for
transparent huge pages is trivial. We end up having to teach
each of the page walkers about it individually, and have the same
pattern copied across several of them.
This patch introduces a new handler: ->locked_single_entry. It
does two things: it handles the page table locking, including the
difference between pmds and ptes, and it lets you have a single
handler for large and small pages.
This greatly simplifies the handlers. For now, I have only
converted two of the walk_page_range() users; I believe at least
a few more of them can be converted going forward.
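A userspace sketch of the idea (illustrative names only, not the real
mm_walk API, and a plain flag standing in for the page table lock):
the core walker takes and drops the lock and decides huge-vs-small,
and the handler sees one (addr, size) pair per entry.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE  4096UL
#define TOY_HPAGE_SIZE (512 * TOY_PAGE_SIZE)

/* Toy model of ->locked_single_entry; names are illustrative. */
struct toy_walk {
	int (*locked_single_entry)(unsigned long addr, unsigned long size,
				   struct toy_walk *walk);
	bool locked;		/* stands in for the page table lock */
	unsigned long bytes_seen;
};

static int toy_walk_pmd(struct toy_walk *walk, unsigned long addr,
			bool is_trans_huge)
{
	unsigned long end = addr + TOY_HPAGE_SIZE;
	int ret = 0;

	walk->locked = true;	/* "take" the page table lock */
	if (is_trans_huge) {
		/* one callback covering the whole huge page */
		ret = walk->locked_single_entry(addr, TOY_HPAGE_SIZE, walk);
	} else {
		/* one callback per small page */
		for (; addr < end && !ret; addr += TOY_PAGE_SIZE)
			ret = walk->locked_single_entry(addr, TOY_PAGE_SIZE,
							walk);
	}
	walk->locked = false;	/* "drop" the lock */
	return ret;
}

static int toy_handler(unsigned long addr, unsigned long size,
		       struct toy_walk *walk)
{
	(void)addr;
	assert(walk->locked);		/* handler always runs under the lock */
	walk->bytes_seen += size;	/* size distinguishes huge from small */
	return 0;
}
```

The payoff is that a handler like this one no longer contains any
locking or pmd_trans_huge() logic of its own.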
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/include/linux/mm.h | 7 +++++++
b/mm/pagewalk.c | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff -puN include/linux/mm.h~mm-pagewalk-add-locked-walker include/linux/mm.h
--- a/include/linux/mm.h~mm-pagewalk-add-locked-walker 2014-06-02 14:20:20.963882275 -0700
+++ b/include/linux/mm.h 2014-06-02 14:20:20.969882545 -0700
@@ -1096,6 +1096,11 @@ void unmap_vmas(struct mmu_gather *tlb,
* pmd_trans_huge() pmds. They may simply choose to
* split_huge_page() instead of handling it explicitly.
* @pte_entry: if set, called for each non-empty PTE (4th-level) entry
+ * @locked_single_entry: if set, called for each pmd or pte entry. The
+ * page table lock for the entry is also acquired
+ * such that the handler does not have to worry
+ * about the entry disappearing (or being split in
+ * the case of a pmd_trans_huge).
* @pte_hole: if set, called for each hole at all levels
* @hugetlb_entry: if set, called for each hugetlb entry
* *Caution*: The caller must hold mmap_sem() if @hugetlb_entry
@@ -1112,6 +1117,8 @@ struct mm_walk {
unsigned long next, struct mm_walk *walk);
int (*pte_entry)(pte_t *pte, unsigned long addr,
unsigned long next, struct mm_walk *walk);
+ int (*locked_single_entry)(pte_t *pte, unsigned long addr,
+ unsigned long pte_size, struct mm_walk *walk);
int (*pte_hole)(unsigned long addr, unsigned long next,
struct mm_walk *walk);
int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
diff -puN mm/pagewalk.c~mm-pagewalk-add-locked-walker mm/pagewalk.c
--- a/mm/pagewalk.c~mm-pagewalk-add-locked-walker 2014-06-02 14:20:20.965882364 -0700
+++ b/mm/pagewalk.c 2014-06-02 14:20:20.969882545 -0700
@@ -57,6 +57,40 @@ static int walk_pte_range(pmd_t *pmd, un
return err;
}
+static int walk_single_entry_locked(pmd_t *pmd, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
+{
+ int ret = 0;
+ struct vm_area_struct *vma = walk->vma;
+ pte_t *pte;
+ spinlock_t *ptl;
+
+ if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
+ ret = walk->locked_single_entry((pte_t *)pmd, addr,
+ HPAGE_PMD_SIZE, walk);
+ spin_unlock(ptl);
+ return ret;
+ }
+
+ /*
+ * See pmd_none_or_trans_huge_or_clear_bad() for a
+ * description of the races we are avoiding with this.
+ * Note that this essentially acts as if the pmd were
+ * NULL (empty).
+ */
+ if (pmd_trans_unstable(pmd))
+ return 0;
+
+ pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+ for (; addr != end; pte++, addr += PAGE_SIZE) {
+ ret = walk->locked_single_entry(pte, addr, PAGE_SIZE, walk);
+ if (ret)
+ break;
+ }
+ pte_unmap_unlock(pte - 1, ptl);
+ return ret;
+}
+
static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
@@ -77,6 +111,17 @@ again:
continue;
}
/*
+ * A ->locked_single_entry must be able to handle
+ * arbitrary (well, pmd- or pte-sized) entries, and
+ * it replaces the ->pmd_entry/->pte_entry path for
+ * this pmd entirely.
+ */
+ if (walk->locked_single_entry) {
+ err = walk_single_entry_locked(pmd, addr, next, walk);
+ if (err)
+ break;
+ continue;
+ }
+
+ /*
* This implies that each ->pmd_entry() handler
* needs to know about pmd_trans_huge() pmds
*/
_
^ permalink raw reply [flat|nested] 15+ messages in thread

* [PATCH 09/10] mm: pagewalk: use new locked walker for /proc/pid/smaps
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
` (7 preceding siblings ...)
2014-06-02 21:36 ` [PATCH 08/10] mm: pagewalk: add locked pte walker Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
2014-06-02 21:36 ` [PATCH 10/10] mm: pagewalk: use locked walker for /proc/pid/numa_maps Dave Hansen
[not found] ` <1401745925-l651h3s9@n-horiguchi@ah.jp.nec.com>
10 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
The diffstat tells the story here. Using the new walker function
greatly simplifies the code. One side-effect here is that we'll
call cond_resched() more often than we did before. It used to be
called once per pte page, but now it's called on every pte.
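To put a number on that side-effect: with 4kB pages and 512 ptes per
pte page (x86-64), a fully mapped 2MB range used to trigger one
cond_resched() and now triggers 512. A toy model of the two loops
(hypothetical names, counting calls instead of rescheduling):

```c
#include <assert.h>

#define TOY_PAGE_SIZE    4096UL
#define TOY_PTRS_PER_PTE 512UL	/* pte entries per pte page on x86-64 */

static unsigned long resched_calls;
static void toy_cond_resched(void) { resched_calls++; }

/* Old style: one cond_resched() after walking a whole pte page. */
static void old_walk(unsigned long addr, unsigned long end)
{
	for (; addr < end; addr += TOY_PAGE_SIZE)
		;	/* per-pte work */
	toy_cond_resched();
}

/* New style: the per-entry handler calls cond_resched() every time. */
static void new_walk(unsigned long addr, unsigned long end)
{
	for (; addr < end; addr += TOY_PAGE_SIZE)
		toy_cond_resched();
}
```

That is 512x more calls, though cond_resched() is cheap when no
reschedule is pending.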
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/fs/proc/task_mmu.c | 27 +++++----------------------
1 file changed, 5 insertions(+), 22 deletions(-)
diff -puN fs/proc/task_mmu.c~mm-pagewalk-use-locked-walker-smaps fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~mm-pagewalk-use-locked-walker-smaps 2014-06-02 14:20:21.247895019 -0700
+++ b/fs/proc/task_mmu.c 2014-06-02 14:20:21.251895198 -0700
@@ -490,32 +490,15 @@ static void smaps_pte_entry(pte_t ptent,
}
}
-static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+static int smaps_single_locked(pte_t *pte, unsigned long addr, unsigned long size,
struct mm_walk *walk)
{
struct mem_size_stats *mss = walk->private;
- struct vm_area_struct *vma = walk->vma;
- pte_t *pte;
- spinlock_t *ptl;
-
- if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
- smaps_pte_entry(*(pte_t *)pmd, addr, HPAGE_PMD_SIZE, walk);
- spin_unlock(ptl);
+
+ if (size == HPAGE_PMD_SIZE)
mss->anonymous_thp += HPAGE_PMD_SIZE;
- return 0;
- }
- if (pmd_trans_unstable(pmd))
- return 0;
- /*
- * The mmap_sem held all the way back in m_start() is what
- * keeps khugepaged out of here and from collapsing things
- * in here.
- */
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- for (; addr != end; pte++, addr += PAGE_SIZE)
- smaps_pte_entry(*pte, addr, PAGE_SIZE, walk);
- pte_unmap_unlock(pte - 1, ptl);
+ smaps_pte_entry(*pte, addr, size, walk);
cond_resched();
return 0;
}
@@ -581,7 +564,7 @@ static int show_smap(struct seq_file *m,
struct vm_area_struct *vma = v;
struct mem_size_stats mss;
struct mm_walk smaps_walk = {
- .pmd_entry = smaps_pte_range,
+ .locked_single_entry = smaps_single_locked,
.mm = vma->vm_mm,
.private = &mss,
};
_
^ permalink raw reply [flat|nested] 15+ messages in thread

* [PATCH 10/10] mm: pagewalk: use locked walker for /proc/pid/numa_maps
2014-06-02 21:36 [PATCH 00/10] mm: pagewalk: huge page cleanups and VMA passing Dave Hansen
` (8 preceding siblings ...)
2014-06-02 21:36 ` [PATCH 09/10] mm: pagewalk: use new locked walker for /proc/pid/smaps Dave Hansen
@ 2014-06-02 21:36 ` Dave Hansen
[not found] ` <1401745925-l651h3s9@n-horiguchi@ah.jp.nec.com>
10 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2014-06-02 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, kirill.shutemov, Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>
Same deal as the last one. Lots of code savings using the new
walker function.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
b/fs/proc/task_mmu.c | 39 ++++++++-------------------------------
1 file changed, 8 insertions(+), 31 deletions(-)
diff -puN fs/proc/task_mmu.c~mm-pagewalk-use-locked-walker-numa_maps fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~mm-pagewalk-use-locked-walker-numa_maps 2014-06-02 14:20:21.518907178 -0700
+++ b/fs/proc/task_mmu.c 2014-06-02 14:20:21.522907359 -0700
@@ -1280,41 +1280,18 @@ static struct page *can_gather_numa_stat
return page;
}
-static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
+static int gather_stats_locked(pte_t *pte, unsigned long addr,
+ unsigned long size, struct mm_walk *walk)
{
- struct numa_maps *md;
- spinlock_t *ptl;
- pte_t *orig_pte;
- pte_t *pte;
-
- md = walk->private;
+ struct numa_maps *md = walk->private;
+ struct page *page = can_gather_numa_stats(*pte, walk->vma, addr);
- if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) {
- pte_t huge_pte = *(pte_t *)pmd;
- struct page *page;
-
- page = can_gather_numa_stats(huge_pte, walk->vma, addr);
- if (page)
- gather_stats(page, md, pte_dirty(huge_pte),
- HPAGE_PMD_SIZE/PAGE_SIZE);
- spin_unlock(ptl);
- return 0;
- }
+ if (page)
+ gather_stats(page, md, pte_dirty(*pte), size/PAGE_SIZE);
- if (pmd_trans_unstable(pmd))
- return 0;
- orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
- do {
- struct page *page = can_gather_numa_stats(*pte, walk->vma, addr);
- if (!page)
- continue;
- gather_stats(page, md, pte_dirty(*pte), 1);
-
- } while (pte++, addr += PAGE_SIZE, addr != end);
- pte_unmap_unlock(orig_pte, ptl);
return 0;
}
+
#ifdef CONFIG_HUGETLB_PAGE
static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
unsigned long addr, unsigned long end, struct mm_walk *walk)
@@ -1366,7 +1343,7 @@ static int show_numa_map(struct seq_file
memset(md, 0, sizeof(*md));
walk.hugetlb_entry = gather_hugetbl_stats;
- walk.pmd_entry = gather_pte_stats;
+ walk.locked_single_entry = gather_stats_locked;
walk.private = md;
walk.mm = mm;
_
^ permalink raw reply [flat|nested] 15+ messages in thread

[parent not found: <1401745925-l651h3s9@n-horiguchi@ah.jp.nec.com>]