From: Ingo Molnar <mingo@kernel.org>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Paul Turner <pjt@google.com>,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Hugh Dickins <hughd@google.com>,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
Alex Shi <lkml.alex@gmail.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Subject: [PATCH 15/52] mm/mempolicy: Implement change_prot_numa() in terms of change_protection()
Date: Sun, 2 Dec 2012 19:43:07 +0100 [thread overview]
Message-ID: <1354473824-19229-16-git-send-email-mingo@kernel.org> (raw)
In-Reply-To: <1354473824-19229-1-git-send-email-mingo@kernel.org>
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
This patch converts change_prot_numa() to use
change_protection(). As pte_numa and friends check the PTE bits
directly it is necessary for change_protection() to use
pmd_mknuma(). Hence the required modifications to
change_protection() are a little clumsy but the end result is
that most of the numa page table helpers are just one or two
instructions.
Original-From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Alex Shi <lkml.alex@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/huge_mm.h | 3 +-
include/linux/mm.h | 4 +-
mm/huge_memory.c | 14 ++++-
mm/mempolicy.c | 137 +++++-------------------------------------------
mm/mprotect.c | 62 ++++++++++++++++------
5 files changed, 75 insertions(+), 145 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index dabb510..027ad04 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -27,7 +27,8 @@ extern int move_huge_pmd(struct vm_area_struct *vma,
unsigned long new_addr, unsigned long old_end,
pmd_t *old_pmd, pmd_t *new_pmd);
extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, pgprot_t newprot);
+ unsigned long addr, pgprot_t newprot,
+ int prot_numa);
enum transparent_hugepage_flag {
TRANSPARENT_HUGEPAGE_FLAG,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 471185e..d04c2f0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1080,7 +1080,7 @@ extern unsigned long do_mremap(unsigned long addr,
unsigned long flags, unsigned long new_addr);
extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot,
- int dirty_accountable);
+ int dirty_accountable, int prot_numa);
extern int mprotect_fixup(struct vm_area_struct *vma,
struct vm_area_struct **pprev, unsigned long start,
unsigned long end, unsigned long newflags);
@@ -1552,7 +1552,7 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
#endif
#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
-void change_prot_numa(struct vm_area_struct *vma,
+unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
#endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5723b55..d79f7a5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1147,7 +1147,7 @@ out:
}
int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, pgprot_t newprot)
+ unsigned long addr, pgprot_t newprot, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
int ret = 0;
@@ -1155,7 +1155,17 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
if (__pmd_trans_huge_lock(pmd, vma) == 1) {
pmd_t entry;
entry = pmdp_get_and_clear(mm, addr, pmd);
- entry = pmd_modify(entry, newprot);
+ if (!prot_numa)
+ entry = pmd_modify(entry, newprot);
+ else {
+ struct page *page = pmd_page(*pmd);
+
+ /* only check non-shared pages */
+ if (page_mapcount(page) == 1 &&
+ !pmd_numa(*pmd)) {
+ entry = pmd_mknuma(entry);
+ }
+ }
set_pmd_at(mm, addr, pmd, entry);
spin_unlock(&vma->vm_mm->page_table_lock);
ret = 1;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 51d3ebd..75d4600 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -568,134 +568,23 @@ static inline int check_pgd_range(struct vm_area_struct *vma,
#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
/*
- * Here we search for not shared page mappings (mapcount == 1) and we
- * set up the pmd/pte_numa on those mappings so the very next access
- * will fire a NUMA hinting page fault.
+ * This is used to mark a range of virtual addresses to be inaccessible.
+ * These are later cleared by a NUMA hinting fault. Depending on these
+ * faults, pages may be migrated for better NUMA placement.
+ *
+ * This is assuming that NUMA faults are handled using PROT_NONE. If
+ * an architecture makes a different choice, it will need further
+ * changes to the core.
*/
-static int
-change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long address)
-{
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte, *_pte;
- struct page *page;
- unsigned long _address, end;
- spinlock_t *ptl;
- int ret = 0;
-
- VM_BUG_ON(address & ~PAGE_MASK);
-
- pgd = pgd_offset(mm, address);
- if (!pgd_present(*pgd))
- goto out;
-
- pud = pud_offset(pgd, address);
- if (!pud_present(*pud))
- goto out;
-
- pmd = pmd_offset(pud, address);
- if (pmd_none(*pmd))
- goto out;
-
- if (pmd_trans_huge_lock(pmd, vma) == 1) {
- int page_nid;
- ret = HPAGE_PMD_NR;
-
- VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-
- if (pmd_numa(*pmd)) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- page = pmd_page(*pmd);
-
- /* only check non-shared pages */
- if (page_mapcount(page) != 1) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- page_nid = page_to_nid(page);
-
- if (pmd_numa(*pmd)) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
- ret += HPAGE_PMD_NR;
- /* defer TLB flush to lower the overhead */
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- if (pmd_trans_unstable(pmd))
- goto out;
- VM_BUG_ON(!pmd_present(*pmd));
-
- end = min(vma->vm_end, (address + PMD_SIZE) & PMD_MASK);
- pte = pte_offset_map_lock(mm, pmd, address, &ptl);
- for (_address = address, _pte = pte; _address < end;
- _pte++, _address += PAGE_SIZE) {
- pte_t pteval = *_pte;
- if (!pte_present(pteval))
- continue;
- if (pte_numa(pteval))
- continue;
- page = vm_normal_page(vma, _address, pteval);
- if (unlikely(!page))
- continue;
- /* only check non-shared pages */
- if (page_mapcount(page) != 1)
- continue;
-
- set_pte_at(mm, _address, _pte, pte_mknuma(pteval));
-
- /* defer TLB flush to lower the overhead */
- ret++;
- }
- pte_unmap_unlock(pte, ptl);
-
- if (ret && !pmd_numa(*pmd)) {
- spin_lock(&mm->page_table_lock);
- set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
- spin_unlock(&mm->page_table_lock);
- /* defer TLB flush to lower the overhead */
- }
-
-out:
- return ret;
-}
-
-/* Assumes mmap_sem is held */
-void
-change_prot_numa(struct vm_area_struct *vma,
- unsigned long address, unsigned long end)
+unsigned long change_prot_numa(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
{
- struct mm_struct *mm = vma->vm_mm;
- int progress = 0;
-
- while (address < end) {
- VM_BUG_ON(address < vma->vm_start ||
- address + PAGE_SIZE > vma->vm_end);
+ int nr_updated;
+ BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
- progress += change_prot_numa_range(mm, vma, address);
- address = (address + PMD_SIZE) & PMD_MASK;
- }
+ nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
- /*
- * Flush the TLB for the mm to start the NUMA hinting
- * page faults after we finish scanning this vma part
- * if there were any PTE updates
- */
- if (progress) {
- mmu_notifier_invalidate_range_start(vma->vm_mm, address, end);
- flush_tlb_range(vma, address, end);
- mmu_notifier_invalidate_range_end(vma->vm_mm, address, end);
- }
+ return nr_updated;
}
#else
static unsigned long change_prot_numa(struct vm_area_struct *vma,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7c3628a..1b383b7 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,10 +35,11 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
}
#endif
-static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
+ struct mm_struct *mm = vma->vm_mm;
pte_t *pte, oldpte;
spinlock_t *ptl;
unsigned long pages = 0;
@@ -49,19 +50,39 @@ static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
oldpte = *pte;
if (pte_present(oldpte)) {
pte_t ptent;
+ bool updated = false;
ptent = ptep_modify_prot_start(mm, addr, pte);
- ptent = pte_modify(ptent, newprot);
+ if (!prot_numa) {
+ ptent = pte_modify(ptent, newprot);
+ updated = true;
+ } else {
+ struct page *page;
+
+ page = vm_normal_page(vma, addr, oldpte);
+ if (page) {
+ /* only check non-shared pages */
+ if (!pte_numa(oldpte) &&
+ page_mapcount(page) == 1) {
+ ptent = pte_mknuma(ptent);
+ updated = true;
+ }
+ }
+ }
/*
* Avoid taking write faults for pages we know to be
* dirty.
*/
- if (dirty_accountable && pte_dirty(ptent))
+ if (dirty_accountable && pte_dirty(ptent)) {
ptent = pte_mkwrite(ptent);
+ updated = true;
+ }
+
+ if (updated)
+ pages++;
ptep_modify_prot_commit(mm, addr, pte, ptent);
- pages++;
} else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
@@ -85,7 +106,7 @@ static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
pmd_t *pmd;
unsigned long next;
@@ -97,7 +118,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
if (pmd_trans_huge(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE)
split_huge_page_pmd(vma->vm_mm, pmd);
- else if (change_huge_pmd(vma, pmd, addr, newprot)) {
+ else if (change_huge_pmd(vma, pmd, addr, newprot, prot_numa)) {
pages += HPAGE_PMD_NR;
continue;
}
@@ -105,8 +126,17 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
}
if (pmd_none_or_clear_bad(pmd))
continue;
- pages += change_pte_range(vma->vm_mm, pmd, addr, next, newprot,
- dirty_accountable);
+ pages += change_pte_range(vma, pmd, addr, next, newprot,
+ dirty_accountable, prot_numa);
+
+ if (prot_numa) {
+ struct mm_struct *mm = vma->vm_mm;
+
+ spin_lock(&mm->page_table_lock);
+ set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
+ spin_unlock(&mm->page_table_lock);
+ }
+
} while (pmd++, addr = next, addr != end);
return pages;
@@ -114,7 +144,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
pud_t *pud;
unsigned long next;
@@ -126,7 +156,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *
if (pud_none_or_clear_bad(pud))
continue;
pages += change_pmd_range(vma, pud, addr, next, newprot,
- dirty_accountable);
+ dirty_accountable, prot_numa);
} while (pud++, addr = next, addr != end);
return pages;
@@ -134,7 +164,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *
static unsigned long change_protection_range(struct vm_area_struct *vma,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd;
@@ -150,7 +180,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
if (pgd_none_or_clear_bad(pgd))
continue;
pages += change_pud_range(vma, pgd, addr, next, newprot,
- dirty_accountable);
+ dirty_accountable, prot_numa);
} while (pgd++, addr = next, addr != end);
/* Only flush the TLB if we actually modified any entries: */
@@ -162,7 +192,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long pages;
@@ -171,7 +201,7 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
if (is_vm_hugetlb_page(vma))
pages = hugetlb_change_protection(vma, start, end, newprot);
else
- pages = change_protection_range(vma, start, end, newprot, dirty_accountable);
+ pages = change_protection_range(vma, start, end, newprot, dirty_accountable, prot_numa);
mmu_notifier_invalidate_range_end(mm, start, end);
return pages;
@@ -249,7 +279,7 @@ success:
dirty_accountable = 1;
}
- change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable);
+ change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable, 0);
vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);
vm_stat_account(mm, newflags, vma->vm_file, nrpages);
--
1.7.11.7
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@kernel.org>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Paul Turner <pjt@google.com>,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Hugh Dickins <hughd@google.com>,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
Alex Shi <lkml.alex@gmail.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Subject: [PATCH 15/52] mm/mempolicy: Implement change_prot_numa() in terms of change_protection()
Date: Sun, 2 Dec 2012 19:43:07 +0100 [thread overview]
Message-ID: <1354473824-19229-16-git-send-email-mingo@kernel.org> (raw)
In-Reply-To: <1354473824-19229-1-git-send-email-mingo@kernel.org>
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
This patch converts change_prot_numa() to use
change_protection(). As pte_numa and friends check the PTE bits
directly it is necessary for change_protection() to use
pmd_mknuma(). Hence the required modifications to
change_protection() are a little clumsy but the end result is
that most of the numa page table helpers are just one or two
instructions.
Original-From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Alex Shi <lkml.alex@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/huge_mm.h | 3 +-
include/linux/mm.h | 4 +-
mm/huge_memory.c | 14 ++++-
mm/mempolicy.c | 137 +++++-------------------------------------------
mm/mprotect.c | 62 ++++++++++++++++------
5 files changed, 75 insertions(+), 145 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index dabb510..027ad04 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -27,7 +27,8 @@ extern int move_huge_pmd(struct vm_area_struct *vma,
unsigned long new_addr, unsigned long old_end,
pmd_t *old_pmd, pmd_t *new_pmd);
extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, pgprot_t newprot);
+ unsigned long addr, pgprot_t newprot,
+ int prot_numa);
enum transparent_hugepage_flag {
TRANSPARENT_HUGEPAGE_FLAG,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 471185e..d04c2f0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1080,7 +1080,7 @@ extern unsigned long do_mremap(unsigned long addr,
unsigned long flags, unsigned long new_addr);
extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot,
- int dirty_accountable);
+ int dirty_accountable, int prot_numa);
extern int mprotect_fixup(struct vm_area_struct *vma,
struct vm_area_struct **pprev, unsigned long start,
unsigned long end, unsigned long newflags);
@@ -1552,7 +1552,7 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
#endif
#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
-void change_prot_numa(struct vm_area_struct *vma,
+unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
#endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5723b55..d79f7a5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1147,7 +1147,7 @@ out:
}
int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, pgprot_t newprot)
+ unsigned long addr, pgprot_t newprot, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
int ret = 0;
@@ -1155,7 +1155,17 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
if (__pmd_trans_huge_lock(pmd, vma) == 1) {
pmd_t entry;
entry = pmdp_get_and_clear(mm, addr, pmd);
- entry = pmd_modify(entry, newprot);
+ if (!prot_numa)
+ entry = pmd_modify(entry, newprot);
+ else {
+ struct page *page = pmd_page(*pmd);
+
+ /* only check non-shared pages */
+ if (page_mapcount(page) == 1 &&
+ !pmd_numa(*pmd)) {
+ entry = pmd_mknuma(entry);
+ }
+ }
set_pmd_at(mm, addr, pmd, entry);
spin_unlock(&vma->vm_mm->page_table_lock);
ret = 1;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 51d3ebd..75d4600 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -568,134 +568,23 @@ static inline int check_pgd_range(struct vm_area_struct *vma,
#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
/*
- * Here we search for not shared page mappings (mapcount == 1) and we
- * set up the pmd/pte_numa on those mappings so the very next access
- * will fire a NUMA hinting page fault.
+ * This is used to mark a range of virtual addresses to be inaccessible.
+ * These are later cleared by a NUMA hinting fault. Depending on these
+ * faults, pages may be migrated for better NUMA placement.
+ *
+ * This is assuming that NUMA faults are handled using PROT_NONE. If
+ * an architecture makes a different choice, it will need further
+ * changes to the core.
*/
-static int
-change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long address)
-{
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte, *_pte;
- struct page *page;
- unsigned long _address, end;
- spinlock_t *ptl;
- int ret = 0;
-
- VM_BUG_ON(address & ~PAGE_MASK);
-
- pgd = pgd_offset(mm, address);
- if (!pgd_present(*pgd))
- goto out;
-
- pud = pud_offset(pgd, address);
- if (!pud_present(*pud))
- goto out;
-
- pmd = pmd_offset(pud, address);
- if (pmd_none(*pmd))
- goto out;
-
- if (pmd_trans_huge_lock(pmd, vma) == 1) {
- int page_nid;
- ret = HPAGE_PMD_NR;
-
- VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-
- if (pmd_numa(*pmd)) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- page = pmd_page(*pmd);
-
- /* only check non-shared pages */
- if (page_mapcount(page) != 1) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- page_nid = page_to_nid(page);
-
- if (pmd_numa(*pmd)) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
- ret += HPAGE_PMD_NR;
- /* defer TLB flush to lower the overhead */
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- if (pmd_trans_unstable(pmd))
- goto out;
- VM_BUG_ON(!pmd_present(*pmd));
-
- end = min(vma->vm_end, (address + PMD_SIZE) & PMD_MASK);
- pte = pte_offset_map_lock(mm, pmd, address, &ptl);
- for (_address = address, _pte = pte; _address < end;
- _pte++, _address += PAGE_SIZE) {
- pte_t pteval = *_pte;
- if (!pte_present(pteval))
- continue;
- if (pte_numa(pteval))
- continue;
- page = vm_normal_page(vma, _address, pteval);
- if (unlikely(!page))
- continue;
- /* only check non-shared pages */
- if (page_mapcount(page) != 1)
- continue;
-
- set_pte_at(mm, _address, _pte, pte_mknuma(pteval));
-
- /* defer TLB flush to lower the overhead */
- ret++;
- }
- pte_unmap_unlock(pte, ptl);
-
- if (ret && !pmd_numa(*pmd)) {
- spin_lock(&mm->page_table_lock);
- set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
- spin_unlock(&mm->page_table_lock);
- /* defer TLB flush to lower the overhead */
- }
-
-out:
- return ret;
-}
-
-/* Assumes mmap_sem is held */
-void
-change_prot_numa(struct vm_area_struct *vma,
- unsigned long address, unsigned long end)
+unsigned long change_prot_numa(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
{
- struct mm_struct *mm = vma->vm_mm;
- int progress = 0;
-
- while (address < end) {
- VM_BUG_ON(address < vma->vm_start ||
- address + PAGE_SIZE > vma->vm_end);
+ int nr_updated;
+ BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
- progress += change_prot_numa_range(mm, vma, address);
- address = (address + PMD_SIZE) & PMD_MASK;
- }
+ nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
- /*
- * Flush the TLB for the mm to start the NUMA hinting
- * page faults after we finish scanning this vma part
- * if there were any PTE updates
- */
- if (progress) {
- mmu_notifier_invalidate_range_start(vma->vm_mm, address, end);
- flush_tlb_range(vma, address, end);
- mmu_notifier_invalidate_range_end(vma->vm_mm, address, end);
- }
+ return nr_updated;
}
#else
static unsigned long change_prot_numa(struct vm_area_struct *vma,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7c3628a..1b383b7 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,10 +35,11 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
}
#endif
-static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
+ struct mm_struct *mm = vma->vm_mm;
pte_t *pte, oldpte;
spinlock_t *ptl;
unsigned long pages = 0;
@@ -49,19 +50,39 @@ static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
oldpte = *pte;
if (pte_present(oldpte)) {
pte_t ptent;
+ bool updated = false;
ptent = ptep_modify_prot_start(mm, addr, pte);
- ptent = pte_modify(ptent, newprot);
+ if (!prot_numa) {
+ ptent = pte_modify(ptent, newprot);
+ updated = true;
+ } else {
+ struct page *page;
+
+ page = vm_normal_page(vma, addr, oldpte);
+ if (page) {
+ /* only check non-shared pages */
+ if (!pte_numa(oldpte) &&
+ page_mapcount(page) == 1) {
+ ptent = pte_mknuma(ptent);
+ updated = true;
+ }
+ }
+ }
/*
* Avoid taking write faults for pages we know to be
* dirty.
*/
- if (dirty_accountable && pte_dirty(ptent))
+ if (dirty_accountable && pte_dirty(ptent)) {
ptent = pte_mkwrite(ptent);
+ updated = true;
+ }
+
+ if (updated)
+ pages++;
ptep_modify_prot_commit(mm, addr, pte, ptent);
- pages++;
} else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
@@ -85,7 +106,7 @@ static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
pmd_t *pmd;
unsigned long next;
@@ -97,7 +118,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
if (pmd_trans_huge(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE)
split_huge_page_pmd(vma->vm_mm, pmd);
- else if (change_huge_pmd(vma, pmd, addr, newprot)) {
+ else if (change_huge_pmd(vma, pmd, addr, newprot, prot_numa)) {
pages += HPAGE_PMD_NR;
continue;
}
@@ -105,8 +126,17 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
}
if (pmd_none_or_clear_bad(pmd))
continue;
- pages += change_pte_range(vma->vm_mm, pmd, addr, next, newprot,
- dirty_accountable);
+ pages += change_pte_range(vma, pmd, addr, next, newprot,
+ dirty_accountable, prot_numa);
+
+ if (prot_numa) {
+ struct mm_struct *mm = vma->vm_mm;
+
+ spin_lock(&mm->page_table_lock);
+ set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
+ spin_unlock(&mm->page_table_lock);
+ }
+
} while (pmd++, addr = next, addr != end);
return pages;
@@ -114,7 +144,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
pud_t *pud;
unsigned long next;
@@ -126,7 +156,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *
if (pud_none_or_clear_bad(pud))
continue;
pages += change_pmd_range(vma, pud, addr, next, newprot,
- dirty_accountable);
+ dirty_accountable, prot_numa);
} while (pud++, addr = next, addr != end);
return pages;
@@ -134,7 +164,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *
static unsigned long change_protection_range(struct vm_area_struct *vma,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd;
@@ -150,7 +180,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
if (pgd_none_or_clear_bad(pgd))
continue;
pages += change_pud_range(vma, pgd, addr, next, newprot,
- dirty_accountable);
+ dirty_accountable, prot_numa);
} while (pgd++, addr = next, addr != end);
/* Only flush the TLB if we actually modified any entries: */
@@ -162,7 +192,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long pages;
@@ -171,7 +201,7 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
if (is_vm_hugetlb_page(vma))
pages = hugetlb_change_protection(vma, start, end, newprot);
else
- pages = change_protection_range(vma, start, end, newprot, dirty_accountable);
+ pages = change_protection_range(vma, start, end, newprot, dirty_accountable, prot_numa);
mmu_notifier_invalidate_range_end(mm, start, end);
return pages;
@@ -249,7 +279,7 @@ success:
dirty_accountable = 1;
}
- change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable);
+ change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable, 0);
vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);
vm_stat_account(mm, newflags, vma->vm_file, nrpages);
--
1.7.11.7
next prev parent reply other threads:[~2012-12-02 18:44 UTC|newest]
Thread overview: 126+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-02 18:42 [PATCH 00/52] RFC: Unified NUMA balancing tree, v1 Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:42 ` [PATCH 01/52] mm/compaction: Move migration fail/success stats to migrate.c Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:42 ` [PATCH 02/52] mm/compaction: Add scanned and isolated counters for compaction Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:42 ` [PATCH 03/52] mm/migrate: Add a tracepoint for migrate_pages Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:42 ` [PATCH 04/52] mm/numa: define _PAGE_NUMA Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:42 ` [PATCH 05/52] mm/numa: Add pte_numa() and pmd_numa() Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:42 ` [PATCH 06/52] mm/numa: Support NUMA hinting page faults from gup/gup_fast Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:42 ` [PATCH 07/52] mm/numa: split_huge_page: transfer the NUMA type from the pmd to the pte Ingo Molnar
2012-12-02 18:42 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 08/52] mm/numa: Create basic numa page hinting infrastructure Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 09/52] mm/mempolicy: Make MPOL_LOCAL a real policy Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 10/52] mm/mempolicy: Add MPOL_MF_NOOP Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 11/52] mm/mempolicy: Check for misplaced page Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 12/52] mm/migrate: Introduce migrate_misplaced_page() Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 13/52] mm/mempolicy: Use _PAGE_NUMA to migrate pages Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 14/52] mm/mempolicy: Add MPOL_MF_LAZY Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar [this message]
2012-12-02 18:43 ` [PATCH 15/52] mm/mempolicy: Implement change_prot_numa() in terms of change_protection() Ingo Molnar
2012-12-02 18:43 ` [PATCH 16/52] mm/mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 17/52] mm/numa: Add pte updates, hinting and migration stats Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 18/52] mm/numa: Migrate on reference policy Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-03 15:44 ` Mel Gorman
2012-12-03 15:44 ` Mel Gorman
2012-12-02 18:43 ` [PATCH 19/52] sched, numa, mm: Add last_cpu to page flags Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 20/52] mm, numa: Implement migrate-on-fault lazy NUMA strategy for regular and THP pages Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-05 0:55 ` David Rientjes
2012-12-05 0:55 ` David Rientjes
2012-12-05 9:43 ` Mel Gorman
2012-12-05 9:43 ` Mel Gorman
2012-12-02 18:43 ` [PATCH 21/52] sched: Make find_busiest_queue() a method Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 22/52] sched, numa, mm: Add credits for NUMA placement Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 23/52] sched, numa, mm: Describe the NUMA scheduling problem formally Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 24/52] sched: Add adaptive NUMA affinity support Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 25/52] sched, numa: Improve the CONFIG_NUMA_BALANCING help text Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 26/52] sched: Implement constant, per task Working Set Sampling (WSS) rate Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 27/52] sched, numa, mm: Count WS scanning against present PTEs, not virtual memory ranges Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 28/52] sched: Implement slow start for working set sampling Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 29/52] sched: Implement NUMA scanning backoff Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-03 19:55 ` Rik van Riel
2012-12-03 19:55 ` Rik van Riel
2012-12-02 18:43 ` [PATCH 30/52] sched: Improve convergence Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 31/52] sched: Introduce staged average NUMA faults Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 32/52] sched: Track groups of shared tasks Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-03 22:46 ` Rik van Riel
2012-12-03 22:46 ` Rik van Riel
2012-12-02 18:43 ` [PATCH 33/52] sched: Use the best-buddy 'ideal cpu' in balancing decisions Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 34/52] sched: Average the fault stats longer Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 35/52] sched: Use the ideal CPU to drive active balancing Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 36/52] sched: Add hysteresis to p->numa_shared Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 37/52] sched, numa, mm: Interleave shared tasks Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 38/52] sched, mm, mempolicy: Add per task mempolicy Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 39/52] sched: Track shared task's node groups and interleave their memory allocations Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 40/52] sched: Add "task flipping" support Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 41/52] sched: Move the NUMA placement logic to a worklet Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 42/52] numa, mempolicy: Improve CONFIG_NUMA_BALANCING=y OOM behavior Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 43/52] sched: Introduce directed NUMA convergence Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 44/52] sched: Remove statistical NUMA scheduling Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 45/52] sched: Track quality and strength of convergence Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 46/52] sched: Converge NUMA migrations Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 47/52] sched: Add convergence strength based adaptive NUMA page fault rate Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 48/52] sched: Refine the 'shared tasks' memory interleaving logic Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 49/52] mm/rmap: Convert the struct anon_vma::mutex to an rwsem Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-04 14:43 ` Michel Lespinasse
2012-12-04 14:43 ` Michel Lespinasse
2012-12-02 18:43 ` [PATCH 50/52] mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 51/52] sched: Exclude pinned tasks from the NUMA-balancing logic Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-02 18:43 ` [PATCH 52/52] sched: Add RSS filter to NUMA-balancing Ingo Molnar
2012-12-02 18:43 ` Ingo Molnar
2012-12-03 5:09 ` [PATCH 00/52] RFC: Unified NUMA balancing tree, v1 Ingo Molnar
2012-12-03 5:09 ` Ingo Molnar
2012-12-03 9:25 ` [GIT] Unified NUMA balancing tree, v2 Ingo Molnar
2012-12-03 9:25 ` Ingo Molnar
2012-12-03 15:52 ` [PATCH 00/52] RFC: Unified NUMA balancing tree, v1 Rik van Riel
2012-12-03 15:52 ` Rik van Riel
2012-12-03 17:11 ` Ingo Molnar
2012-12-03 17:11 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1354473824-19229-16-git-send-email-mingo@kernel.org \
--to=mingo@kernel.org \
--cc=Lee.Schermerhorn@hp.com \
--cc=a.p.zijlstra@chello.nl \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=cl@linux.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lkml.alex@gmail.com \
--cc=mgorman@suse.de \
--cc=pjt@google.com \
--cc=riel@redhat.com \
--cc=srikar@linux.vnet.ibm.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.