From: Ingo Molnar <mingo@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Hugh Dickins <hughd@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 06/19] mm: numa: teach gup_fast about pmd_numa
Date: Tue, 13 Nov 2012 11:07:36 +0100
Message-ID: <20121113100735.GC21522@gmail.com>
In-Reply-To: <1352193295-26815-7-git-send-email-mgorman@suse.de>
* Mel Gorman <mgorman@suse.de> wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> When scanning pmds, the pmd may be of numa type (_PAGE_PRESENT not set),
> however the pte might be present. Therefore, gup_pmd_range() must return
> 0 in this case to avoid losing a NUMA hinting page fault during gup_fast.
>
> Note: gup_fast will skip over non present ptes (like numa
> types), so no explicit check is needed for the pte_numa case.
> [...]
So, why not fix all architectures that choose to expose
pte_numa() and pmd_numa() methods - via the patch below?
Thanks,
Ingo
----------------->
From db4aa58db59a2a296141c698be8b4535d0051ca1 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 5 Oct 2012 21:36:27 +0200
Subject: [PATCH] numa, mm: Support NUMA hinting page faults from gup/gup_fast
Introduce FOLL_NUMA to tell follow_page to check
pte/pmd_numa. get_user_pages must use FOLL_NUMA, and it's safe to do
so because it always invokes handle_mm_fault and retries the
follow_page later.
KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.
Other follow_page callers like KSM should not use FOLL_NUMA, or they
would fail to get the pages if they use follow_page instead of
get_user_pages.
[ This patch was picked up from the AutoNUMA tree. ]
Originally-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
[ ported to this tree. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/mm.h |  1 +
 mm/memory.c        | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0025bf9..1821629 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1600,6 +1600,7 @@ struct page *follow_page(struct vm_area_struct *, unsigned long address,
 #define FOLL_MLOCK	0x40	/* mark page as mlocked */
 #define FOLL_SPLIT	0x80	/* don't return transhuge pages, split them */
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
+#define FOLL_NUMA	0x200	/* force NUMA hinting page fault */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
diff --git a/mm/memory.c b/mm/memory.c
index e3e8ab2..a660fd0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1536,6 +1536,8 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
 		goto out;
 	}
+	if ((flags & FOLL_NUMA) && pmd_numa(vma, *pmd))
+		goto no_page_table;
 	if (pmd_trans_huge(*pmd)) {
 		if (flags & FOLL_SPLIT) {
 			split_huge_page_pmd(mm, pmd);
@@ -1565,6 +1567,8 @@ split_fallthrough:
 	pte = *ptep;
 	if (!pte_present(pte))
 		goto no_page;
+	if ((flags & FOLL_NUMA) && pte_numa(vma, pte))
+		goto no_page;
 	if ((flags & FOLL_WRITE) && !pte_write(pte))
 		goto unlock;
 
@@ -1716,6 +1720,19 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			(VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
 	vm_flags &= (gup_flags & FOLL_FORCE) ?
 			(VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
+
+	/*
+	 * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault
+	 * would be called on PROT_NONE ranges. We must never invoke
+	 * handle_mm_fault on PROT_NONE ranges or the NUMA hinting
+	 * page faults would unprotect the PROT_NONE ranges if
+	 * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd
+	 * bitflag. So to avoid that, don't set FOLL_NUMA if
+	 * FOLL_FORCE is set.
+	 */
+	if (!(gup_flags & FOLL_FORCE))
+		gup_flags |= FOLL_NUMA;
+
 	i = 0;
 
 	do {
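
[ Illustration only, not part of the patch. The flag handling above can be
  modelled in user space roughly as follows. This is a simplified sketch:
  struct model_pte, gup_effective_flags() and model_needs_fault() are
  hypothetical names, the FOLL_FORCE value is assumed for the example, and
  the real pte_numa()/pmd_numa() checks also take the vma into account. ]

```c
#include <stdbool.h>

#define FOLL_FORCE 0x10u  /* assumed value, for illustration only */
#define FOLL_NUMA  0x200u /* value taken from the mm.h hunk above */

/* Hypothetical stand-in for a page table entry's relevant state. */
struct model_pte {
	bool present; /* what pte_present() would report */
	bool numa;    /* what pte_numa() would report */
};

/*
 * Models the gating added to __get_user_pages(): FOLL_NUMA is set
 * unless the caller asked for FOLL_FORCE.
 */
static unsigned int gup_effective_flags(unsigned int gup_flags)
{
	if (!(gup_flags & FOLL_FORCE))
		gup_flags |= FOLL_NUMA;
	return gup_flags;
}

/*
 * Models the pte-level checks in follow_page(): returns true when no
 * page would be returned, so that the get_user_pages() caller falls
 * back to handle_mm_fault() and retries.
 */
static bool model_needs_fault(unsigned int flags, struct model_pte pte)
{
	if (!pte.present)
		return true;
	if ((flags & FOLL_NUMA) && pte.numa)
		return true;
	return false;
}
```

[ In this model, a present NUMA-hinting pte triggers the fault path for an
  ordinary gup call, while a FOLL_FORCE call skips the NUMA check and gets
  the page directly, matching the reasoning in the comment above. ]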