From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Mel Gorman <mgorman@suse.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Preeti U Murthy <preeti@linux.vnet.ibm.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Subject: [RFC PATCH 10/10] x86, mm: Prevent gcc to re-read the pagetables
Date: Tue, 30 Jul 2013 13:18:25 +0530 [thread overview]
Message-ID: <1375170505-5967-11-git-send-email-srikar@linux.vnet.ibm.com> (raw)
In-Reply-To: <1375170505-5967-1-git-send-email-srikar@linux.vnet.ibm.com>
From: Andrea Arcangeli <aarcange@redhat.com>
GCC is very likely to read the pagetables just once and cache them in
the local stack or in a register, but it is can also decide to re-read
the pagetables. The problem is that the pagetable in those places can
change from under gcc.
In the page fault we only hold the ->mmap_sem for reading and both the
page fault and MADV_DONTNEED only take the ->mmap_sem for reading and we
don't hold any PT lock yet.
In get_user_pages_fast() the TLB shootdown code can clear the pagetables
before firing any TLB flush (the page can't be freed until the TLB
flushing IPI has been delivered but the pagetables will be cleared well
before sending any TLB flushing IPI).
With THP/hugetlbfs the pmd (and pud for hugetlbfs giga pages) can
change as well under gup_fast, it won't just be cleared for the same
reasons described above for the pte in the page fault case.
[ This patch was picked up from the AutoNUMA tree. Also stayed in Ingo
Molnar's Numa tree for a while.]
Originally-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Ingo, Andrea, Please let me know if I can add your signed-off-by.
---
arch/x86/mm/gup.c | 23 ++++++++++++++++++++---
mm/memory.c | 2 +-
2 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..6dc9921 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -150,7 +150,13 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
pmdp = pmd_offset(&pud, addr);
do {
- pmd_t pmd = *pmdp;
+ /*
+ * With THP and hugetlbfs the pmd can change from
+ * under us and it can be cleared as well by the TLB
+ * shootdown, so read it with ACCESS_ONCE to do all
+ * computations on the same sampling.
+ */
+ pmd_t pmd = ACCESS_ONCE(*pmdp);
next = pmd_addr_end(addr, end);
/*
@@ -220,7 +226,13 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
pudp = pud_offset(&pgd, addr);
do {
- pud_t pud = *pudp;
+ /*
+ * With hugetlbfs giga pages the pud can change from
+ * under us and it can be cleared as well by the TLB
+ * shootdown, so read it with ACCESS_ONCE to do all
+ * computations on the same sampling.
+ */
+ pud_t pud = ACCESS_ONCE(*pudp);
next = pud_addr_end(addr, end);
if (pud_none(pud))
@@ -280,7 +292,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
local_irq_save(flags);
pgdp = pgd_offset(mm, addr);
do {
- pgd_t pgd = *pgdp;
+ /*
+ * The pgd could be cleared by the TLB shootdown from
+ * under us so read it with ACCESS_ONCE to do all
+ * computations on the same sampling.
+ */
+ pgd_t pgd = ACCESS_ONCE(*pgdp);
next = pgd_addr_end(addr, end);
if (pgd_none(pgd))
diff --git a/mm/memory.c b/mm/memory.c
index ba94dec..6254dd2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3703,7 +3703,7 @@ int handle_pte_fault(struct mm_struct *mm,
pte_t entry;
spinlock_t *ptl;
- entry = *pte;
+ entry = ACCESS_ONCE(*pte);
if (!pte_present(entry)) {
if (pte_none(entry)) {
if (vma->vm_ops) {
--
1.7.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-07-30 7:50 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-30 7:48 [RFC PATCH 00/10] Improve numa scheduling by consolidating tasks Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 01/10] sched: Introduce per node numa weights Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 02/10] sched: Use numa weights while migrating tasks Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 03/10] sched: Select a better task to pull across node using iterations Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 04/10] sched: Move active_load_balance_cpu_stop to a new helper function Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 05/10] sched: Extend idle balancing to look for consolidation of tasks Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 06/10] sched: Limit migrations from a node Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 07/10] sched: Pass hint to active balancer about the task to be chosen Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 08/10] sched: Prevent a task from migrating immediately after an active balance Srikar Dronamraju
2013-07-30 7:48 ` [RFC PATCH 09/10] sched: Choose a runqueue that has lesser local affinity tasks Srikar Dronamraju
2013-07-30 7:48 ` Srikar Dronamraju [this message]
2013-07-30 8:17 ` [RFC PATCH 00/10] Improve numa scheduling by consolidating tasks Peter Zijlstra
2013-07-30 8:20 ` Peter Zijlstra
2013-07-30 9:03 ` Srikar Dronamraju
2013-07-30 9:10 ` Peter Zijlstra
2013-07-30 9:26 ` Peter Zijlstra
2013-07-30 9:46 ` Srikar Dronamraju
2013-07-31 15:09 ` Peter Zijlstra
2013-07-31 18:06 ` Srikar Dronamraju
2013-07-30 9:15 ` Srikar Dronamraju
2013-07-30 9:33 ` Peter Zijlstra
2013-07-31 17:35 ` Srikar Dronamraju
2013-07-31 13:33 ` Andrew Theurer
2013-07-31 15:43 ` Srikar Dronamraju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1375170505-5967-11-git-send-email-srikar@linux.vnet.ibm.com \
--to=srikar@linux.vnet.ibm.com \
--cc=a.p.zijlstra@chello.nl \
--cc=aarcange@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=preeti@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).