[PATCH 06/19] mm/thp: Preserve pgprot across huge page split

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@kernel.org>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Paul Turner <pjt@google.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>,
	Hugh Dickins <hughd@google.com>
Subject: [PATCH 06/19] mm/thp: Preserve pgprot across huge page split
Date: Fri, 16 Nov 2012 17:25:08 +0100	[thread overview]
Message-ID: <1353083121-4560-7-git-send-email-mingo@kernel.org> (raw)
In-Reply-To: <1353083121-4560-1-git-send-email-mingo@kernel.org>

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

We're going to play games with page-protections, ensure we don't lose
them over a THP split.

Collapse seems to always allocate a new (huge) page which should
already end up on the new target node so loosing protections there
isn't a problem.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/n/tip-eyi25t4eh3l4cd2zp4k3bj6c@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/pgtable.h |   1 +
 mm/huge_memory.c               | 103 ++++++++++++++++++++---------------------
 2 files changed, 50 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a1f780d..f85dccd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -349,6 +349,7 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
 }
 
 #define pte_pgprot(x) __pgprot(pte_flags(x) & PTE_FLAGS_MASK)
+#define pmd_pgprot(x) __pgprot(pmd_val(x) & ~_HPAGE_CHG_MASK)
 
 #define canon_pgprot(p) __pgprot(massage_pgprot(p))
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40f17c3..176fe3d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1343,63 +1343,60 @@ static int __split_huge_page_map(struct page *page,
 	int ret = 0, i;
 	pgtable_t pgtable;
 	unsigned long haddr;
+	pgprot_t prot;
 
 	spin_lock(&mm->page_table_lock);
 	pmd = page_check_address_pmd(page, mm, address,
 				     PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
-	if (pmd) {
-		pgtable = pgtable_trans_huge_withdraw(mm);
-		pmd_populate(mm, &_pmd, pgtable);
-
-		haddr = address;
-		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
-			pte_t *pte, entry;
-			BUG_ON(PageCompound(page+i));
-			entry = mk_pte(page + i, vma->vm_page_prot);
-			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-			if (!pmd_write(*pmd))
-				entry = pte_wrprotect(entry);
-			else
-				BUG_ON(page_mapcount(page) != 1);
-			if (!pmd_young(*pmd))
-				entry = pte_mkold(entry);
-			pte = pte_offset_map(&_pmd, haddr);
-			BUG_ON(!pte_none(*pte));
-			set_pte_at(mm, haddr, pte, entry);
-			pte_unmap(pte);
-		}
+	if (!pmd)
+		goto unlock;
 
-		smp_wmb(); /* make pte visible before pmd */
-		/*
-		 * Up to this point the pmd is present and huge and
-		 * userland has the whole access to the hugepage
-		 * during the split (which happens in place). If we
-		 * overwrite the pmd with the not-huge version
-		 * pointing to the pte here (which of course we could
-		 * if all CPUs were bug free), userland could trigger
-		 * a small page size TLB miss on the small sized TLB
-		 * while the hugepage TLB entry is still established
-		 * in the huge TLB. Some CPU doesn't like that. See
-		 * http://support.amd.com/us/Processor_TechDocs/41322.pdf,
-		 * Erratum 383 on page 93. Intel should be safe but is
-		 * also warns that it's only safe if the permission
-		 * and cache attributes of the two entries loaded in
-		 * the two TLB is identical (which should be the case
-		 * here). But it is generally safer to never allow
-		 * small and huge TLB entries for the same virtual
-		 * address to be loaded simultaneously. So instead of
-		 * doing "pmd_populate(); flush_tlb_range();" we first
-		 * mark the current pmd notpresent (atomically because
-		 * here the pmd_trans_huge and pmd_trans_splitting
-		 * must remain set at all times on the pmd until the
-		 * split is complete for this pmd), then we flush the
-		 * SMP TLB and finally we write the non-huge version
-		 * of the pmd entry with pmd_populate.
-		 */
-		pmdp_invalidate(vma, address, pmd);
-		pmd_populate(mm, pmd, pgtable);
-		ret = 1;
+	prot = pmd_pgprot(*pmd);
+	pgtable = pgtable_trans_huge_withdraw(mm);
+	pmd_populate(mm, &_pmd, pgtable);
+
+	for (i = 0, haddr = address; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+		pte_t *pte, entry;
+
+		BUG_ON(PageCompound(page+i));
+		entry = mk_pte(page + i, prot);
+		entry = pte_mkdirty(entry);
+		if (!pmd_young(*pmd))
+			entry = pte_mkold(entry);
+		pte = pte_offset_map(&_pmd, haddr);
+		BUG_ON(!pte_none(*pte));
+		set_pte_at(mm, haddr, pte, entry);
+		pte_unmap(pte);
 	}
+
+	smp_wmb(); /* make ptes visible before pmd, see __pte_alloc */
+	/*
+	 * Up to this point the pmd is present and huge.
+	 *
+	 * If we overwrite the pmd with the not-huge version, we could trigger
+	 * a small page size TLB miss on the small sized TLB while the hugepage
+	 * TLB entry is still established in the huge TLB.
+	 *
+	 * Some CPUs don't like that. See
+	 * http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum 383
+	 * on page 93.
+	 *
+	 * Thus it is generally safer to never allow small and huge TLB entries
+	 * for overlapping virtual addresses to be loaded. So we first mark the
+	 * current pmd not present, then we flush the TLB and finally we write
+	 * the non-huge version of the pmd entry with pmd_populate.
+	 *
+	 * The above needs to be done under the ptl because pmd_trans_huge and
+	 * pmd_trans_splitting must remain set on the pmd until the split is
+	 * complete. The ptl also protects against concurrent faults due to
+	 * making the pmd not-present.
+	 */
+	set_pmd_at(mm, address, pmd, pmd_mknotpresent(*pmd));
+	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+	pmd_populate(mm, pmd, pgtable);
+	ret = 1;
+
+unlock:
 	spin_unlock(&mm->page_table_lock);
 
 	return ret;
@@ -2287,10 +2284,8 @@ static void khugepaged_do_scan(void)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
-	unsigned int pages = khugepaged_pages_to_scan;
 	bool wait = true;
-
-	barrier(); /* write khugepaged_pages_to_scan to local stack */
+	unsigned int pages = ACCESS_ONCE(khugepaged_pages_to_scan);
 
 	while (progress < pages) {
 		if (!khugepaged_prealloc_page(&hpage, &wait))
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2012-11-16 16:25 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-16 16:25 [PATCH 00/19] latest numa/base patches Ingo Molnar
2012-11-16 16:25 ` [PATCH 01/19] mm/generic: Only flush the local TLB in ptep_set_access_flags() Ingo Molnar
2012-11-16 16:25 ` [PATCH 02/19] x86/mm: Only do a local tlb flush " Ingo Molnar
2012-11-16 16:25 ` [PATCH 03/19] sched, numa, mm: Make find_busiest_queue() a method Ingo Molnar
2012-11-16 16:25 ` [PATCH 04/19] sched, numa, mm: Describe the NUMA scheduling problem formally Ingo Molnar
2012-11-25  6:07   ` abhishek agarwal
2012-11-25  6:09   ` abhishek agarwal
2012-11-16 16:25 ` [PATCH 05/19] sched, numa, mm, s390/thp: Implement pmd_pgprot() for s390 Ingo Molnar
2012-11-16 16:25 ` Ingo Molnar [this message]
2012-11-16 16:25 ` [PATCH 07/19] x86/mm: Introduce pte_accessible() Ingo Molnar
2012-11-16 16:25 ` [PATCH 08/19] mm: Only flush the TLB when clearing an accessible pte Ingo Molnar
2012-11-16 16:25 ` [PATCH 09/19] sched, numa, mm, MIPS/thp: Add pmd_pgprot() implementation Ingo Molnar
2012-11-16 16:25 ` [PATCH 10/19] mm/pgprot: Move the pgprot_modify() fallback definition to mm.h Ingo Molnar
2012-11-16 16:25 ` [PATCH 11/19] mm/mpol: Make MPOL_LOCAL a real policy Ingo Molnar
2012-11-16 16:25 ` [PATCH 12/19] mm/mpol: Add MPOL_MF_NOOP Ingo Molnar
2012-11-16 16:25 ` [PATCH 13/19] mm/mpol: Check for misplaced page Ingo Molnar
2012-11-16 16:25 ` [PATCH 14/19] mm/mpol: Create special PROT_NONE infrastructure Ingo Molnar
2012-11-16 16:25 ` [PATCH 15/19] mm/mpol: Add MPOL_MF_LAZY Ingo Molnar
2012-11-16 16:25 ` [PATCH 16/19] numa, mm: Support NUMA hinting page faults from gup/gup_fast Ingo Molnar
2012-11-16 16:25 ` [PATCH 17/19] mm/migrate: Introduce migrate_misplaced_page() Ingo Molnar
2012-11-19  2:25   ` [PATCH 17/19, v2] " Ingo Molnar
2012-11-19 16:02     ` Rik van Riel
2012-11-16 16:25 ` [PATCH 18/19] mm/mpol: Use special PROT_NONE to migrate pages Ingo Molnar
2012-11-16 16:25 ` [PATCH 19/19] x86/mm: Completely drop the TLB flush from ptep_set_access_flags() Ingo Molnar
2012-11-17  8:35 ` [PATCH 00/19] latest numa/base patches Alex Shi
2012-11-17  8:40   ` Alex Shi

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:a1f780d dfblob:f85dccd dfblob:40f17c3 dfblob:176fe3d )
 OR (
bs:"[PATCH 06/19] mm/thp: Preserve pgprot across huge page split" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1353083121-4560-7-git-send-email-mingo@kernel.org \
    --to=mingo@kernel.org \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=pjt@google.com \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).