From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-mm@kvack.org
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
akpm@linux-foundation.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Mel Gorman <mel@csn.ul.ie>, Nick Piggin <npiggin@kernel.dk>,
Alex Shi <alex.shi@intel.com>,
"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
Konrad Rzeszutek Wilk <konrad@darnok.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Miller <davem@davemloft.net>,
Russell King <rmk@arm.linux.org.uk>,
Catalin Marinas <catalin.marinas@arm.com>,
Chris Metcalf <cmetcalf@tilera.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Tony Luck <tony.luck@intel.com>, Paul Mundt <lethal@linux-sh.org>,
Jeff Dike <jdike@addtoit.com>,
Richard Weinberger <richard@nod.at>,
Hans-Christian Egtvedt <hans-christian.egtvedt@>
Subject: [PATCH 10/20] mm: Provide generic range tracking and flushing
Date: Wed, 27 Jun 2012 23:15:50 +0200 [thread overview]
Message-ID: <20120627212831.279978500@chello.nl> (raw)
In-Reply-To: 20120627211540.459910855@chello.nl
[-- Attachment #1: mm-generic-tlb-range.patch --]
[-- Type: text/plain, Size: 12136 bytes --]
In order to convert various architectures to generic tlb we need to
provide some extra infrastructure to track the range of the flushed
page tables.
There are two mmu_gather cases to consider:
unmap_region()
tlb_gather_mmu()
unmap_vmas()
for (; vma; vma = vma->vm_next)
unmap_page_range()
tlb_start_vma() -> flush cache range/track vm_flags
zap_*_range()
arch_enter_lazy_mmu_mode()
ptep_get_and_clear_full() -> batch/track external tlbs
tlb_remove_tlb_entry() -> track range/external tlbs
tlb_remove_page() -> batch page
arch_lazy_leave_mmu_mode() -> flush external tlbs
tlb_end_vma()
free_pgtables()
while (vma)
unlink_*_vma()
free_*_range()
*_free_tlb() -> track range/batch page
tlb_finish_mmu() -> flush TLBs and flush everything
free vmas
and:
shift_arg_pages()
tlb_gather_mmu()
free_*_range()
*_free_tlb() -> track tlb range
tlb_finish_mmu() -> flush things
There are various reasons that we need to flush TLBs _after_ tearing
down the page-tables themselves. For some architectures (x86 among
others) this serializes against (both hardware and software) page
table walkers like gup_fast().
For others (ARM) this is (also) needed to evict stale page-table
caches - ARM LPAE mode apparently caches page tables and concurrent
hardware walkers could re-populate these caches if the final tlb flush
were to be from tlb_end_vma() since an concurrent walk could still be
in progress.
So implement generic range tracking over both clearing the PTEs and
tearing down the page-tables.
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/Kconfig | 3
include/asm-generic/tlb.h | 193 ++++++++++++++++++++++++++++++++++++++++++----
mm/memory.c | 3
3 files changed, 185 insertions(+), 14 deletions(-)
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -244,6 +244,9 @@ config HAVE_HW_PAGE_TABLE_WALKS
linux page-table structure. Therefore we don't need to emit
hardware TLB flush instructions before freeing page-table pages.
+config HAVE_MMU_GATHER_RANGE
+ bool
+
config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -5,12 +5,77 @@
* Copyright 2001 Red Hat, Inc.
* Based on code from mm/memory.c Copyright Linus Torvalds and others.
*
- * Copyright 2011 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
+ * Copyright 2011-2012 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
+ *
+ * This generic implementation tries to cover all TLB invalidate needs
+ * across our archicture spectrum, please ask before adding a new arch
+ * specific mmu_gather implementation.
+ *
+ * The TLB shootdown code deals with all the fun races an SMP system bring
+ * to the otherwise simple task of unmapping and freeing pages.
+ *
+ * There are two mmu_gather cases to consider, the below shows the various
+ * hooks and how this implementation employs them:
+ *
+ * unmap_region()
+ * tlb_gather_mmu()
+ * unmap_vmas()
+ * for (; vma; vma = vma->vm_next)
+ * unmap_page_range()
+ * tlb_start_vma() -> flush cache range/track vm_flags
+ * zap_*_range()
+ * arch_enter_lazy_mmu_mode()
+ * ptep_get_and_clear_full() -> batch/track external tlbs
+ * tlb_remove_tlb_entry() -> track range/external tlbs
+ * tlb_remove_page() -> batch page
+ * arch_leave_lazy_mmu_mode() -> flush external tlbs
+ * tlb_end_vma()
+ * free_pgtables()
+ * while (vma)
+ * unlink_*_vma()
+ * free_*_range()
+ * *_free_tlb() -> track range/batch page
+ * tlb_finish_mmu() -> flush TLBs and pages
+ * free vmas
+ *
+ * and:
+ *
+ * shift_arg_pages()
+ * tlb_gather_mmu()
+ * free_*_range()
+ * *_free_tlb() -> track range/batch page
+ * tlb_finish_mmu() -> flush TLBs and pages
+ *
+ * This code has 3 relevant Kconfig knobs:
+ *
+ * CONFIG_HAVE_MMU_GATHER_RANGE -- In case the architecture has an efficient
+ * flush_tlb_range() implementation this adds range tracking to the
+ * mmu_gather and avoids full mm invalidation where possible.
+ *
+ * There's a number of curious details wrt passing a vm_area_struct, see
+ * our tlb_start_vma() implementation.
+ *
+ * CONFIG_HAVE_RCU_TABLE_FREE -- In case flush_tlb_*() doesn't
+ * serialize software walkers against page-table tear-down. This option
+ * enables a semi-RCU freeing of page-tables such that disabling IRQs
+ * will still provide the required serialization. See the big comment
+ * a page or so down.
+ *
+ * CONFIG_HAVE_HW_PAGE_TABLE_WALKS -- Optimization for architectures with
+ * 'external' hash-table MMUs and similar which don't require a TLB
+ * invalidate before freeing page-tables, always used in conjunction
+ * with CONFIG_HAVE_RCU_TABLE_FREE to provide proper serialization for
+ * software page-table walkers.
+ *
+ * For instance SPARC64 and PPC use arch_{enter,leave}_lazy_mmu_mode()
+ * toghether with ptep_get_and_clear_full() to wipe their hash-table.
+ *
+ * See arch/Kconfig for more details.
*/
#ifndef _ASM_GENERIC__TLB_H
#define _ASM_GENERIC__TLB_H
@@ -37,7 +102,8 @@ struct mmu_gather_batch {
#define MAX_GATHER_BATCH \
((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
-/* struct mmu_gather is an opaque type used by the mm code for passing around
+/*
+ * struct mmu_gather is an opaque type used by the mm code for passing around
* any data needed by arch specific code for tlb_remove_page.
*/
struct mmu_gather {
@@ -45,6 +111,10 @@ struct mmu_gather {
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
struct mmu_table_batch *batch;
#endif
+#ifdef CONFIG_HAVE_MMU_GATHER_RANGE
+ unsigned long start, end;
+ unsigned long vm_flags;
+#endif
unsigned int need_flush : 1, /* Did free PTEs */
fast_mode : 1; /* No batching */
@@ -83,6 +153,16 @@ struct mmu_gather {
* pressure. To guarantee progress we fall back to single table freeing, see
* the implementation of tlb_remove_table_one().
*
+ * When this option is selected, the arch is expected to use:
+ *
+ * void tlb_remove_table(struct mmu_gather *tlb, void *table)
+ *
+ * to 'free' page-tables from their respective __{pte,pmd,pud}_free_tlb()
+ * implementations and has to provide an implementation of:
+ *
+ * void __tlb_remove_table(void *);
+ *
+ * that actually does the free.
*/
struct mmu_table_batch {
struct rcu_head rcu;
@@ -118,8 +198,90 @@ static inline void tlb_remove_table(stru
#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+void tlb_flush_mmu(struct mmu_gather *tlb);
+
#define HAVE_GENERIC_MMU_GATHER
+#ifdef CONFIG_HAVE_MMU_GATHER_RANGE
+
+static inline void tlb_range_init(struct mmu_gather *tlb)
+{
+ tlb->start = TASK_SIZE;
+ tlb->end = 0;
+ tlb->vm_flags = 0;
+}
+
+static inline void
+tlb_track_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end)
+{
+ if (!tlb->fullmm) {
+ tlb->start = min(tlb->start, addr);
+ tlb->end = max(tlb->end, end);
+ }
+}
+
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+ /*
+ * Fake VMA, some architectures use VM_EXEC to flush I-TLB/I$,
+ * and some use VM_HUGETLB since they have separate HPAGE TLBs.
+ */
+ struct vm_area_struct vma = {
+ .vm_mm = tlb->mm,
+ .vm_flags = tlb->vm_flags,
+ };
+
+ flush_tlb_range(&vma, tlb->start, tlb->end);
+ tlb_range_init(tlb);
+}
+
+static inline void
+tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ if (tlb->fullmm)
+ return;
+
+ /*
+ * flush_tlb_range() implementations that look at VM_HUGETLB
+ * (tile, mips-r4k) flush only large pages, so force flush on
+ * VM_HUGETLB vma boundaries.
+ */
+ if ((tlb->vm_flags & VM_HUGETLB) != (vma->vm_flags & VM_HUGETLB))
+ tlb_flush_mmu(tlb);
+
+ /*
+ * flush_tlb_range() implementations that flush I-TLB also flush
+ * D-TLB (tile, extensa, arm), so its ok to just add VM_EXEC to
+ * an existing range.
+ */
+ tlb->vm_flags |= vma->vm_flags & (VM_EXEC|VM_HUGETLB);
+
+ flush_cache_range(vma, vma->vm_start, vma->vm_end);
+}
+
+static inline void
+tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+}
+
+#else /* CONFIG_HAVE_MMU_GATHER_RANGE */
+
+static inline void tlb_range_init(struct mmu_gather *tlb)
+{
+}
+
+/*
+ * Macro avoids argument evaluation.
+ */
+#define tlb_track_range(tlb, addr, end) do { } while (0)
+
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+ flush_tlb_mm(tlb->mm);
+}
+
+#endif /* CONFIG_HAVE_MMU_GATHER_RANGE */
+
static inline int tlb_fast_mode(struct mmu_gather *tlb)
{
#ifdef CONFIG_SMP
@@ -134,7 +296,6 @@ static inline int tlb_fast_mode(struct m
}
void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
-void tlb_flush_mmu(struct mmu_gather *tlb);
void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end);
int __tlb_remove_page(struct mmu_gather *tlb, struct page *page);
@@ -155,10 +316,11 @@ static inline void tlb_remove_page(struc
* later optimise away the tlb invalidate. This helps when userspace is
* unmapping already-unmapped pages, which happens quite a lot.
*/
-#define tlb_remove_tlb_entry(tlb, ptep, address) \
+#define tlb_remove_tlb_entry(tlb, ptep, addr) \
do { \
tlb->need_flush = 1; \
- __tlb_remove_tlb_entry(tlb, ptep, address); \
+ tlb_track_range(tlb, addr, addr + PAGE_SIZE); \
+ __tlb_remove_tlb_entry(tlb, ptep, addr); \
} while (0)
/**
@@ -175,26 +337,31 @@ static inline void tlb_remove_page(struc
__tlb_remove_pmd_tlb_entry(tlb, pmdp, address); \
} while (0)
-#define pte_free_tlb(tlb, ptep, address, end) \
+#define pte_free_tlb(tlb, ptep, addr, end) \
do { \
tlb->need_flush = 1; \
- __pte_free_tlb(tlb, ptep, address); \
+ tlb_track_range(tlb, addr, end); \
+ __pte_free_tlb(tlb, ptep, addr); \
} while (0)
-#ifndef __ARCH_HAS_4LEVEL_HACK
-#define pud_free_tlb(tlb, pudp, address, end) \
+#define pmd_free_tlb(tlb, pmdp, addr, end) \
do { \
tlb->need_flush = 1; \
- __pud_free_tlb(tlb, pudp, address); \
+ tlb_track_range(tlb, addr, end); \
+ __pmd_free_tlb(tlb, pmdp, addr); \
} while (0)
-#endif
-#define pmd_free_tlb(tlb, pmdp, address, end) \
+#ifndef __ARCH_HAS_4LEVEL_HACK
+#define pud_free_tlb(tlb, pudp, addr, end) \
do { \
tlb->need_flush = 1; \
- __pmd_free_tlb(tlb, pmdp, address); \
+ tlb_track_range(tlb, addr, end); \
+ __pud_free_tlb(tlb, pudp, addr); \
} while (0)
+#endif
+#ifndef tlb_migrate_finish
#define tlb_migrate_finish(mm) do {} while (0)
+#endif
#endif /* _ASM_GENERIC__TLB_H */
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -214,6 +214,7 @@ void tlb_gather_mmu(struct mmu_gather *t
tlb->local.max = ARRAY_SIZE(tlb->__pages);
tlb->active = &tlb->local;
+ tlb_range_init(tlb);
tlb_table_init(tlb);
if (fullmm) {
@@ -228,7 +229,7 @@ void tlb_flush_mmu(struct mmu_gather *tl
if (!tlb->fullmm && tlb->need_flush) {
tlb->need_flush = 0;
- flush_tlb_mm(tlb->mm);
+ tlb_flush(tlb);
}
tlb_table_flush(tlb);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-06-27 21:15 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-27 21:15 [PATCH 00/20] Unify TLB gather implementations -v3 Peter Zijlstra
2012-06-27 21:15 ` [PATCH 01/20] mm, x86: Add HAVE_RCU_TABLE_FREE support Peter Zijlstra
2012-06-27 21:15 ` [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing Peter Zijlstra
2012-06-27 22:23 ` Linus Torvalds
2012-06-27 23:01 ` Peter Zijlstra
2012-06-27 23:01 ` Peter Zijlstra
2012-06-27 23:42 ` Linus Torvalds
2012-06-27 23:42 ` Linus Torvalds
2012-06-28 7:09 ` Benjamin Herrenschmidt
2012-06-28 7:09 ` Benjamin Herrenschmidt
2012-06-28 11:05 ` Peter Zijlstra
2012-06-28 11:05 ` Peter Zijlstra
2012-06-28 12:00 ` Benjamin Herrenschmidt
2012-06-28 12:00 ` Benjamin Herrenschmidt
2012-07-24 5:12 ` Nikunj A Dadhania
2012-07-24 5:12 ` Nikunj A Dadhania
2012-06-27 21:15 ` [PATCH 03/20] mm, tlb: Remove a few #ifdefs Peter Zijlstra
2012-06-27 21:15 ` [PATCH 04/20] mm, s390: use generic RCU page-table freeing code Peter Zijlstra
2012-06-27 21:15 ` [PATCH 05/20] mm, powerpc: Dont use tlb_flush for external tlb flushes Peter Zijlstra
2012-06-27 21:15 ` [PATCH 06/20] mm, sparc64: " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 07/20] mm, arch: Remove tlb_flush() Peter Zijlstra
2012-06-27 21:15 ` [PATCH 08/20] mm: Optimize fullmm TLB flushing Peter Zijlstra
2012-06-27 22:26 ` Linus Torvalds
2012-06-27 23:02 ` Peter Zijlstra
2012-06-27 23:13 ` Peter Zijlstra
2012-06-27 23:13 ` Peter Zijlstra
2012-06-27 23:23 ` Linus Torvalds
2012-06-27 23:23 ` Linus Torvalds
2012-06-27 23:33 ` Linus Torvalds
2012-06-27 23:33 ` Linus Torvalds
2012-06-28 9:16 ` Catalin Marinas
2012-06-28 10:39 ` Benjamin Herrenschmidt
2012-06-28 10:59 ` Peter Zijlstra
2012-06-28 14:53 ` Catalin Marinas
2012-06-28 16:20 ` Peter Zijlstra
2012-06-28 16:38 ` Peter Zijlstra
2012-06-28 16:45 ` Linus Torvalds
2012-06-28 16:52 ` Peter Zijlstra
2012-06-28 21:57 ` Benjamin Herrenschmidt
2012-06-28 21:58 ` Benjamin Herrenschmidt
2012-06-29 8:49 ` Peter Zijlstra
2012-06-29 15:26 ` Catalin Marinas
2012-06-29 22:11 ` Benjamin Herrenschmidt
2012-06-28 10:55 ` Peter Zijlstra
2012-06-28 10:55 ` Peter Zijlstra
2012-06-28 11:19 ` Martin Schwidefsky
2012-06-28 11:19 ` Martin Schwidefsky
2012-06-28 11:30 ` Peter Zijlstra
2012-06-28 11:30 ` Peter Zijlstra
2012-06-28 16:00 ` Avi Kivity
2012-06-27 21:15 ` [PATCH 09/20] mm, arch: Add end argument to p??_free_tlb() Peter Zijlstra
2012-06-27 21:15 ` Peter Zijlstra [this message]
2012-06-27 21:15 ` [PATCH 11/20] mm, s390: Convert to use generic mmu_gather Peter Zijlstra
2012-06-27 22:13 ` Peter Zijlstra
2012-06-28 7:13 ` Martin Schwidefsky
2012-06-27 21:15 ` [PATCH 12/20] mm, arm: Convert arm to generic tlb Peter Zijlstra
2012-06-27 21:15 ` [PATCH 13/20] mm, ia64: Convert ia64 " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 14/20] mm, sh: Convert sh " Peter Zijlstra
2012-06-28 18:32 ` Paul Mundt
2012-06-28 20:27 ` Peter Zijlstra
2012-06-27 21:15 ` [PATCH 15/20] mm, um: Convert um " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 16/20] mm, avr32: Convert avr32 " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 17/20] mm, mips: Convert mips " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 18/20] mm, parisc: Convert parisc " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 19/20] mm, sparc32: Convert sparc32 " Peter Zijlstra
2012-06-27 21:16 ` [PATCH 20/20] mm, xtensa: Convert xtensa " Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120627212831.279978500@chello.nl \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=alex.shi@intel.com \
--cc=benh@kernel.crashing.org \
--cc=catalin.marinas@arm.com \
--cc=cmetcalf@tilera.com \
--cc=davem@davemloft.net \
--cc=hugh.dickins@tiscali.co.uk \
--cc=jdike@addtoit.com \
--cc=konrad@darnok.org \
--cc=lethal@linux-sh.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=mingo@elte.hu \
--cc=nikunj@linux.vnet.ibm.com \
--cc=npiggin@kernel.dk \
--cc=richard@nod.at \
--cc=riel@redhat.com \
--cc=rmk@arm.linux.org.uk \
--cc=schwidefsky@de.ibm.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).