linux-mm.kvack.org archive mirror
* [RFC PATCH 0/3] couple of TLB flush optimisations
@ 2018-06-12  7:16 Nicholas Piggin
  2018-06-12  7:16 ` [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range" Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds

I've been looking around the TLB flushing code and noticed a few
issues in the core code. The first one seems pretty straightforward:
unless I missed something, the TLB flush pattern after the revert is
still okay.

The second one might be a bit more interesting for other architectures
and the big comment in include/asm-generic/tlb.h and linked mail from
Linus gives some good context.

I suspect mmu notifiers should use this precise TLB range too, because
I don't see how they could care about the page table structure under
the mapping, although I only use it on powerpc so far.

Comments?

Thanks,
Nick

Nicholas Piggin (3):
  Revert "mm: always flush VMA ranges affected by zap_page_range"
  mm: mmu_gather track of invalidated TLB ranges explicitly for more
    precise flushing
  powerpc/64s/radix: optimise TLB flush with precise TLB ranges in
    mmu_gather

 arch/powerpc/mm/tlb-radix.c |  7 +++++--
 include/asm-generic/tlb.h   | 27 +++++++++++++++++++++++++--
 mm/memory.c                 | 18 ++++--------------
 3 files changed, 34 insertions(+), 18 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range"
  2018-06-12  7:16 [RFC PATCH 0/3] couple of TLB flush optimisations Nicholas Piggin
@ 2018-06-12  7:16 ` Nicholas Piggin
  2018-06-12 13:53   ` Aneesh Kumar K.V
  2018-06-12 18:52   ` Nadav Amit
  2018-06-12  7:16 ` [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing Nicholas Piggin
  2018-06-12  7:16 ` [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather Nicholas Piggin
  2 siblings, 2 replies; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds

This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.

Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
problem") provides a superset of the TLB flush coverage of this
commit, and even includes in the changelog "this patch supersedes
'mm: Always flush VMA ranges affected by zap_page_range v2'".

Reverting this avoids double flushing the TLB range, and the less
efficient flush_tlb_range() call (the mmu_gather API is more precise
about what ranges it invalidates).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/memory.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7206a634270b..9d472e00fc2d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, start, end);
-	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
+	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
 		unmap_single_vma(&tlb, vma, start, end, NULL);
-
-		/*
-		 * zap_page_range does not specify whether mmap_sem should be
-		 * held for read or write. That allows parallel zap_page_range
-		 * operations to unmap a PTE and defer a flush meaning that
-		 * this call observes pte_none and fails to flush the TLB.
-		 * Rather than adding a complex API, ensure that no stale
-		 * TLB entries exist when this call returns.
-		 */
-		flush_tlb_range(vma, start, end);
-	}
-
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	tlb_finish_mmu(&tlb, start, end);
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing
  2018-06-12  7:16 [RFC PATCH 0/3] couple of TLB flush optimisations Nicholas Piggin
  2018-06-12  7:16 ` [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range" Nicholas Piggin
@ 2018-06-12  7:16 ` Nicholas Piggin
  2018-06-12 18:14   ` Linus Torvalds
  2018-06-12  7:16 ` [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather Nicholas Piggin
  2 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds

The mmu_gather API keeps track of the invalidated address range,
including the span covered by invalidated page table pages. Page table
pages that contain no ptes (and therefore cannot have TLB entries)
still need to be covered by the invalidation if the processor caches
intermediate levels of the page table.

This allows a backwards compatible / legacy implementation to cache
page tables without modification, provided it invalidates its page
table cache using its existing TLB invalidation instructions.

However, this additional flush range is not necessary if the
architecture provides explicit page table cache management, or if it
ensures that page table cache entries are never instantiated for
ranges that did not reach a valid pte.

This is very noticeable on powerpc in the exec path, in
shift_arg_pages, where the TLB flush for the page table teardown
covers a very large range and so gets implemented as a full process
flush. This patch adds page_start and page_end fields to mmu_gather,
which architectures can use to optimise their TLB flushing.
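
As a rough illustration only, an architecture with explicit page table
cache management could consume the two ranges along these lines
(arch_flush_tlb_range and arch_flush_page_table_cache are hypothetical
names, not part of this patch):

	static inline void example_tlb_flush(struct mmu_gather *tlb)
	{
		/* page_start/page_end bound addresses that may have TLB entries */
		if (tlb->page_end > tlb->page_start)
			arch_flush_tlb_range(tlb->mm, tlb->page_start,
					     tlb->page_end);

		/* freed page table pages are covered by explicit PWC flushing */
		if (tlb->need_flush_all)
			arch_flush_page_table_cache(tlb->mm);
	}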

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/asm-generic/tlb.h | 27 +++++++++++++++++++++++++--
 mm/memory.c               |  4 +++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index faddde44de8c..a006f702b4c2 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -96,6 +96,8 @@ struct mmu_gather {
 #endif
 	unsigned long		start;
 	unsigned long		end;
+	unsigned long		page_start;
+	unsigned long		page_end;
 	/* we are in the middle of an operation to clear
 	 * a full mm and can make some optimizations */
 	unsigned int		fullmm : 1,
@@ -128,13 +130,25 @@ static inline void __tlb_adjust_range(struct mmu_gather *tlb,
 	tlb->end = max(tlb->end, address + range_size);
 }
 
+static inline void __tlb_adjust_page_range(struct mmu_gather *tlb,
+				      unsigned long address,
+				      unsigned int range_size)
+{
+	tlb->page_start = min(tlb->page_start, address);
+	tlb->page_end = max(tlb->page_end, address + range_size);
+}
+
+
 static inline void __tlb_reset_range(struct mmu_gather *tlb)
 {
 	if (tlb->fullmm) {
 		tlb->start = tlb->end = ~0;
+		tlb->page_start = tlb->page_end = ~0;
 	} else {
 		tlb->start = TASK_SIZE;
 		tlb->end = 0;
+		tlb->page_start = TASK_SIZE;
+		tlb->page_end = 0;
 	}
 }
 
@@ -210,12 +224,14 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_tlb_entry(tlb, ptep, address)		\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		__tlb_adjust_page_range(tlb, address, PAGE_SIZE); \
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	     \
 	do {							     \
 		__tlb_adjust_range(tlb, address, huge_page_size(h)); \
+		__tlb_adjust_page_range(tlb, address, huge_page_size(h)); \
 		__tlb_remove_tlb_entry(tlb, ptep, address);	     \
 	} while (0)
 
@@ -230,6 +246,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE);	\
+		__tlb_adjust_page_range(tlb, address, HPAGE_PMD_SIZE);	\
 		__tlb_remove_pmd_tlb_entry(tlb, pmdp, address);		\
 	} while (0)
 
@@ -244,6 +261,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pud_tlb_entry(tlb, pudp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PUD_SIZE);	\
+		__tlb_adjust_page_range(tlb, address, HPAGE_PUD_SIZE);	\
 		__tlb_remove_pud_tlb_entry(tlb, pudp, address);		\
 	} while (0)
 
@@ -262,6 +280,11 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
  * architecture to do its own odd thing, not cause pain for others
  * http://lkml.kernel.org/r/CA+55aFzBggoXtNXQeng5d_mRoDnaMBE5Y+URs+PHR67nUpMtaw@mail.gmail.com
  *
+ * Powerpc (Book3S 64-bit) with the radix MMU has an architected "page
+ * walk cache" that is invalidated with a specific instruction. It uses
+ * need_flush_all to issue this instruction, which is set by its own
+ * __p??_free_tlb functions.
+ *
  * For now w.r.t page table cache, mark the range_size as PAGE_SIZE
  */
 
@@ -273,7 +296,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
 
@@ -288,7 +311,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef __ARCH_HAS_5LEVEL_HACK
 #define p4d_free_tlb(tlb, pudp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__p4d_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index 9d472e00fc2d..a46896b85e54 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -277,8 +277,10 @@ void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 {
 	struct mmu_gather_batch *batch, *next;
 
-	if (force)
+	if (force) {
 		__tlb_adjust_range(tlb, start, end - start);
+		__tlb_adjust_page_range(tlb, start, end - start);
+	}
 
 	tlb_flush_mmu(tlb);
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12  7:16 [RFC PATCH 0/3] couple of TLB flush optimisations Nicholas Piggin
  2018-06-12  7:16 ` [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range" Nicholas Piggin
  2018-06-12  7:16 ` [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing Nicholas Piggin
@ 2018-06-12  7:16 ` Nicholas Piggin
  2018-06-12 18:18   ` Linus Torvalds
  2 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds

Use the page_start and page_end fields of mmu_gather to implement more
precise TLB flushing. (start, end) covers the entire TLB and page
table range that has been invalidated, for architectures that do not
have explicit page walk cache management. page_start and page_end are
just for ranges that may have TLB entries.

A tlb_flush may have no pages in this range but still require the page
walk cache (PWC) to be flushed; that case is handled properly.

This brings the number of tlbiel instructions required by a kernel
compile from 33M to 25M, most avoided from exec->shift_arg_pages.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/tlb-radix.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 67a6e86d3e7e..06452ad701cf 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -853,8 +853,11 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 		else
 			radix__flush_all_mm(mm);
 	} else {
-		unsigned long start = tlb->start;
-		unsigned long end = tlb->end;
+		unsigned long start = tlb->page_start;
+		unsigned long end = tlb->page_end;
+
+		if (end < start)
+			end = start;
 
 		if (!tlb->need_flush_all)
 			radix__flush_tlb_range_psize(mm, start, end, psize);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range"
  2018-06-12  7:16 ` [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range" Nicholas Piggin
@ 2018-06-12 13:53   ` Aneesh Kumar K.V
  2018-06-12 18:52   ` Nadav Amit
  1 sibling, 0 replies; 19+ messages in thread
From: Aneesh Kumar K.V @ 2018-06-12 13:53 UTC (permalink / raw)
  To: Nicholas Piggin, linux-mm
  Cc: linuxppc-dev, linux-arch, Aneesh Kumar K . V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton, Linus Torvalds

On 06/12/2018 12:46 PM, Nicholas Piggin wrote:
> This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.
> 
> Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
> problem") provides a superset of the TLB flush coverage of this
> commit, and even includes in the changelog "this patch supersedes
> 'mm: Always flush VMA ranges affected by zap_page_range v2'".
> 
> Reverting this avoids double flushing the TLB range, and the less
> efficient flush_tlb_range() call (the mmu_gather API is more precise
> about what ranges it invalidates).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   mm/memory.c | 14 +-------------
>   1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 7206a634270b..9d472e00fc2d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>   	tlb_gather_mmu(&tlb, mm, start, end);
>   	update_hiwater_rss(mm);
>   	mmu_notifier_invalidate_range_start(mm, start, end);
> -	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
> +	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
>   		unmap_single_vma(&tlb, vma, start, end, NULL);
> -
> -		/*
> -		 * zap_page_range does not specify whether mmap_sem should be
> -		 * held for read or write. That allows parallel zap_page_range
> -		 * operations to unmap a PTE and defer a flush meaning that
> -		 * this call observes pte_none and fails to flush the TLB.
> -		 * Rather than adding a complex API, ensure that no stale
> -		 * TLB entries exist when this call returns.
> -		 */
> -		flush_tlb_range(vma, start, end);
> -	}
> -
>   	mmu_notifier_invalidate_range_end(mm, start, end);
>   	tlb_finish_mmu(&tlb, start, end);
>   }
> 

Not really related to this patch, but does 99baac21e4585 do the right
thing if the range start - end covers pages with multiple page sizes?

-aneesh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing
  2018-06-12  7:16 ` [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing Nicholas Piggin
@ 2018-06-12 18:14   ` Linus Torvalds
  0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2018-06-12 18:14 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> +static inline void __tlb_adjust_page_range(struct mmu_gather *tlb,
> +                                     unsigned long address,
> +                                     unsigned int range_size)
> +{
> +       tlb->page_start = min(tlb->page_start, address);
> +       tlb->page_end = max(tlb->page_end, address + range_size);
> +}

Why add this unnecessary complexity for architectures where it doesn't matter?

This is not "generic". This is some crazy powerpc special case. Why
add it to generic code, and why make everybody else take the cost?

                    Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12  7:16 ` [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather Nicholas Piggin
@ 2018-06-12 18:18   ` Linus Torvalds
  2018-06-12 22:31     ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2018-06-12 18:18 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> This brings the number of tlbiel instructions required by a kernel
> compile from 33M to 25M, most avoided from exec->shift_arg_pages.

And this shows that "page_start/end" is purely for powerpc and used
nowhere else.

The previous patch should have been purely about powerpc page table
walking and not touched asm-generic/tlb.h.

I think you should make those changes to
arch/powerpc/include/asm/tlb.h. If that means you can't use the
generic header, then so be it.

Or maybe you can embed the generic case in some ppc-specific
structures, and use 90% of the generic code just with your added
wrappers for that radix invalidation on top.

But don't make other architectures do pointless work that doesn't
matter - or make sense - for them.

               Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range"
  2018-06-12  7:16 ` [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range" Nicholas Piggin
  2018-06-12 13:53   ` Aneesh Kumar K.V
@ 2018-06-12 18:52   ` Nadav Amit
  1 sibling, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2018-06-12 18:52 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: open list:MEMORY MANAGEMENT, linuxppc-dev, linux-arch,
	Aneesh Kumar K . V, Minchan Kim, Mel Gorman, Andrew Morton,
	Linus Torvalds

at 12:16 AM, Nicholas Piggin <npiggin@gmail.com> wrote:

> This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.
> 
> Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
> problem") provides a superset of the TLB flush coverage of this
> commit, and even includes in the changelog "this patch supersedes
> 'mm: Always flush VMA ranges affected by zap_page_range v2'".
> 
> Reverting this avoids double flushing the TLB range, and the less
> efficient flush_tlb_range() call (the mmu_gather API is more precise
> about what ranges it invalidates).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> mm/memory.c | 14 +-------------
> 1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 7206a634270b..9d472e00fc2d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
> 	tlb_gather_mmu(&tlb, mm, start, end);
> 	update_hiwater_rss(mm);
> 	mmu_notifier_invalidate_range_start(mm, start, end);
> -	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
> +	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
> 		unmap_single_vma(&tlb, vma, start, end, NULL);
> -
> -		/*
> -		 * zap_page_range does not specify whether mmap_sem should be
> -		 * held for read or write. That allows parallel zap_page_range
> -		 * operations to unmap a PTE and defer a flush meaning that
> -		 * this call observes pte_none and fails to flush the TLB.
> -		 * Rather than adding a complex API, ensure that no stale
> -		 * TLB entries exist when this call returns.
> -		 */
> -		flush_tlb_range(vma, start, end);
> -	}
> -
> 	mmu_notifier_invalidate_range_end(mm, start, end);
> 	tlb_finish_mmu(&tlb, start, end);
> }

Yes, this was in my “to check when I have time” todo list, especially since
the flush was from start to end, not even vma->vm_start to vma->vm_end.

The revert seems correct.

Reviewed-by: Nadav Amit <namit@vmware.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12 18:18   ` Linus Torvalds
@ 2018-06-12 22:31     ` Nicholas Piggin
  2018-06-12 22:42       ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-12 22:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, 12 Jun 2018 11:18:27 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > This brings the number of tlbiel instructions required by a kernel
> > compile from 33M to 25M, most avoided from exec->shift_arg_pages.  
> 
> And this shows that "page_start/end" is purely for powerpc and used
> nowhere else.
> 
> The previous patch should have been to purely powerpc page table
> walking and not touch asm-generic/tlb.h
> 
> I think you should make those changes to
> arch/powerpc/include/asm/tlb.h. If that means you can't use the
> generic header, then so be it.

I can make it ppc specific if nobody else would use it. But at least
mmu notifiers AFAIKS would rather use a precise range.

> Or maybe you can embed the generic case in some ppc-specific
> structures, and use 90% of the generic code just with your added
> wrappers for that radix invalidation on top.

Would you mind more arch-specific ifdefs in there?

> 
> But don't make other architectures do pointless work that doesn't
> matter - or make sense - for them.

Okay sure, and this is the reason for the wide cc list. Intel does
need it of course, from 4.10.3.1 of the dev manual:

  — The processor may create a PML4-cache entry even if there are no
    translations for any linear address that might use that entry
    (e.g., because the P flags are 0 in all entries in the referenced
    page-directory-pointer table).

But I'm sure others would not have paging structure caches at all
(some don't even walk the page tables in hardware right?). Maybe
they're all doing their own thing though.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12 22:31     ` Nicholas Piggin
@ 2018-06-12 22:42       ` Linus Torvalds
  2018-06-12 23:09         ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2018-06-12 22:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, Jun 12, 2018 at 3:31 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Okay sure, and this is the reason for the wide cc list. Intel does
> need it of course, from 4.10.3.1 of the dev manual:
>
>   — The processor may create a PML4-cache entry even if there are no
>     translations for any linear address that might use that entry
>     (e.g., because the P flags are 0 in all entries in the referenced
>     page-directory-pointer table).

But does intel need it?

Because I don't see it. We already do the __tlb_adjust_range(), and we
never tear down the highest-level page tables afaik.

Am I missing something?

               Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12 22:42       ` Linus Torvalds
@ 2018-06-12 23:09         ` Nicholas Piggin
  2018-06-12 23:26           ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-12 23:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, 12 Jun 2018 15:42:34 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 3:31 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Okay sure, and this is the reason for the wide cc list. Intel does
> > need it of course, from 4.10.3.1 of the dev manual:
> >
> >   — The processor may create a PML4-cache entry even if there are no
> >     translations for any linear address that might use that entry
> >     (e.g., because the P flags are 0 in all entries in the referenced
> >     page-directory-pointer table).  
> 
> But does intel need it?
> 
> Because I don't see it. We already do the __tlb_adjust_range(), and we
> never tear down the highest-level page tables afaik.
> 
> Am I missing something?


Sorry I mean Intel needs the existing behaviour of range flush expanded
to cover page table pages.... right? The manual has similar wording for
lower levels of page tables too. So it does need to send an invalidate
*somewhere* in the range that a freed page table page covers, even if
no valid pte was torn down.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12 23:09         ` Nicholas Piggin
@ 2018-06-12 23:26           ` Linus Torvalds
  2018-06-12 23:39             ` Linus Torvalds
  2018-06-12 23:53             ` Nicholas Piggin
  0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2018-06-12 23:26 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, Jun 12, 2018 at 4:09 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Sorry I mean Intel needs the existing behaviour of range flush expanded
> to cover page table pages.... right?

Right.  Intel depends on the current thing, ie if a page table
*itself* is freed, we will need to do a flush, but it's the exact
same flush as if there had been a regular page there.

That's already handled by (for example) pud_free_tlb() doing the
__tlb_adjust_range().

Again, I may be missing entirely what you're talking about, because it
feels like we're talking across each other.

My argument is that your new patches in (2-3 in the series - patch #1
looks ok) seem to be fundamentally specific to things that have a
*different* tlb invalidation for the directory entries than for the
leaf entries.

But that's not what at least x86 has, and not what the generic code has done.

I think it might be fine to introduce a few new helpers that end up
being no-ops for the traditional cases.

I just don't think it makes sense to maintain a set of range values
that then aren't actually used in the general case.

              Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12 23:26           ` Linus Torvalds
@ 2018-06-12 23:39             ` Linus Torvalds
  2018-06-13  0:12               ` Nicholas Piggin
  2018-06-12 23:53             ` Nicholas Piggin
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2018-06-12 23:39 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, Jun 12, 2018 at 4:26 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Right.  Intel depends on the current thing, ie if a page table
> *itself* is freed, we will need to do a flush, but it's the exact
> same flush as if there had been a regular page there.
>
> That's already handled by (for example) pud_free_tlb() doing the
> __tlb_adjust_range().

Side note: I guess we _could_ make the "page directory" flush be
special on x86 too.

Right now a page directory flush just counts as a range, and then a
range that is more that a few entries just means "flush everything".

End result: in practice, every time you free a page directory, you
flush the whole TLB because it looks identical to flushing a large
range of pages.

And in _theory_, maybe you could have just used "invlpg" with a
targeted address instead. In fact, I think a single invlpg invalidates
_all_ caches for the associated MM, but don't quote me on that.

That said, I don't think this is a common case. But I think that *if*
you extend this to be aware of the page directory caches, and _if_ you
extend it to cover both ppc and x86, at that point all my "this isn't
generic" arguments go away.

Because once x86 does it, it's "common enough" that it counts as
generic. It may be only a single other architecture, but it's the bulk
of all the development machines, so..

                 Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12 23:26           ` Linus Torvalds
  2018-06-12 23:39             ` Linus Torvalds
@ 2018-06-12 23:53             ` Nicholas Piggin
  1 sibling, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-12 23:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, 12 Jun 2018 16:26:33 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 4:09 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Sorry I mean Intel needs the existing behaviour of range flush expanded
> > to cover page table pages.... right?  
> 
> Right.  Intel depends on the current thing, ie if a page table
> *itself* is freed, we will need to do a flush, but it's the exact
> same flush as if there had been a regular page there.
> 
> That's already handled by (for example) pud_free_tlb() doing the
> __tlb_adjust_range().

Agreed.

> 
> Again, I may be missing entirely what you're talking about, because it
> feels like we're talking across each other.
> 
> My argument is that your new patches in (2-3 in the series - patch #1
> looks ok) seem to be fundamentally specific to things that have a
> *different* tlb invalidation for the directory entries than for the
> leaf entries.

Yes I think I confused myself a bit. You're right these patches are
only useful if there is no page structure cache, or if it's managed
separately from TLB invalidation.

> 
> But that's not what at least x86 has, and not what the generic code has done.
> 
> I think it might be fine to introduce a few new helpers that end up
> being no-ops for the traditional cases.
> 
> I just don't think it makes sense to maintain a set of range values
> that then aren't actually used in the general case.

Sure, I'll make it optional. That would probably give a better result
for powerpc too because it doesn't need to maintain two ranges either.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-12 23:39             ` Linus Torvalds
@ 2018-06-13  0:12               ` Nicholas Piggin
  2018-06-13  1:10                 ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-13  0:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, 12 Jun 2018 16:39:55 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 4:26 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Right.  Intel depends on the current thing, ie if a page table
> > *itself* is freed, we will will need to do a flush, but it's the exact
> > same flush as if there had been a regular page there.
> >
> > That's already handled by (for example) pud_free_tlb() doing the
> > __tlb_adjust_range().  
> 
> Side note: I guess we _could_ make the "page directory" flush be
> special on x86 too.
> 
> Right now a page directory flush just counts as a range, and then a
> range that is more that a few entries just means "flush everything".
> 
> End result: in practice, every time you free a page directory, you
> flush the whole TLB because it looks identical to flushing a large
> range of pages.
> 
> And in _theory_, maybe you could have just used "invlpg" with a
> targeted address instead. In fact, I think a single invlpg invalidates
> _all_ caches for the associated MM, but don't quote me on that.

Yeah, I was thinking that: you could treat it separately (similar to
powerpc maybe) despite using the same instructions to invalidate it.

> That said, I don't think this is a common case. But I think that *if*
> you extend this to be aware of the page directory caches, and _if_ you
> extend it to cover both ppc and x86, at that point all my "this isn't
> generic" arguments go away.
> 
> Because once x86 does it, it's "common enough" that it counts as
> generic. It may be only a single other architecture, but it's the bulk
> of all the development machines, so..

I'll do the small step first (basically just this patch as an opt-in
for architectures that don't need page tables in their tlb range). But
after that it would be interesting to see if x86 could do anything
with explicit page table cache management.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-13  0:12               ` Nicholas Piggin
@ 2018-06-13  1:10                 ` Linus Torvalds
  2018-06-14  2:49                   ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2018-06-13  1:10 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, Jun 12, 2018 at 5:12 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > And in _theory_, maybe you could have just used "invlpg" with a
> > targeted address instead. In fact, I think a single invlpg invalidates
> > _all_ caches for the associated MM, but don't quote me on that.

Confirmed. The SDM says

 "INVLPG also invalidates all entries in all paging-structure caches
  associated with the current PCID, regardless of the linear addresses
  to which they correspond"

so if x86 wants to do this "separate invalidation for page directory
entryes", then it would want to

 (a) remove the __tlb_adjust_range() operation entirely from
pud_free_tlb() and friends

 (b) instead just have a single field for "invalidate_tlb_caches",
which could be a boolean, or could just be one of the addresses

and then the logic would be that IFF no other tlb invalidate is done
due to an actual page range, then we look at that
invalidate_tlb_caches field, and do a single INVLPG instead.

I still am not sure if this would actually make a difference in
practice, but I guess it does mean that x86 could at least participate
in some kind of scheme where we have architecture-specific actions for
those page directory entries.

And we could make the default behavior - if no architecture-specific
tlb page directory invalidation function exists - be the current
"__tlb_adjust_range()" case. So the default would be to not change
behavior, and architectures could opt in to something like this.
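
Very roughly, and purely as an illustration of the idea (using the
invalidate_tlb_caches field from (b) as an address, and a made-up
helper name; none of this exists today):

        static inline void x86_flush_page_table_caches(struct mmu_gather *tlb)
        {
                unsigned long addr = tlb->invalidate_tlb_caches;

                if (!addr)
                        return;         /* no page directories were freed */
                if (tlb->end > tlb->start)
                        return;         /* a range flush will cover the caches */

                /* one INVLPG drops all paging-structure caches for this PCID */
                asm volatile("invlpg (%0)" : : "r" (addr) : "memory");
        }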

            Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-13  1:10                 ` Linus Torvalds
@ 2018-06-14  2:49                   ` Nicholas Piggin
  2018-06-14  6:15                     ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-14  2:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Tue, 12 Jun 2018 18:10:26 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 5:12 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > >
> > > And in _theory_, maybe you could have just used "invlpg" with a
> > > targeted address instead. In fact, I think a single invlpg invalidates
> > > _all_ caches for the associated MM, but don't quote me on that.  
> 
> Confirmed. The SDK says
> 
>  "INVLPG also invalidates all entries in all paging-structure caches
>   associated with the current PCID, regardless of the linear addresses
>   to which they correspond"

Interesting, so that's very much like powerpc.

> so if x86 wants to do this "separate invalidation for page directory
> entryes", then it would want to
> 
>  (a) remove the __tlb_adjust_range() operation entirely from
> pud_free_tlb() and friends

Revised patch below (only the generic part this time, but powerpc
implementation gives the same result as the last patch).

> 
>  (b) instead just have a single field for "invalidate_tlb_caches",
> which could be a boolean, or could just be one of the addresses

Yeah well powerpc hijacks one of the existing bools in the mmu_gather
for exactly that, and sets it when a page table page is to be freed.

> and then the logic would be that IFF no other tlb invalidate is done
> due to an actual page range, then we look at that
> invalidate_tlb_caches field, and do a single INVLPG instead.
> 
> I still am not sure if this would actually make a difference in
> practice, but I guess it does mean that x86 could at least participate
> in some kind of scheme where we have architecture-specific actions for
> those page directory entries.

I think it could. But yes, I don't know how much it would help; x86
tlb invalidation is very fast, and I noticed this mostly at exec time,
when you probably lose all your TLBs anyway.

> 
> And we could make the default behavior - if no architecture-specific
> tlb page directory invalidation function exists - be the current
> "__tlb_adjust_range()" case. So the default would be to not change
> behavior, and architectures could opt in to something like this.
> 
>             Linus

Yep, is this a bit more to your liking?

---
 include/asm-generic/tlb.h | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index faddde44de8c..fa44321bc8dd 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -262,36 +262,49 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
  * architecture to do its own odd thing, not cause pain for others
  * http://lkml.kernel.org/r/CA+55aFzBggoXtNXQeng5d_mRoDnaMBE5Y+URs+PHR67nUpMtaw@mail.gmail.com
  *
+ * Powerpc (Book3S 64-bit) with the radix MMU has an architected "page
+ * walk cache" that is invalidated with a specific instruction. It uses
+ * need_flush_all to issue this instruction, which is set by its own
+ * __p??_free_tlb functions.
+ *
  * For now w.r.t page table cache, mark the range_size as PAGE_SIZE
  */
 
+#ifndef pte_free_tlb
 #define pte_free_tlb(tlb, ptep, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
+#endif
 
+#ifndef pmd_free_tlb
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
+#endif
 
 #ifndef __ARCH_HAS_4LEVEL_HACK
+#ifndef pud_free_tlb
 #define pud_free_tlb(tlb, pudp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__pud_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
+#endif
 
 #ifndef __ARCH_HAS_5LEVEL_HACK
+#ifndef p4d_free_tlb
 #define p4d_free_tlb(tlb, pudp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__p4d_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
+#endif
 
 #define tlb_migrate_finish(mm) do {} while (0)
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-14  2:49                   ` Nicholas Piggin
@ 2018-06-14  6:15                     ` Linus Torvalds
  2018-06-14  6:51                       ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2018-06-14  6:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Thu, Jun 14, 2018 at 11:49 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> +#ifndef pte_free_tlb
>  #define pte_free_tlb(tlb, ptep, address)                       \
>         do {                                                    \
>                 __tlb_adjust_range(tlb, address, PAGE_SIZE);    \
>                 __pte_free_tlb(tlb, ptep, address);             \
>         } while (0)
> +#endif

Do you really want to / need to take over the whole pte_free_tlb macro?

I was hoping that you'd just replace the __tlb_adjust_range() instead.

Something like

 - replace the

        __tlb_adjust_range(tlb, address, PAGE_SIZE);

   with a "page directory" version:

        __tlb_free_directory(tlb, address, size);

 - have the default implementation for that be the old code:

        #ifndef __tlb_free_directory
          #define __tlb_free_directory(tlb,addr,size) \
                __tlb_adjust_range(tlb, addr, PAGE_SIZE)
        #endif

and that way architectures can now just hook into that
"__tlb_free_directory()" thing.

Hmm?

             Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
  2018-06-14  6:15                     ` Linus Torvalds
@ 2018-06-14  6:51                       ` Nicholas Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2018-06-14  6:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton

On Thu, 14 Jun 2018 15:15:47 +0900
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Jun 14, 2018 at 11:49 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > +#ifndef pte_free_tlb
> >  #define pte_free_tlb(tlb, ptep, address)                       \
> >         do {                                                    \
> >                 __tlb_adjust_range(tlb, address, PAGE_SIZE);    \
> >                 __pte_free_tlb(tlb, ptep, address);             \
> >         } while (0)
> > +#endif  
> 
> Do you really want to / need to take over the whole pte_free_tlb macro?
> 
> I was hoping that you'd just replace the __tlv_adjust_range() instead.
> 
> Something like
> 
>  - replace the
> 
>         __tlb_adjust_range(tlb, address, PAGE_SIZE);
> 
>    with a "page directory" version:
> 
>         __tlb_free_directory(tlb, address, size);
> 
>  - have the default implementation for that be the old code:
> 
>         #ifndef __tlb_free_directory
>           #define __tlb_free_directory(tlb,addr,size)
> __tlb_adjust_range(tlb, addr, PAGE_SIZE)
>         #endif
> 
> and that way architectures can now just hook into that
> "__tlb_free_directory()" thing.
> 
> Hmm?

Isn't it just easier, and less indirection, for the arch to take
over the pte_free_tlb instead?

I don't see what the __tlb_free_directory gets you except having to
follow another macro -- if the arch has something special they want
to do there, just do it in their __pte_free_tlb and call it
pte_free_tlb instead.
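
For powerpc that ends up being roughly this kind of sketch (not the
final patch; whether need_flush_all gets set here or inside
__pte_free_tlb itself is just a detail):

	#define pte_free_tlb(tlb, ptep, address)		\
	do {							\
		/* no generic range adjustment; just flag the PWC flush */ \
		(tlb)->need_flush_all = 1;			\
		__pte_free_tlb(tlb, ptep, address);		\
	} while (0)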

Thanks,
Nick

> 
>              Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2018-06-12  7:16 [RFC PATCH 0/3] couple of TLB flush optimisations Nicholas Piggin
2018-06-12  7:16 ` [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range" Nicholas Piggin
2018-06-12 13:53   ` Aneesh Kumar K.V
2018-06-12 18:52   ` Nadav Amit
2018-06-12  7:16 ` [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing Nicholas Piggin
2018-06-12 18:14   ` Linus Torvalds
2018-06-12  7:16 ` [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather Nicholas Piggin
2018-06-12 18:18   ` Linus Torvalds
2018-06-12 22:31     ` Nicholas Piggin
2018-06-12 22:42       ` Linus Torvalds
2018-06-12 23:09         ` Nicholas Piggin
2018-06-12 23:26           ` Linus Torvalds
2018-06-12 23:39             ` Linus Torvalds
2018-06-13  0:12               ` Nicholas Piggin
2018-06-13  1:10                 ` Linus Torvalds
2018-06-14  2:49                   ` Nicholas Piggin
2018-06-14  6:15                     ` Linus Torvalds
2018-06-14  6:51                       ` Nicholas Piggin
2018-06-12 23:53             ` Nicholas Piggin
