linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
@ 2021-08-03 14:37 Aneesh Kumar K.V
  2021-08-04  5:14 ` Nicholas Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2021-08-03 14:37 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, npiggin

With a shared mapping, even though we are unmapping a large range, the kernel
will force a TLB flush with the ptl lock held to avoid the race mentioned in
commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts").
This results in the kernel issuing a high number of TLB flushes even for a large
range. This can be improved by making sure the kernel switches to a PID-based flush
when it is unmapping a 2M range (see the illustrative sketch after the diff).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index aefc100d79a7..21d0f098e43b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
  * invalidating a full PID, so it has a far lower threshold to change from
  * individual page flushes to full-pid flushes.
  */
-static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
+static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
 static inline void __radix__flush_tlb_range(struct mm_struct *mm,
@@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 	if (fullmm)
 		flush_pid = true;
 	else if (type == FLUSH_TYPE_GLOBAL)
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
 	else
 		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
 	/*
@@ -1335,7 +1335,7 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 	if (fullmm)
 		flush_pid = true;
 	else if (type == FLUSH_TYPE_GLOBAL)
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
 	else
 		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
 
@@ -1505,7 +1505,7 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
 			continue;
 
 		nr_pages = (end - start) >> def->shift;
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
 
 		/*
 		 * If the number of pages spanning the range is above
-- 
2.31.1
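
For reference, here is a minimal, stand-alone sketch (illustration only, not the
in-tree code) of the decision this patch changes. With a 64K base page size, a
PMD_SIZE (2M) range covers exactly 2M / 64K = 32 pages, so lowering the ceiling
from 33 to 32 and switching '>' to '>=' makes a single PMD-sized unmap of a
shared mapping take the full-PID flush path instead of issuing 32 individual va
tlbies:

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT_64K	16			/* 64K base pages */
#define PMD_SIZE	(2UL << 20)		/* 2M */

/* Mirrors the kernel variable of the same name; value as set by this patch. */
static unsigned long tlb_single_page_flush_ceiling = 32;	/* was 33 */

static bool flush_pid_for_range(unsigned long start, unsigned long end)
{
	unsigned long nr_pages = (end - start) >> PAGE_SHIFT_64K;

	/* '>=' instead of '>': nr_pages == 32 now also upgrades to a PID flush */
	return nr_pages >= tlb_single_page_flush_ceiling;
}

int main(void)
{
	/* One PMD worth of 64K pages: 2M / 64K = 32 pages -> PID flush */
	printf("PID flush for a 2M unmap: %d\n",
	       flush_pid_for_range(0, PMD_SIZE));
	return 0;
}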


* Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
@ 2021-08-06  5:22 Puvichakravarthy Ramachandran
  0 siblings, 0 replies; 10+ messages in thread
From: Puvichakravarthy Ramachandran @ 2021-08-06  5:22 UTC (permalink / raw)
  To: aneesh.kumar; +Cc: linuxppc-dev, npiggin

> With a shared mapping, even though we are unmapping a large range, the kernel
> will force a TLB flush with the ptl lock held to avoid the race mentioned in
> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts").
> This results in the kernel issuing a high number of TLB flushes even for a large
> range. This can be improved by making sure the kernel switches to a PID-based flush
> when it is unmapping a 2M range.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index aefc100d79a7..21d0f098e43b 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>   * invalidating a full PID, so it has a far lower threshold to change from
>   * individual page flushes to full-pid flushes.
>   */
> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
> 
>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>  	if (fullmm)
>  		flush_pid = true;
>  	else if (type == FLUSH_TYPE_GLOBAL)
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>  	else
>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;

I evaluated the patches from Aneesh with a microbenchmark which does shmat and
shmdt of a 256 MB segment. The higher the rate of work, the better the
performance. With a value of 32, we match the performance of GTSE=off. This was
evaluated on a SLES15 SP3 kernel.
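
The ./tlbie binary itself is not included here; as a rough, hypothetical
stand-in, a microbenchmark of this shape (SysV shmget/shmat/shmdt of a 256 MB
segment, reporting attach/detach iterations per second as the rate of work; the
real tool's -i/-c/-t options are not modeled) could look like:

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <time.h>

#define SEG_SIZE	(256UL << 20)	/* 256 MB */
#define RUN_SECONDS	5

int main(void)
{
	int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
	time_t start = time(NULL);
	unsigned long iters = 0;

	if (id < 0) {
		perror("shmget");
		return 1;
	}

	while (time(NULL) - start < RUN_SECONDS) {
		char *p = shmat(id, NULL, 0);

		if (p == (void *)-1) {
			perror("shmat");
			break;
		}
		memset(p, 0, SEG_SIZE);	/* populate the shared mapping */
		shmdt(p);		/* unmap: this is what drives the TLB flushes */
		iters++;
	}

	printf("Rate of work: = %lu\n", iters / RUN_SECONDS);
	shmctl(id, IPC_RMID, NULL);
	return 0;
}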


# cat /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling 
32

# perf stat -I 1000 -a -e powerpc:tlbie,r30058 ./tlbie -i 5 -c 1 t 1
 Rate of work: = 311 
#           time             counts unit events
     1.013131404              50939      powerpc:tlbie   
     1.013131404              50658      r30058  
 Rate of work: = 318 
     2.026957019              51520      powerpc:tlbie   
     2.026957019              51481      r30058  
 Rate of work: = 318 
     3.038884431              51485      powerpc:tlbie   
     3.038884431              51461      r30058  
 Rate of work: = 318 
     4.051483926              51485      powerpc:tlbie   
     4.051483926              51520      r30058  
 Rate of work: = 318 
     5.063635713              48577      powerpc:tlbie   
     5.063635713              48347      r30058  
 
# echo 34 > /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling 

# perf stat -I 1000 -a -e powerpc:tlbie,r30058 ./tlbie -i 5 -c 1 t 1
 Rate of work: = 174 
#           time             counts unit events
     1.012672696             721471      powerpc:tlbie   
     1.012672696             726491      r30058  
 Rate of work: = 177 
     2.026348661             737460      powerpc:tlbie   
     2.026348661             736138      r30058  
 Rate of work: = 178 
     3.037932122             737460      powerpc:tlbie   
     3.037932122             737460      r30058  
 Rate of work: = 178 
     4.050198819             737044      powerpc:tlbie   
     4.050198819             737460      r30058  
 Rate of work: = 177 
     5.062400776             692832      powerpc:tlbie   
     5.062400776             688319      r30058          


Regards,
Puvichakravarthy Ramachandran




* Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
@ 2021-08-06  7:56 Puvichakravarthy Ramachandran
  2021-08-12 12:49 ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: Puvichakravarthy Ramachandran @ 2021-08-06  7:56 UTC (permalink / raw)
  To: aneesh.kumar; +Cc: linuxppc-dev, npiggin

> With a shared mapping, even though we are unmapping a large range, the kernel
> will force a TLB flush with the ptl lock held to avoid the race mentioned in
> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts").
> This results in the kernel issuing a high number of TLB flushes even for a large
> range. This can be improved by making sure the kernel switches to a PID-based flush
> when it is unmapping a 2M range.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index aefc100d79a7..21d0f098e43b 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>   * invalidating a full PID, so it has a far lower threshold to change from
>   * individual page flushes to full-pid flushes.
>   */
> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
> 
>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>  	if (fullmm)
>  		flush_pid = true;
>  	else if (type == FLUSH_TYPE_GLOBAL)
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>  	else
>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;

Additional details on the test environment: this was tested on a 2-node,
8-socket Power10 system. The LPAR had 105 cores and spanned all the sockets.
# perf stat -I 1000 -a -e cycles,instructions -e "{cpu/config=0x030008,name=PM_EXEC_STALL/}" -e "{cpu/config=0x02E01C,name=PM_EXEC_STALL_TLBIE/}" ./tlbie -i 10 -c 1 -t 1
 Rate of work: = 176 
#           time             counts unit events
     1.029206442         4198594519      cycles
     1.029206442         2458254252      instructions              # 0.59 insn per cycle
     1.029206442         3004031488      PM_EXEC_STALL
     1.029206442         1798186036      PM_EXEC_STALL_TLBIE
 Rate of work: = 181
     2.054288539         4183883450      cycles
     2.054288539         2472178171      instructions              # 0.59 insn per cycle
     2.054288539         3014609313      PM_EXEC_STALL
     2.054288539         1797851642      PM_EXEC_STALL_TLBIE
 Rate of work: = 180
     3.078306883         4171250717      cycles
     3.078306883         2468341094      instructions              # 0.59 insn per cycle
     3.078306883         2993036205      PM_EXEC_STALL
     3.078306883         1798181890      PM_EXEC_STALL_TLBIE
.
. 

# cat /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling
34

# echo 32 > /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling

# perf stat -I 1000 -a -e cycles,instructions -e "{cpu/config=0x030008,name=PM_EXEC_STALL/}" -e "{cpu/config=0x02E01C,name=PM_EXEC_STALL_TLBIE/}" ./tlbie -i 10 -c 1 -t 1
 Rate of work: = 313 
#           time             counts unit events
     1.030310506         4206071143      cycles
     1.030310506         4314716958      instructions              # 1.03 insn per cycle
     1.030310506         2157762167      PM_EXEC_STALL
     1.030310506          110825573      PM_EXEC_STALL_TLBIE
 Rate of work: = 322
     2.056034068         4331745630      cycles
     2.056034068         4531658304      instructions              # 1.05 insn per cycle
     2.056034068         2288971361      PM_EXEC_STALL
     2.056034068          111267927      PM_EXEC_STALL_TLBIE
 Rate of work: = 321
     3.081216434         4327050349      cycles
     3.081216434         4379679508      instructions              # 1.01 insn per cycle
     3.081216434         2252602550      PM_EXEC_STALL
     3.081216434          110974887      PM_EXEC_STALL_TLBIE
.
.
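
Working the stall fractions out from the counts above: with the ceiling at 34,
PM_EXEC_STALL_TLBIE / PM_EXEC_STALL is roughly 1.80e9 / 3.00e9, i.e. about 60%
of execution-stall cycles are spent behind tlbie; with the ceiling at 32 it
drops to roughly 1.11e8 / 2.26e9, about 5%, which lines up with the rate of
work nearly doubling (176 -> 313).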
 

Regards,
Puvichakravarthy Ramachandran







Thread overview: 10+ messages
2021-08-03 14:37 [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE Aneesh Kumar K.V
2021-08-04  5:14 ` Nicholas Piggin
2021-08-04  6:39   ` Nicholas Piggin
2021-08-04  7:34     ` Peter Zijlstra
2021-08-04  6:59   ` Michael Ellerman
  -- strict thread matches above, loose matches on Subject: below --
2021-08-06  5:22 Puvichakravarthy Ramachandran
2021-08-06  7:56 Puvichakravarthy Ramachandran
2021-08-12 12:49 ` Michael Ellerman
2021-08-12 13:20   ` Aneesh Kumar K.V
2021-08-16  7:03     ` Michael Ellerman
