linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] powerpc/64s: optimise syscall entry for virtual, relocatable case
@ 2016-09-15  9:03 Nicholas Piggin
  2016-09-20  4:00 ` Balbir Singh
  2016-09-20 13:07 ` Michael Ellerman
From: Nicholas Piggin @ 2016-09-15  9:03 UTC
  To: Michael Ellerman, linuxppc-dev; +Cc: Nicholas Piggin, Michael Neuling

The mflr r10 instruction is left over from saving lr, back when the code
used lr to branch to system_call_entry from the exception handler. That was
changed by 6a404806d to use the count register. The value is never used
now, so mflr can be removed, and r10 can be used for storage rather than
spilling to the SPR scratch register.

The scratch register spill causes a long pipeline stall due to the SPR
read after write. This change brings the getppid syscall cost down from 406
to 376 cycles on POWER8. getppid in the non-relocatable case is 371 cycles.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
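
For anyone wanting to compare before and after on their own hardware, here
is a minimal userspace sketch of a getppid cost measurement. It is not the
harness used for the numbers above, which is not shown in this thread. The
function names now_ns() and getppid_cost_ns() are hypothetical, and the
result is average wall-clock nanoseconds per call rather than cycles, so it
would still need converting using the core frequency.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in nanoseconds (hypothetical helper). */
uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average wall-clock cost of one getppid system call, in nanoseconds.
 * syscall(SYS_getppid) is used instead of getppid() so that each
 * iteration enters the kernel regardless of what libc does.
 * Assumption: loop overhead is small compared to the syscall itself.
 */
double getppid_cost_ns(unsigned long iters)
{
	uint64_t start, end;
	unsigned long i;

	if (iters == 0)
		return 0.0;

	start = now_ns();
	for (i = 0; i < iters; i++)
		syscall(SYS_getppid);
	end = now_ns();

	return (double)(end - start) / (double)iters;
}
```

Calling getppid_cost_ns() with a few million iterations on a pinned, otherwise
idle CPU, once on a kernel without the patch and once with it, should give
comparable averages.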

 arch/powerpc/kernel/exceptions-64s.S | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index df6d45e..2cdd64f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -63,15 +63,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)				\
 	 * is volatile across system calls.
 	 */
 #define SYSCALL_PSERIES_2_DIRECT				\
-	mflr	r10 ;						\
 	ld	r12,PACAKBASE(r13) ; 				\
 	LOAD_HANDLER(r12, system_call_entry) ;			\
 	mtctr	r12 ;						\
 	mfspr	r12,SPRN_SRR1 ;					\
-	/* Re-use of r13... No spare regs to do this */	\
-	li	r13,MSR_RI ;					\
-	mtmsrd 	r13,1 ;						\
-	GET_PACA(r13) ;	/* get r13 back */			\
+	li	r10,MSR_RI ;					\
+	mtmsrd 	r10,1 ;						\
 	bctr ;
 #else
 	/* We can branch directly */
-- 
2.9.3


* Re: [PATCH] powerpc/64s: optimise syscall entry for virtual, relocatable case
  2016-09-15  9:03 [PATCH] powerpc/64s: optimise syscall entry for virtual, relocatable case Nicholas Piggin
@ 2016-09-20  4:00 ` Balbir Singh
  2016-09-20 13:07 ` Michael Ellerman
From: Balbir Singh @ 2016-09-20  4:00 UTC
  To: Nicholas Piggin, Michael Ellerman, linuxppc-dev; +Cc: Michael Neuling



On 15/09/16 19:03, Nicholas Piggin wrote:
> The mflr r10 instruction was left over saving of lr when the code used
> lr to branch to system_call_entry from the exception handler. That was
> changed by 6a404806d to use the count register. The value is never used
> now, so mflr can be removed, and r10 can be used for storage rather than
> spilling to the SPR scratch register.
> 
> The scratch register spill causes a long pipeline stall due to the SPR
> read after write. This change brings getppid syscall cost from 406 to
> 376 cycles on POWER8. getppid for non-relocatable case is 371 cycles.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> 
>  arch/powerpc/kernel/exceptions-64s.S | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index df6d45e..2cdd64f 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -63,15 +63,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)				\
>  	 * is volatile across system calls.
>  	 */
>  #define SYSCALL_PSERIES_2_DIRECT				\
> -	mflr	r10 ;						\
>  	ld	r12,PACAKBASE(r13) ; 				\
>  	LOAD_HANDLER(r12, system_call_entry) ;			\
>  	mtctr	r12 ;						\
>  	mfspr	r12,SPRN_SRR1 ;					\
> -	/* Re-use of r13... No spare regs to do this */	\
> -	li	r13,MSR_RI ;					\
> -	mtmsrd 	r13,1 ;						\
> -	GET_PACA(r13) ;	/* get r13 back */			\
> +	li	r10,MSR_RI ;					\
> +	mtmsrd 	r10,1 ;						\
>  	bctr ;
>  #else
>  	/* We can branch directly */
> 

The patch makes sense

Acked-by: Balbir Singh <bsingharora@gmail.com>


* Re: powerpc/64s: optimise syscall entry for virtual, relocatable case
  2016-09-15  9:03 [PATCH] powerpc/64s: optimise syscall entry for virtual, relocatable case Nicholas Piggin
  2016-09-20  4:00 ` Balbir Singh
@ 2016-09-20 13:07 ` Michael Ellerman
From: Michael Ellerman @ 2016-09-20 13:07 UTC
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michael Neuling, Nicholas Piggin

On Thu, 2016-09-15 at 09:03:21 UTC, Nicholas Piggin wrote:
> The mflr r10 instruction was left over saving of lr when the code used
> lr to branch to system_call_entry from the exception handler. That was
> changed by 6a404806d to use the count register. The value is never used
> now, so mflr can be removed, and r10 can be used for storage rather than
> spilling to the SPR scratch register.
> 
> The scratch register spill causes a long pipeline stall due to the SPR
> read after write. This change brings getppid syscall cost from 406 to
> 376 cycles on POWER8. getppid for non-relocatable case is 371 cycles.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Acked-by: Balbir Singh <bsingharora@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/18e3f56b1cacb96017e2a66844

cheers

