linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
@ 2015-09-23  6:05 Michael Neuling
  2015-09-23 18:21 ` Denis Kirjanov
  2015-09-23 22:23 ` Michael Ellerman
  0 siblings, 2 replies; 8+ messages in thread
From: Michael Neuling @ 2015-09-23  6:05 UTC (permalink / raw)
  To: Michael Ellerman, benh
  Cc: linuxppc-dev, Anton Blanchard, Aaron Sawdey, Steve Munroe

The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
determine the loaded address of the VDSO. The current version of these
attempt to use the special bcl variant which avoids pushing to the
link stack.

Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
current code results in link stack corruption and the resulting
performance degradation (due to branch mis-prediction).

This patch moves us to bcl+4 by moving __kernel_datapage_offset
out of __get_datapage().

With this patch, running the below benchmark we get a bump in
performance on POWER8 for gettimeofday() (which uses
__get_datapage()).

64bit gets ~4% improvement:
  Without patch:
    # ./tb
    time =3D 0.180321
  With patch:
    # ./tb
    time =3D 0.187408

32bit gets ~9% improvement:
  Without patch:
    # ./tb
    time =3D 0.276551
  With patch:
    # ./tb
    time =3D 0.252767

Testcase tb.c (stolen from Anton)
  /* gcc -O2 tb.c -o tb */
  #include <sys/time.h>
  #include <stdio.h>

  int main()
  {
	  int i;

	  struct timeval tv_start, tv_end;

	  gettimeofday(&tv_start, NULL);

	  for(i =3D 0; i < 10000000; i++) {
		  gettimeofday(&tv_end, NULL);
	  }

	  printf("time =3D %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_u=
sec - tv_start.tv_usec) * 1e-6);

	  return 0;
  }

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Aaron Sawdey <sawdey@us.ibm.com>

diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vd=
so32/datapage.S
index dc21e89..59cf5f4 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -16,6 +16,10 @@
 #include <asm/vdso.h>
=20
 	.text
+	.global	__kernel_datapage_offset;
+__kernel_datapage_offset:
+	.long	0
+
 V_FUNCTION_BEGIN(__get_datapage)
   .cfi_startproc
 	/* We don't want that exposed or overridable as we want other objects
@@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
 	mflr	r0
   .cfi_register lr,r0
=20
-	bcl	20,31,1f
-	.global	__kernel_datapage_offset;
-__kernel_datapage_offset:
-	.long	0
-1:
+	bcl	20,31,data_page_branch
+data_page_branch:
 	mflr	r3
 	mtlr	r0
+	addi	r3, r3, __kernel_datapage_offset-data_page_branch
 	lwz	r0,0(r3)
 	add	r3,r0,r3
 	blr
diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vd=
so64/datapage.S
index 79796de..2f01c4a 100644
--- a/arch/powerpc/kernel/vdso64/datapage.S
+++ b/arch/powerpc/kernel/vdso64/datapage.S
@@ -16,6 +16,10 @@
 #include <asm/vdso.h>
=20
 	.text
+.global	__kernel_datapage_offset;
+__kernel_datapage_offset:
+	.long	0
+
 V_FUNCTION_BEGIN(__get_datapage)
   .cfi_startproc
 	/* We don't want that exposed or overridable as we want other objects
@@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
 	mflr	r0
   .cfi_register lr,r0
=20
-	bcl	20,31,1f
-	.global	__kernel_datapage_offset;
-__kernel_datapage_offset:
-	.long	0
-1:
+	bcl	20,31,data_page_branch
+data_page_branch:
 	mflr	r3
 	mtlr	r0
+	addi	r3, r3, __kernel_datapage_offset-data_page_branch
 	lwz	r0,0(r3)
 	add	r3,r0,r3
 	blr

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
  2015-09-23  6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling
@ 2015-09-23 18:21 ` Denis Kirjanov
  2015-09-23 18:35   ` Aaron Sawdey
                     ` (2 more replies)
  2015-09-23 22:23 ` Michael Ellerman
  1 sibling, 3 replies; 8+ messages in thread
From: Denis Kirjanov @ 2015-09-23 18:21 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Michael Ellerman, benh, Aaron Sawdey, linuxppc-dev,
	Anton Blanchard, Steve Munroe

On 9/23/15, Michael Neuling <mikey@neuling.org> wrote:
> The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
> determine the loaded address of the VDSO. The current version of these
> attempt to use the special bcl variant which avoids pushing to the
> link stack.
>
> Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
> current code results in link stack corruption and the resulting
> performance degradation (due to branch mis-prediction).
>
> This patch moves us to bcl+4 by moving __kernel_datapage_offset
> out of __get_datapage().
>
> With this patch, running the below benchmark we get a bump in
> performance on POWER8 for gettimeofday() (which uses
> __get_datapage()).
>
> 64bit gets ~4% improvement:
>   Without patch:
>     # ./tb
>     time = 0.180321
>   With patch:
>     # ./tb
>     time = 0.187408
>
> 32bit gets ~9% improvement:
>   Without patch:
>     # ./tb
>     time = 0.276551
>   With patch:
>     # ./tb
>     time = 0.252767

I've got the following results on POWER7 64bit
without the patch:
# ./tb
time = 0.263337
# ./tb
time = 0.251273
# ./tb
time = 0.258453
# ./tb
time = 0.260189

with the patch:
# ./tb
time = 0.241517
# ./tb
time = 0.241973
# ./tb
time = 0.239365
# ./tb
time = 0.240109



>
> Testcase tb.c (stolen from Anton)
>   /* gcc -O2 tb.c -o tb */
>   #include <sys/time.h>
>   #include <stdio.h>
>
>   int main()
>   {
> 	  int i;
>
> 	  struct timeval tv_start, tv_end;
>
> 	  gettimeofday(&tv_start, NULL);
>
> 	  for(i = 0; i < 10000000; i++) {
> 		  gettimeofday(&tv_end, NULL);
> 	  }
>
> 	  printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_usec
> - tv_start.tv_usec) * 1e-6);
>
> 	  return 0;
>   }
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
>
> diff --git a/arch/powerpc/kernel/vdso32/datapage.S
> b/arch/powerpc/kernel/vdso32/datapage.S
> index dc21e89..59cf5f4 100644
> --- a/arch/powerpc/kernel/vdso32/datapage.S
> +++ b/arch/powerpc/kernel/vdso32/datapage.S
> @@ -16,6 +16,10 @@
>  #include <asm/vdso.h>
>
>  	.text
> +	.global	__kernel_datapage_offset;
> +__kernel_datapage_offset:
> +	.long	0
> +
>  V_FUNCTION_BEGIN(__get_datapage)
>    .cfi_startproc
>  	/* We don't want that exposed or overridable as we want other objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
>  	mflr	r0
>    .cfi_register lr,r0
>
> -	bcl	20,31,1f
> -	.global	__kernel_datapage_offset;
> -__kernel_datapage_offset:
> -	.long	0
> -1:
> +	bcl	20,31,data_page_branch
> +data_page_branch:
>  	mflr	r3
>  	mtlr	r0
> +	addi	r3, r3, __kernel_datapage_offset-data_page_branch
>  	lwz	r0,0(r3)
>  	add	r3,r0,r3
>  	blr
> diff --git a/arch/powerpc/kernel/vdso64/datapage.S
> b/arch/powerpc/kernel/vdso64/datapage.S
> index 79796de..2f01c4a 100644
> --- a/arch/powerpc/kernel/vdso64/datapage.S
> +++ b/arch/powerpc/kernel/vdso64/datapage.S
> @@ -16,6 +16,10 @@
>  #include <asm/vdso.h>
>
>  	.text
> +.global	__kernel_datapage_offset;
> +__kernel_datapage_offset:
> +	.long	0
> +
>  V_FUNCTION_BEGIN(__get_datapage)
>    .cfi_startproc
>  	/* We don't want that exposed or overridable as we want other objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
>  	mflr	r0
>    .cfi_register lr,r0
>
> -	bcl	20,31,1f
> -	.global	__kernel_datapage_offset;
> -__kernel_datapage_offset:
> -	.long	0
> -1:
> +	bcl	20,31,data_page_branch
> +data_page_branch:
>  	mflr	r3
>  	mtlr	r0
> +	addi	r3, r3, __kernel_datapage_offset-data_page_branch
>  	lwz	r0,0(r3)
>  	add	r3,r0,r3
>  	blr
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
  2015-09-23 18:21 ` Denis Kirjanov
@ 2015-09-23 18:35   ` Aaron Sawdey
       [not found]   ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
  2015-09-23 23:33   ` Michael Neuling
  2 siblings, 0 replies; 8+ messages in thread
From: Aaron Sawdey @ 2015-09-23 18:35 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Anton Blanchard, benh, linuxppc-dev, Michael Ellerman,
	Michael Neuling, Steve Munroe


[-- Attachment #1.1: Type: text/plain, Size: 4835 bytes --]


One thing that occurred to me was that since the __kernel_datapage_offset
is located immediately before the function, the offset is small and you
could get rid of the addi and just fold it into the lwz.


Aaron Sawdey, Ph.D.  sawdey@us.ibm.com
050-2/C113  (507) 253-7520 TL: 553-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



From:	Denis Kirjanov <kda@linux-powerpc.org>
To:	Michael Neuling <mikey@neuling.org>
Cc:	Michael Ellerman <michael@ellerman.id.au>, benh
            <benh@kernel.crashing.org>, Aaron Sawdey/Rochester/IBM@IBMUS,
            linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Anton Blanchard
            <anton@samba.org>, Steve Munroe/Rochester/IBM@IBMUS
Date:	09/23/2015 01:22 PM
Subject:	Re: [PATCH] powerpc/vdso: Avoid link stack corruption in
            __get_datapage()



On 9/23/15, Michael Neuling <mikey@neuling.org> wrote:
> The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
> determine the loaded address of the VDSO. The current version of these
> attempt to use the special bcl variant which avoids pushing to the
> link stack.
>
> Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
> current code results in link stack corruption and the resulting
> performance degradation (due to branch mis-prediction).
>
> This patch moves us to bcl+4 by moving __kernel_datapage_offset
> out of __get_datapage().
>
> With this patch, running the below benchmark we get a bump in
> performance on POWER8 for gettimeofday() (which uses
> __get_datapage()).
>
> 64bit gets ~4% improvement:
>   Without patch:
>     # ./tb
>     time = 0.180321
>   With patch:
>     # ./tb
>     time = 0.187408
>
> 32bit gets ~9% improvement:
>   Without patch:
>     # ./tb
>     time = 0.276551
>   With patch:
>     # ./tb
>     time = 0.252767

I've got the following results on POWER7 64bit
without the patch:
# ./tb
time = 0.263337
# ./tb
time = 0.251273
# ./tb
time = 0.258453
# ./tb
time = 0.260189

with the patch:
# ./tb
time = 0.241517
# ./tb
time = 0.241973
# ./tb
time = 0.239365
# ./tb
time = 0.240109



>
> Testcase tb.c (stolen from Anton)
>   /* gcc -O2 tb.c -o tb */
>   #include <sys/time.h>
>   #include <stdio.h>
>
>   int main()
>   {
> 		   int i;
>
> 		   struct timeval tv_start, tv_end;
>
> 		   gettimeofday(&tv_start, NULL);
>
> 		   for(i = 0; i < 10000000; i++) {
> 		 		   gettimeofday(&tv_end, NULL);
> 		   }
>
> 		   printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
(tv_end.tv_usec
> - tv_start.tv_usec) * 1e-6);
>
> 		   return 0;
>   }
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
>
> diff --git a/arch/powerpc/kernel/vdso32/datapage.S
> b/arch/powerpc/kernel/vdso32/datapage.S
> index dc21e89..59cf5f4 100644
> --- a/arch/powerpc/kernel/vdso32/datapage.S
> +++ b/arch/powerpc/kernel/vdso32/datapage.S
> @@ -16,6 +16,10 @@
>  #include <asm/vdso.h>
>
>  		 .text
> +		 .global		 __kernel_datapage_offset;
> +__kernel_datapage_offset:
> +		 .long		 0
> +
>  V_FUNCTION_BEGIN(__get_datapage)
>    .cfi_startproc
>  		 /* We don't want that exposed or overridable as we want other
objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
>  		 mflr		 r0
>    .cfi_register lr,r0
>
> -		 bcl		 20,31,1f
> -		 .global		 __kernel_datapage_offset;
> -__kernel_datapage_offset:
> -		 .long		 0
> -1:
> +		 bcl		 20,31,data_page_branch
> +data_page_branch:
>  		 mflr		 r3
>  		 mtlr		 r0
> +		 addi		 r3, r3, __kernel_datapage_offset-data_page_branch
>  		 lwz		 r0,0(r3)
>  		 add		 r3,r0,r3
>  		 blr
> diff --git a/arch/powerpc/kernel/vdso64/datapage.S
> b/arch/powerpc/kernel/vdso64/datapage.S
> index 79796de..2f01c4a 100644
> --- a/arch/powerpc/kernel/vdso64/datapage.S
> +++ b/arch/powerpc/kernel/vdso64/datapage.S
> @@ -16,6 +16,10 @@
>  #include <asm/vdso.h>
>
>  		 .text
> +.global		 __kernel_datapage_offset;
> +__kernel_datapage_offset:
> +		 .long		 0
> +
>  V_FUNCTION_BEGIN(__get_datapage)
>    .cfi_startproc
>  		 /* We don't want that exposed or overridable as we want other
objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
>  		 mflr		 r0
>    .cfi_register lr,r0
>
> -		 bcl		 20,31,1f
> -		 .global		 __kernel_datapage_offset;
> -__kernel_datapage_offset:
> -		 .long		 0
> -1:
> +		 bcl		 20,31,data_page_branch
> +data_page_branch:
>  		 mflr		 r3
>  		 mtlr		 r0
> +		 addi		 r3, r3, __kernel_datapage_offset-data_page_branch
>  		 lwz		 r0,0(r3)
>  		 add		 r3,r0,r3
>  		 blr
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev




[-- Attachment #1.2: Type: text/html, Size: 7752 bytes --]

[-- Attachment #2: graycol.gif --]
[-- Type: image/gif, Size: 105 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
  2015-09-23  6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling
  2015-09-23 18:21 ` Denis Kirjanov
@ 2015-09-23 22:23 ` Michael Ellerman
  2015-09-24 10:10   ` Denis Kirjanov
  1 sibling, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2015-09-23 22:23 UTC (permalink / raw)
  To: Michael Neuling, benh
  Cc: linuxppc-dev, Anton Blanchard, Aaron Sawdey, Steve Munroe



On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org> wrote:
>The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
>determine the loaded address of the VDSO. The current version of these
>attempt to use the special bcl variant which avoids pushing to the
>link stack.
>
>Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
>current code results in link stack corruption and the resulting
>performance degradation (due to branch mis-prediction).
>
>This patch moves us to bcl+4 by moving __kernel_datapage_offset
>out of __get_datapage().
>
>With this patch, running the below benchmark we get a bump in
>performance on POWER8 for gettimeofday() (which uses
>__get_datapage()).
>
>64bit gets ~4% improvement:
>  Without patch:
>    # ./tb
>    time = 0.180321
>  With patch:
>    # ./tb
>    time = 0.187408
>
>32bit gets ~9% improvement:
>  Without patch:
>    # ./tb
>    time = 0.276551
>  With patch:
>    # ./tb
>    time = 0.252767
>
>Testcase tb.c (stolen from Anton)
>  /* gcc -O2 tb.c -o tb */
>  #include <sys/time.h>
>  #include <stdio.h>
>
>  int main()
>  {
>	  int i;
>
>	  struct timeval tv_start, tv_end;
>
>	  gettimeofday(&tv_start, NULL);
>
>	  for(i = 0; i < 10000000; i++) {
>		  gettimeofday(&tv_end, NULL);
>	  }
>
>	  printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
>
>	  return 0;
>  }

You know where test cases are supposed to go.

I know it's not a pass/fail test, but it's still useful. If it's in the tree it will get run as part of automated test runs and we will have a record of the result over time.

cheers
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
       [not found]   ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
@ 2015-09-23 23:28     ` Michael Neuling
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Neuling @ 2015-09-23 23:28 UTC (permalink / raw)
  To: Aaron Sawdey
  Cc: Denis Kirjanov, Anton Blanchard, benh, linuxppc-dev,
	Michael Ellerman, Steve Munroe

On Wed, 2015-09-23 at 13:35 -0500, Aaron Sawdey wrote:
> One thing that occurred to me was that since the
> __kernel_datapage_offset is located immediately before the function,
> the offset is small and you could get rid of the addi and just fold it
> into the lwz.

r3 is reused at the end to add to r0, so if r3 doesn't have the right
offset for __kernel_datapage_offset it doesn't work.

Mikey


>=20
>=20
> Aaron Sawdey, Ph.D. sawdey@us.ibm.com
> 050-2/C113 (507) 253-7520 TL: 553-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
>=20
> Inactive hide details for Denis Kirjanov ---09/23/2015 01:22:39
> PM---On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > TDenis
> Kirjanov ---09/23/2015 01:22:39 PM---On 9/23/15, Michael Neuling
> <mikey@neuling.org> wrote: > The 32 and 64 bit variants of
> __get_datapag
>=20
> From: Denis Kirjanov <kda@linux-powerpc.org>
> To: Michael Neuling <mikey@neuling.org>
> Cc: Michael Ellerman <michael@ellerman.id.au>, benh
> <benh@kernel.crashing.org>, Aaron Sawdey/Rochester/IBM@IBMUS,
> linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Anton Blanchard
> <anton@samba.org>, Steve Munroe/Rochester/IBM@IBMUS
> Date: 09/23/2015 01:22 PM
> Subject: Re: [PATCH] powerpc/vdso: Avoid link stack corruption in
> __get_datapage()
>=20
>=20
>=20
> ______________________________________________________________________
>=20
>=20
>=20
> On 9/23/15, Michael Neuling <mikey@neuling.org> wrote:
> > The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
> > determine the loaded address of the VDSO. The current version of
> these
> > attempt to use the special bcl variant which avoids pushing to the
> > link stack.
> >
> > Unfortunately it uses bcl+8 rather than the required bcl+4. Hence
> the
> > current code results in link stack corruption and the resulting
> > performance degradation (due to branch mis-prediction).
> >
> > This patch moves us to bcl+4 by moving __kernel_datapage_offset
> > out of __get_datapage().
> >
> > With this patch, running the below benchmark we get a bump in
> > performance on POWER8 for gettimeofday() (which uses
> > __get_datapage()).
> >
> > 64bit gets ~4% improvement:
> >   Without patch:
> >     # ./tb
> >     time =3D 0.180321
> >   With patch:
> >     # ./tb
> >     time =3D 0.187408
> >
> > 32bit gets ~9% improvement:
> >   Without patch:
> >     # ./tb
> >     time =3D 0.276551
> >   With patch:
> >     # ./tb
> >     time =3D 0.252767
>=20
> I've got the following results on POWER7 64bit
> without the patch:
> # ./tb
> time =3D 0.263337
> # ./tb
> time =3D 0.251273
> # ./tb
> time =3D 0.258453
> # ./tb
> time =3D 0.260189
>=20
> with the patch:
> # ./tb
> time =3D 0.241517
> # ./tb
> time =3D 0.241973
> # ./tb
> time =3D 0.239365
> # ./tb
> time =3D 0.240109
>=20
>=20
>=20
> >
> > Testcase tb.c (stolen from Anton)
> >   /* gcc -O2 tb.c -o tb */
> >   #include <sys/time.h>
> >   #include <stdio.h>
> >
> >   int main()
> >   {
> >   int i;
> >
> >   struct timeval tv_start, tv_end;
> >
> >   gettimeofday(&tv_start, NULL);
> >
> >   for(i =3D 0; i < 10000000; i++) {
> >   gettimeofday(&tv_end, NULL);
> >   }
> >
> >   printf("time =3D %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
> (tv_end.tv_usec
> > - tv_start.tv_usec) * 1e-6);
> >
> >   return 0;
> >   }
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
> >
> > diff --git a/arch/powerpc/kernel/vdso32/datapage.S
> > b/arch/powerpc/kernel/vdso32/datapage.S
> > index dc21e89..59cf5f4 100644
> > --- a/arch/powerpc/kernel/vdso32/datapage.S
> > +++ b/arch/powerpc/kernel/vdso32/datapage.S
> > @@ -16,6 +16,10 @@
> >  #include <asm/vdso.h>
> >
> >   .text
> > + .global __kernel_datapage_offset;
> > +__kernel_datapage_offset:
> > + .long 0
> > +
> >  V_FUNCTION_BEGIN(__get_datapage)
> >    .cfi_startproc
> >   /* We don't want that exposed or overridable as we want other
> objects
> > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> >   mflr r0
> >    .cfi_register lr,r0
> >
> > - bcl 20,31,1f
> > - .global __kernel_datapage_offset;
> > -__kernel_datapage_offset:
> > - .long 0
> > -1:
> > + bcl 20,31,data_page_branch
> > +data_page_branch:
> >   mflr r3
> >   mtlr r0
> > + addi r3, r3, __kernel_datapage_offset-data_page_branch
> >   lwz r0,0(r3)
> >   add r3,r0,r3
> >   blr
> > diff --git a/arch/powerpc/kernel/vdso64/datapage.S
> > b/arch/powerpc/kernel/vdso64/datapage.S
> > index 79796de..2f01c4a 100644
> > --- a/arch/powerpc/kernel/vdso64/datapage.S
> > +++ b/arch/powerpc/kernel/vdso64/datapage.S
> > @@ -16,6 +16,10 @@
> >  #include <asm/vdso.h>
> >
> >   .text
> > +.global __kernel_datapage_offset;
> > +__kernel_datapage_offset:
> > + .long 0
> > +
> >  V_FUNCTION_BEGIN(__get_datapage)
> >    .cfi_startproc
> >   /* We don't want that exposed or overridable as we want other
> objects
> > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> >   mflr r0
> >    .cfi_register lr,r0
> >
> > - bcl 20,31,1f
> > - .global __kernel_datapage_offset;
> > -__kernel_datapage_offset:
> > - .long 0
> > -1:
> > + bcl 20,31,data_page_branch
> > +data_page_branch:
> >   mflr r3
> >   mtlr r0
> > + addi r3, r3, __kernel_datapage_offset-data_page_branch
> >   lwz r0,0(r3)
> >   add r3,r0,r3
> >   blr
> >
> >
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
>=20
>=20
>=20
>=20

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
  2015-09-23 18:21 ` Denis Kirjanov
  2015-09-23 18:35   ` Aaron Sawdey
       [not found]   ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
@ 2015-09-23 23:33   ` Michael Neuling
  2 siblings, 0 replies; 8+ messages in thread
From: Michael Neuling @ 2015-09-23 23:33 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Michael Ellerman, benh, Aaron Sawdey, linuxppc-dev,
	Anton Blanchard, Steve Munroe


> I've got the following results on POWER7 64bit
> without the patch:
> # ./tb
> time =3D 0.263337
> # ./tb
> time =3D 0.251273
> # ./tb
> time =3D 0.258453
> # ./tb
> time =3D 0.260189
>=20
> with the patch:
> # ./tb
> time =3D 0.241517
> # ./tb
> time =3D 0.241973
> # ./tb
> time =3D 0.239365
> # ./tb
> time =3D 0.240109

Nice, I was expecting a bump there too but didn't test..  Looks like
about 7% to me. =20

Thanks,
Mikey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
  2015-09-23 22:23 ` Michael Ellerman
@ 2015-09-24 10:10   ` Denis Kirjanov
  2015-09-25  0:05     ` Michael Ellerman
  0 siblings, 1 reply; 8+ messages in thread
From: Denis Kirjanov @ 2015-09-24 10:10 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Michael Neuling, benh, Aaron Sawdey, linuxppc-dev,
	Anton Blanchard, Steve Munroe

On 9/24/15, Michael Ellerman <michael@ellerman.id.au> wrote:
>
>
> On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org>
> wrote:
>>The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
>>determine the loaded address of the VDSO. The current version of these
>>attempt to use the special bcl variant which avoids pushing to the
>>link stack.
>>
>>Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
>>current code results in link stack corruption and the resulting
>>performance degradation (due to branch mis-prediction).
>>
>>This patch moves us to bcl+4 by moving __kernel_datapage_offset
>>out of __get_datapage().
>>
>>With this patch, running the below benchmark we get a bump in
>>performance on POWER8 for gettimeofday() (which uses
>>__get_datapage()).
>>
>>64bit gets ~4% improvement:
>>  Without patch:
>>    # ./tb
>>    time = 0.180321
>>  With patch:
>>    # ./tb
>>    time = 0.187408
>>
>>32bit gets ~9% improvement:
>>  Without patch:
>>    # ./tb
>>    time = 0.276551
>>  With patch:
>>    # ./tb
>>    time = 0.252767
>>
>>Testcase tb.c (stolen from Anton)
>>  /* gcc -O2 tb.c -o tb */
>>  #include <sys/time.h>
>>  #include <stdio.h>
>>
>>  int main()
>>  {
>>	  int i;
>>
>>	  struct timeval tv_start, tv_end;
>>
>>	  gettimeofday(&tv_start, NULL);
>>
>>	  for(i = 0; i < 10000000; i++) {
>>		  gettimeofday(&tv_end, NULL);
>>	  }
>>
>>	  printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
>>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
>>
>>	  return 0;
>>  }
>
> You know where test cases are supposed to go.
>
> I know it's not a pass/fail test, but it's still useful. If it's in the tree
> it will get run as part of automated test runs and we will have a record of
> the result over time.

I can send a patch for it.

>
> cheers
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
  2015-09-24 10:10   ` Denis Kirjanov
@ 2015-09-25  0:05     ` Michael Ellerman
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2015-09-25  0:05 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Michael Neuling, Steve Munroe, Anton Blanchard, Aaron Sawdey,
	linuxppc-dev

On Thu, 2015-09-24 at 13:10 +0300, Denis Kirjanov wrote:
> On 9/24/15, Michael Ellerman <michael@ellerman.id.au> wrote:
> > On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org>
> > wrote:
> >>
> >>Testcase tb.c (stolen from Anton)
> >>  /* gcc -O2 tb.c -o tb */
> >>  #include <sys/time.h>
> >>  #include <stdio.h>
> >>
> >>  int main()
> >>  {
> >>	  int i;
> >>
> >>	  struct timeval tv_start, tv_end;
> >>
> >>	  gettimeofday(&tv_start, NULL);
> >>
> >>	  for(i = 0; i < 10000000; i++) {
> >>		  gettimeofday(&tv_end, NULL);
> >>	  }
> >>
> >>	  printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
> >>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
> >>
> >>	  return 0;
> >>  }
> >
> > You know where test cases are supposed to go.
> >
> > I know it's not a pass/fail test, but it's still useful. If it's in the tree
> > it will get run as part of automated test runs and we will have a record of
> > the result over time.
> 
> I can send a patch for it.

Mikey was working on it I think.

cheers

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-09-25  0:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-23  6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling
2015-09-23 18:21 ` Denis Kirjanov
2015-09-23 18:35   ` Aaron Sawdey
     [not found]   ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
2015-09-23 23:28     ` Michael Neuling
2015-09-23 23:33   ` Michael Neuling
2015-09-23 22:23 ` Michael Ellerman
2015-09-24 10:10   ` Denis Kirjanov
2015-09-25  0:05     ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).