* [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
@ 2015-09-23 6:05 Michael Neuling
2015-09-23 18:21 ` Denis Kirjanov
2015-09-23 22:23 ` Michael Ellerman
0 siblings, 2 replies; 8+ messages in thread
From: Michael Neuling @ 2015-09-23 6:05 UTC (permalink / raw)
To: Michael Ellerman, benh
Cc: linuxppc-dev, Anton Blanchard, Aaron Sawdey, Steve Munroe
The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
determine the loaded address of the VDSO. The current version of these
attempt to use the special bcl variant which avoids pushing to the
link stack.
Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
current code results in link stack corruption and the resulting
performance degradation (due to branch mis-prediction).
This patch moves us to bcl+4 by moving __kernel_datapage_offset
out of __get_datapage().
With this patch, running the below benchmark we get a bump in
performance on POWER8 for gettimeofday() (which uses
__get_datapage()).
64bit gets ~4% improvement:
Without patch:
# ./tb
time =3D 0.180321
With patch:
# ./tb
time =3D 0.187408
32bit gets ~9% improvement:
Without patch:
# ./tb
time =3D 0.276551
With patch:
# ./tb
time =3D 0.252767
Testcase tb.c (stolen from Anton)
/* gcc -O2 tb.c -o tb */
#include <sys/time.h>
#include <stdio.h>
int main()
{
int i;
struct timeval tv_start, tv_end;
gettimeofday(&tv_start, NULL);
for(i =3D 0; i < 10000000; i++) {
gettimeofday(&tv_end, NULL);
}
printf("time =3D %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_u=
sec - tv_start.tv_usec) * 1e-6);
return 0;
}
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vd=
so32/datapage.S
index dc21e89..59cf5f4 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -16,6 +16,10 @@
#include <asm/vdso.h>
=20
.text
+ .global __kernel_datapage_offset;
+__kernel_datapage_offset:
+ .long 0
+
V_FUNCTION_BEGIN(__get_datapage)
.cfi_startproc
/* We don't want that exposed or overridable as we want other objects
@@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
mflr r0
.cfi_register lr,r0
=20
- bcl 20,31,1f
- .global __kernel_datapage_offset;
-__kernel_datapage_offset:
- .long 0
-1:
+ bcl 20,31,data_page_branch
+data_page_branch:
mflr r3
mtlr r0
+ addi r3, r3, __kernel_datapage_offset-data_page_branch
lwz r0,0(r3)
add r3,r0,r3
blr
diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vd=
so64/datapage.S
index 79796de..2f01c4a 100644
--- a/arch/powerpc/kernel/vdso64/datapage.S
+++ b/arch/powerpc/kernel/vdso64/datapage.S
@@ -16,6 +16,10 @@
#include <asm/vdso.h>
=20
.text
+.global __kernel_datapage_offset;
+__kernel_datapage_offset:
+ .long 0
+
V_FUNCTION_BEGIN(__get_datapage)
.cfi_startproc
/* We don't want that exposed or overridable as we want other objects
@@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
mflr r0
.cfi_register lr,r0
=20
- bcl 20,31,1f
- .global __kernel_datapage_offset;
-__kernel_datapage_offset:
- .long 0
-1:
+ bcl 20,31,data_page_branch
+data_page_branch:
mflr r3
mtlr r0
+ addi r3, r3, __kernel_datapage_offset-data_page_branch
lwz r0,0(r3)
add r3,r0,r3
blr
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
2015-09-23 6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling
@ 2015-09-23 18:21 ` Denis Kirjanov
2015-09-23 18:35 ` Aaron Sawdey
` (2 more replies)
2015-09-23 22:23 ` Michael Ellerman
1 sibling, 3 replies; 8+ messages in thread
From: Denis Kirjanov @ 2015-09-23 18:21 UTC (permalink / raw)
To: Michael Neuling
Cc: Michael Ellerman, benh, Aaron Sawdey, linuxppc-dev,
Anton Blanchard, Steve Munroe
On 9/23/15, Michael Neuling <mikey@neuling.org> wrote:
> The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
> determine the loaded address of the VDSO. The current version of these
> attempt to use the special bcl variant which avoids pushing to the
> link stack.
>
> Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
> current code results in link stack corruption and the resulting
> performance degradation (due to branch mis-prediction).
>
> This patch moves us to bcl+4 by moving __kernel_datapage_offset
> out of __get_datapage().
>
> With this patch, running the below benchmark we get a bump in
> performance on POWER8 for gettimeofday() (which uses
> __get_datapage()).
>
> 64bit gets ~4% improvement:
> Without patch:
> # ./tb
> time = 0.180321
> With patch:
> # ./tb
> time = 0.187408
>
> 32bit gets ~9% improvement:
> Without patch:
> # ./tb
> time = 0.276551
> With patch:
> # ./tb
> time = 0.252767
I've got the following results on POWER7 64bit
without the patch:
# ./tb
time = 0.263337
# ./tb
time = 0.251273
# ./tb
time = 0.258453
# ./tb
time = 0.260189
with the patch:
# ./tb
time = 0.241517
# ./tb
time = 0.241973
# ./tb
time = 0.239365
# ./tb
time = 0.240109
>
> Testcase tb.c (stolen from Anton)
> /* gcc -O2 tb.c -o tb */
> #include <sys/time.h>
> #include <stdio.h>
>
> int main()
> {
> int i;
>
> struct timeval tv_start, tv_end;
>
> gettimeofday(&tv_start, NULL);
>
> for(i = 0; i < 10000000; i++) {
> gettimeofday(&tv_end, NULL);
> }
>
> printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_usec
> - tv_start.tv_usec) * 1e-6);
>
> return 0;
> }
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
>
> diff --git a/arch/powerpc/kernel/vdso32/datapage.S
> b/arch/powerpc/kernel/vdso32/datapage.S
> index dc21e89..59cf5f4 100644
> --- a/arch/powerpc/kernel/vdso32/datapage.S
> +++ b/arch/powerpc/kernel/vdso32/datapage.S
> @@ -16,6 +16,10 @@
> #include <asm/vdso.h>
>
> .text
> + .global __kernel_datapage_offset;
> +__kernel_datapage_offset:
> + .long 0
> +
> V_FUNCTION_BEGIN(__get_datapage)
> .cfi_startproc
> /* We don't want that exposed or overridable as we want other objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> mflr r0
> .cfi_register lr,r0
>
> - bcl 20,31,1f
> - .global __kernel_datapage_offset;
> -__kernel_datapage_offset:
> - .long 0
> -1:
> + bcl 20,31,data_page_branch
> +data_page_branch:
> mflr r3
> mtlr r0
> + addi r3, r3, __kernel_datapage_offset-data_page_branch
> lwz r0,0(r3)
> add r3,r0,r3
> blr
> diff --git a/arch/powerpc/kernel/vdso64/datapage.S
> b/arch/powerpc/kernel/vdso64/datapage.S
> index 79796de..2f01c4a 100644
> --- a/arch/powerpc/kernel/vdso64/datapage.S
> +++ b/arch/powerpc/kernel/vdso64/datapage.S
> @@ -16,6 +16,10 @@
> #include <asm/vdso.h>
>
> .text
> +.global __kernel_datapage_offset;
> +__kernel_datapage_offset:
> + .long 0
> +
> V_FUNCTION_BEGIN(__get_datapage)
> .cfi_startproc
> /* We don't want that exposed or overridable as we want other objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> mflr r0
> .cfi_register lr,r0
>
> - bcl 20,31,1f
> - .global __kernel_datapage_offset;
> -__kernel_datapage_offset:
> - .long 0
> -1:
> + bcl 20,31,data_page_branch
> +data_page_branch:
> mflr r3
> mtlr r0
> + addi r3, r3, __kernel_datapage_offset-data_page_branch
> lwz r0,0(r3)
> add r3,r0,r3
> blr
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
2015-09-23 18:21 ` Denis Kirjanov
@ 2015-09-23 18:35 ` Aaron Sawdey
[not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
2015-09-23 23:33 ` Michael Neuling
2 siblings, 0 replies; 8+ messages in thread
From: Aaron Sawdey @ 2015-09-23 18:35 UTC (permalink / raw)
To: Denis Kirjanov
Cc: Anton Blanchard, benh, linuxppc-dev, Michael Ellerman,
Michael Neuling, Steve Munroe
[-- Attachment #1.1: Type: text/plain, Size: 4835 bytes --]
One thing that occurred to me was that since the __kernel_datapage_offset
is located immediately before the function, the offset is small and you
could get rid of the addi and just fold it into the lwz.
Aaron Sawdey, Ph.D. sawdey@us.ibm.com
050-2/C113 (507) 253-7520 TL: 553-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
From: Denis Kirjanov <kda@linux-powerpc.org>
To: Michael Neuling <mikey@neuling.org>
Cc: Michael Ellerman <michael@ellerman.id.au>, benh
<benh@kernel.crashing.org>, Aaron Sawdey/Rochester/IBM@IBMUS,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Anton Blanchard
<anton@samba.org>, Steve Munroe/Rochester/IBM@IBMUS
Date: 09/23/2015 01:22 PM
Subject: Re: [PATCH] powerpc/vdso: Avoid link stack corruption in
__get_datapage()
On 9/23/15, Michael Neuling <mikey@neuling.org> wrote:
> The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
> determine the loaded address of the VDSO. The current version of these
> attempt to use the special bcl variant which avoids pushing to the
> link stack.
>
> Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
> current code results in link stack corruption and the resulting
> performance degradation (due to branch mis-prediction).
>
> This patch moves us to bcl+4 by moving __kernel_datapage_offset
> out of __get_datapage().
>
> With this patch, running the below benchmark we get a bump in
> performance on POWER8 for gettimeofday() (which uses
> __get_datapage()).
>
> 64bit gets ~4% improvement:
> Without patch:
> # ./tb
> time = 0.180321
> With patch:
> # ./tb
> time = 0.187408
>
> 32bit gets ~9% improvement:
> Without patch:
> # ./tb
> time = 0.276551
> With patch:
> # ./tb
> time = 0.252767
I've got the following results on POWER7 64bit
without the patch:
# ./tb
time = 0.263337
# ./tb
time = 0.251273
# ./tb
time = 0.258453
# ./tb
time = 0.260189
with the patch:
# ./tb
time = 0.241517
# ./tb
time = 0.241973
# ./tb
time = 0.239365
# ./tb
time = 0.240109
>
> Testcase tb.c (stolen from Anton)
> /* gcc -O2 tb.c -o tb */
> #include <sys/time.h>
> #include <stdio.h>
>
> int main()
> {
> int i;
>
> struct timeval tv_start, tv_end;
>
> gettimeofday(&tv_start, NULL);
>
> for(i = 0; i < 10000000; i++) {
> gettimeofday(&tv_end, NULL);
> }
>
> printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
(tv_end.tv_usec
> - tv_start.tv_usec) * 1e-6);
>
> return 0;
> }
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
>
> diff --git a/arch/powerpc/kernel/vdso32/datapage.S
> b/arch/powerpc/kernel/vdso32/datapage.S
> index dc21e89..59cf5f4 100644
> --- a/arch/powerpc/kernel/vdso32/datapage.S
> +++ b/arch/powerpc/kernel/vdso32/datapage.S
> @@ -16,6 +16,10 @@
> #include <asm/vdso.h>
>
> .text
> + .global __kernel_datapage_offset;
> +__kernel_datapage_offset:
> + .long 0
> +
> V_FUNCTION_BEGIN(__get_datapage)
> .cfi_startproc
> /* We don't want that exposed or overridable as we want other
objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> mflr r0
> .cfi_register lr,r0
>
> - bcl 20,31,1f
> - .global __kernel_datapage_offset;
> -__kernel_datapage_offset:
> - .long 0
> -1:
> + bcl 20,31,data_page_branch
> +data_page_branch:
> mflr r3
> mtlr r0
> + addi r3, r3, __kernel_datapage_offset-data_page_branch
> lwz r0,0(r3)
> add r3,r0,r3
> blr
> diff --git a/arch/powerpc/kernel/vdso64/datapage.S
> b/arch/powerpc/kernel/vdso64/datapage.S
> index 79796de..2f01c4a 100644
> --- a/arch/powerpc/kernel/vdso64/datapage.S
> +++ b/arch/powerpc/kernel/vdso64/datapage.S
> @@ -16,6 +16,10 @@
> #include <asm/vdso.h>
>
> .text
> +.global __kernel_datapage_offset;
> +__kernel_datapage_offset:
> + .long 0
> +
> V_FUNCTION_BEGIN(__get_datapage)
> .cfi_startproc
> /* We don't want that exposed or overridable as we want other
objects
> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> mflr r0
> .cfi_register lr,r0
>
> - bcl 20,31,1f
> - .global __kernel_datapage_offset;
> -__kernel_datapage_offset:
> - .long 0
> -1:
> + bcl 20,31,data_page_branch
> +data_page_branch:
> mflr r3
> mtlr r0
> + addi r3, r3, __kernel_datapage_offset-data_page_branch
> lwz r0,0(r3)
> add r3,r0,r3
> blr
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
[-- Attachment #1.2: Type: text/html, Size: 7752 bytes --]
[-- Attachment #2: graycol.gif --]
[-- Type: image/gif, Size: 105 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
2015-09-23 6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling
2015-09-23 18:21 ` Denis Kirjanov
@ 2015-09-23 22:23 ` Michael Ellerman
2015-09-24 10:10 ` Denis Kirjanov
1 sibling, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2015-09-23 22:23 UTC (permalink / raw)
To: Michael Neuling, benh
Cc: linuxppc-dev, Anton Blanchard, Aaron Sawdey, Steve Munroe
On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org> wrote:
>The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
>determine the loaded address of the VDSO. The current version of these
>attempt to use the special bcl variant which avoids pushing to the
>link stack.
>
>Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
>current code results in link stack corruption and the resulting
>performance degradation (due to branch mis-prediction).
>
>This patch moves us to bcl+4 by moving __kernel_datapage_offset
>out of __get_datapage().
>
>With this patch, running the below benchmark we get a bump in
>performance on POWER8 for gettimeofday() (which uses
>__get_datapage()).
>
>64bit gets ~4% improvement:
> Without patch:
> # ./tb
> time = 0.180321
> With patch:
> # ./tb
> time = 0.187408
>
>32bit gets ~9% improvement:
> Without patch:
> # ./tb
> time = 0.276551
> With patch:
> # ./tb
> time = 0.252767
>
>Testcase tb.c (stolen from Anton)
> /* gcc -O2 tb.c -o tb */
> #include <sys/time.h>
> #include <stdio.h>
>
> int main()
> {
> int i;
>
> struct timeval tv_start, tv_end;
>
> gettimeofday(&tv_start, NULL);
>
> for(i = 0; i < 10000000; i++) {
> gettimeofday(&tv_end, NULL);
> }
>
> printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
>
> return 0;
> }
You know where test cases are supposed to go.
I know it's not a pass/fail test, but it's still useful. If it's in the tree it will get run as part of automated test runs and we will have a record of the result over time.
cheers
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
[not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
@ 2015-09-23 23:28 ` Michael Neuling
0 siblings, 0 replies; 8+ messages in thread
From: Michael Neuling @ 2015-09-23 23:28 UTC (permalink / raw)
To: Aaron Sawdey
Cc: Denis Kirjanov, Anton Blanchard, benh, linuxppc-dev,
Michael Ellerman, Steve Munroe
On Wed, 2015-09-23 at 13:35 -0500, Aaron Sawdey wrote:
> One thing that occurred to me was that since the
> __kernel_datapage_offset is located immediately before the function,
> the offset is small and you could get rid of the addi and just fold it
> into the lwz.
r3 is reused at the end to add to r0, so if r3 doesn't have the right
offset for __kernel_datapage_offset it doesn't work.
Mikey
>=20
>=20
> Aaron Sawdey, Ph.D. sawdey@us.ibm.com
> 050-2/C113 (507) 253-7520 TL: 553-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
>=20
> Inactive hide details for Denis Kirjanov ---09/23/2015 01:22:39
> PM---On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > TDenis
> Kirjanov ---09/23/2015 01:22:39 PM---On 9/23/15, Michael Neuling
> <mikey@neuling.org> wrote: > The 32 and 64 bit variants of
> __get_datapag
>=20
> From: Denis Kirjanov <kda@linux-powerpc.org>
> To: Michael Neuling <mikey@neuling.org>
> Cc: Michael Ellerman <michael@ellerman.id.au>, benh
> <benh@kernel.crashing.org>, Aaron Sawdey/Rochester/IBM@IBMUS,
> linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Anton Blanchard
> <anton@samba.org>, Steve Munroe/Rochester/IBM@IBMUS
> Date: 09/23/2015 01:22 PM
> Subject: Re: [PATCH] powerpc/vdso: Avoid link stack corruption in
> __get_datapage()
>=20
>=20
>=20
> ______________________________________________________________________
>=20
>=20
>=20
> On 9/23/15, Michael Neuling <mikey@neuling.org> wrote:
> > The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
> > determine the loaded address of the VDSO. The current version of
> these
> > attempt to use the special bcl variant which avoids pushing to the
> > link stack.
> >
> > Unfortunately it uses bcl+8 rather than the required bcl+4. Hence
> the
> > current code results in link stack corruption and the resulting
> > performance degradation (due to branch mis-prediction).
> >
> > This patch moves us to bcl+4 by moving __kernel_datapage_offset
> > out of __get_datapage().
> >
> > With this patch, running the below benchmark we get a bump in
> > performance on POWER8 for gettimeofday() (which uses
> > __get_datapage()).
> >
> > 64bit gets ~4% improvement:
> > Without patch:
> > # ./tb
> > time =3D 0.180321
> > With patch:
> > # ./tb
> > time =3D 0.187408
> >
> > 32bit gets ~9% improvement:
> > Without patch:
> > # ./tb
> > time =3D 0.276551
> > With patch:
> > # ./tb
> > time =3D 0.252767
>=20
> I've got the following results on POWER7 64bit
> without the patch:
> # ./tb
> time =3D 0.263337
> # ./tb
> time =3D 0.251273
> # ./tb
> time =3D 0.258453
> # ./tb
> time =3D 0.260189
>=20
> with the patch:
> # ./tb
> time =3D 0.241517
> # ./tb
> time =3D 0.241973
> # ./tb
> time =3D 0.239365
> # ./tb
> time =3D 0.240109
>=20
>=20
>=20
> >
> > Testcase tb.c (stolen from Anton)
> > /* gcc -O2 tb.c -o tb */
> > #include <sys/time.h>
> > #include <stdio.h>
> >
> > int main()
> > {
> > int i;
> >
> > struct timeval tv_start, tv_end;
> >
> > gettimeofday(&tv_start, NULL);
> >
> > for(i =3D 0; i < 10000000; i++) {
> > gettimeofday(&tv_end, NULL);
> > }
> >
> > printf("time =3D %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
> (tv_end.tv_usec
> > - tv_start.tv_usec) * 1e-6);
> >
> > return 0;
> > }
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
> >
> > diff --git a/arch/powerpc/kernel/vdso32/datapage.S
> > b/arch/powerpc/kernel/vdso32/datapage.S
> > index dc21e89..59cf5f4 100644
> > --- a/arch/powerpc/kernel/vdso32/datapage.S
> > +++ b/arch/powerpc/kernel/vdso32/datapage.S
> > @@ -16,6 +16,10 @@
> > #include <asm/vdso.h>
> >
> > .text
> > + .global __kernel_datapage_offset;
> > +__kernel_datapage_offset:
> > + .long 0
> > +
> > V_FUNCTION_BEGIN(__get_datapage)
> > .cfi_startproc
> > /* We don't want that exposed or overridable as we want other
> objects
> > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> > mflr r0
> > .cfi_register lr,r0
> >
> > - bcl 20,31,1f
> > - .global __kernel_datapage_offset;
> > -__kernel_datapage_offset:
> > - .long 0
> > -1:
> > + bcl 20,31,data_page_branch
> > +data_page_branch:
> > mflr r3
> > mtlr r0
> > + addi r3, r3, __kernel_datapage_offset-data_page_branch
> > lwz r0,0(r3)
> > add r3,r0,r3
> > blr
> > diff --git a/arch/powerpc/kernel/vdso64/datapage.S
> > b/arch/powerpc/kernel/vdso64/datapage.S
> > index 79796de..2f01c4a 100644
> > --- a/arch/powerpc/kernel/vdso64/datapage.S
> > +++ b/arch/powerpc/kernel/vdso64/datapage.S
> > @@ -16,6 +16,10 @@
> > #include <asm/vdso.h>
> >
> > .text
> > +.global __kernel_datapage_offset;
> > +__kernel_datapage_offset:
> > + .long 0
> > +
> > V_FUNCTION_BEGIN(__get_datapage)
> > .cfi_startproc
> > /* We don't want that exposed or overridable as we want other
> objects
> > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
> > mflr r0
> > .cfi_register lr,r0
> >
> > - bcl 20,31,1f
> > - .global __kernel_datapage_offset;
> > -__kernel_datapage_offset:
> > - .long 0
> > -1:
> > + bcl 20,31,data_page_branch
> > +data_page_branch:
> > mflr r3
> > mtlr r0
> > + addi r3, r3, __kernel_datapage_offset-data_page_branch
> > lwz r0,0(r3)
> > add r3,r0,r3
> > blr
> >
> >
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
>=20
>=20
>=20
>=20
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
2015-09-23 18:21 ` Denis Kirjanov
2015-09-23 18:35 ` Aaron Sawdey
[not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
@ 2015-09-23 23:33 ` Michael Neuling
2 siblings, 0 replies; 8+ messages in thread
From: Michael Neuling @ 2015-09-23 23:33 UTC (permalink / raw)
To: Denis Kirjanov
Cc: Michael Ellerman, benh, Aaron Sawdey, linuxppc-dev,
Anton Blanchard, Steve Munroe
> I've got the following results on POWER7 64bit
> without the patch:
> # ./tb
> time =3D 0.263337
> # ./tb
> time =3D 0.251273
> # ./tb
> time =3D 0.258453
> # ./tb
> time =3D 0.260189
>=20
> with the patch:
> # ./tb
> time =3D 0.241517
> # ./tb
> time =3D 0.241973
> # ./tb
> time =3D 0.239365
> # ./tb
> time =3D 0.240109
Nice, I was expecting a bump there too but didn't test.. Looks like
about 7% to me. =20
Thanks,
Mikey
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
2015-09-23 22:23 ` Michael Ellerman
@ 2015-09-24 10:10 ` Denis Kirjanov
2015-09-25 0:05 ` Michael Ellerman
0 siblings, 1 reply; 8+ messages in thread
From: Denis Kirjanov @ 2015-09-24 10:10 UTC (permalink / raw)
To: Michael Ellerman
Cc: Michael Neuling, benh, Aaron Sawdey, linuxppc-dev,
Anton Blanchard, Steve Munroe
On 9/24/15, Michael Ellerman <michael@ellerman.id.au> wrote:
>
>
> On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org>
> wrote:
>>The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
>>determine the loaded address of the VDSO. The current version of these
>>attempt to use the special bcl variant which avoids pushing to the
>>link stack.
>>
>>Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
>>current code results in link stack corruption and the resulting
>>performance degradation (due to branch mis-prediction).
>>
>>This patch moves us to bcl+4 by moving __kernel_datapage_offset
>>out of __get_datapage().
>>
>>With this patch, running the below benchmark we get a bump in
>>performance on POWER8 for gettimeofday() (which uses
>>__get_datapage()).
>>
>>64bit gets ~4% improvement:
>> Without patch:
>> # ./tb
>> time = 0.180321
>> With patch:
>> # ./tb
>> time = 0.187408
>>
>>32bit gets ~9% improvement:
>> Without patch:
>> # ./tb
>> time = 0.276551
>> With patch:
>> # ./tb
>> time = 0.252767
>>
>>Testcase tb.c (stolen from Anton)
>> /* gcc -O2 tb.c -o tb */
>> #include <sys/time.h>
>> #include <stdio.h>
>>
>> int main()
>> {
>> int i;
>>
>> struct timeval tv_start, tv_end;
>>
>> gettimeofday(&tv_start, NULL);
>>
>> for(i = 0; i < 10000000; i++) {
>> gettimeofday(&tv_end, NULL);
>> }
>>
>> printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
>>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
>>
>> return 0;
>> }
>
> You know where test cases are supposed to go.
>
> I know it's not a pass/fail test, but it's still useful. If it's in the tree
> it will get run as part of automated test runs and we will have a record of
> the result over time.
I can send a patch for it.
>
> cheers
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()
2015-09-24 10:10 ` Denis Kirjanov
@ 2015-09-25 0:05 ` Michael Ellerman
0 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2015-09-25 0:05 UTC (permalink / raw)
To: Denis Kirjanov
Cc: Michael Neuling, Steve Munroe, Anton Blanchard, Aaron Sawdey,
linuxppc-dev
On Thu, 2015-09-24 at 13:10 +0300, Denis Kirjanov wrote:
> On 9/24/15, Michael Ellerman <michael@ellerman.id.au> wrote:
> > On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org>
> > wrote:
> >>
> >>Testcase tb.c (stolen from Anton)
> >> /* gcc -O2 tb.c -o tb */
> >> #include <sys/time.h>
> >> #include <stdio.h>
> >>
> >> int main()
> >> {
> >> int i;
> >>
> >> struct timeval tv_start, tv_end;
> >>
> >> gettimeofday(&tv_start, NULL);
> >>
> >> for(i = 0; i < 10000000; i++) {
> >> gettimeofday(&tv_end, NULL);
> >> }
> >>
> >> printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
> >>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
> >>
> >> return 0;
> >> }
> >
> > You know where test cases are supposed to go.
> >
> > I know it's not a pass/fail test, but it's still useful. If it's in the tree
> > it will get run as part of automated test runs and we will have a record of
> > the result over time.
>
> I can send a patch for it.
Mikey was working on it I think.
cheers
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-09-25 0:05 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-23 6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling
2015-09-23 18:21 ` Denis Kirjanov
2015-09-23 18:35 ` Aaron Sawdey
[not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>
2015-09-23 23:28 ` Michael Neuling
2015-09-23 23:33 ` Michael Neuling
2015-09-23 22:23 ` Michael Ellerman
2015-09-24 10:10 ` Denis Kirjanov
2015-09-25 0:05 ` Michael Ellerman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).