* [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() @ 2015-09-23 6:05 Michael Neuling 2015-09-23 18:21 ` Denis Kirjanov 2015-09-23 22:23 ` Michael Ellerman 0 siblings, 2 replies; 8+ messages in thread From: Michael Neuling @ 2015-09-23 6:05 UTC (permalink / raw) To: Michael Ellerman, benh Cc: linuxppc-dev, Anton Blanchard, Aaron Sawdey, Steve Munroe The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to determine the loaded address of the VDSO. The current version of these attempt to use the special bcl variant which avoids pushing to the link stack. Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the current code results in link stack corruption and the resulting performance degradation (due to branch mis-prediction). This patch moves us to bcl+4 by moving __kernel_datapage_offset out of __get_datapage(). With this patch, running the below benchmark we get a bump in performance on POWER8 for gettimeofday() (which uses __get_datapage()). 64bit gets ~4% improvement: Without patch: # ./tb time =3D 0.180321 With patch: # ./tb time =3D 0.187408 32bit gets ~9% improvement: Without patch: # ./tb time =3D 0.276551 With patch: # ./tb time =3D 0.252767 Testcase tb.c (stolen from Anton) /* gcc -O2 tb.c -o tb */ #include <sys/time.h> #include <stdio.h> int main() { int i; struct timeval tv_start, tv_end; gettimeofday(&tv_start, NULL); for(i =3D 0; i < 10000000; i++) { gettimeofday(&tv_end, NULL); } printf("time =3D %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_u= sec - tv_start.tv_usec) * 1e-6); return 0; } Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: Aaron Sawdey <sawdey@us.ibm.com> diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vd= so32/datapage.S index dc21e89..59cf5f4 100644 --- a/arch/powerpc/kernel/vdso32/datapage.S +++ b/arch/powerpc/kernel/vdso32/datapage.S @@ -16,6 +16,10 @@ #include <asm/vdso.h> =20 .text + .global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 + V_FUNCTION_BEGIN(__get_datapage) .cfi_startproc /* We don't want that exposed or overridable as we want other objects @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) mflr r0 .cfi_register lr,r0 =20 - bcl 20,31,1f - .global __kernel_datapage_offset; -__kernel_datapage_offset: - .long 0 -1: + bcl 20,31,data_page_branch +data_page_branch: mflr r3 mtlr r0 + addi r3, r3, __kernel_datapage_offset-data_page_branch lwz r0,0(r3) add r3,r0,r3 blr diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vd= so64/datapage.S index 79796de..2f01c4a 100644 --- a/arch/powerpc/kernel/vdso64/datapage.S +++ b/arch/powerpc/kernel/vdso64/datapage.S @@ -16,6 +16,10 @@ #include <asm/vdso.h> =20 .text +.global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 + V_FUNCTION_BEGIN(__get_datapage) .cfi_startproc /* We don't want that exposed or overridable as we want other objects @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) mflr r0 .cfi_register lr,r0 =20 - bcl 20,31,1f - .global __kernel_datapage_offset; -__kernel_datapage_offset: - .long 0 -1: + bcl 20,31,data_page_branch +data_page_branch: mflr r3 mtlr r0 + addi r3, r3, __kernel_datapage_offset-data_page_branch lwz r0,0(r3) add r3,r0,r3 blr ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() 2015-09-23 6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling @ 2015-09-23 18:21 ` Denis Kirjanov 2015-09-23 18:35 ` Aaron Sawdey ` (2 more replies) 2015-09-23 22:23 ` Michael Ellerman 1 sibling, 3 replies; 8+ messages in thread From: Denis Kirjanov @ 2015-09-23 18:21 UTC (permalink / raw) To: Michael Neuling Cc: Michael Ellerman, benh, Aaron Sawdey, linuxppc-dev, Anton Blanchard, Steve Munroe On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to > determine the loaded address of the VDSO. The current version of these > attempt to use the special bcl variant which avoids pushing to the > link stack. > > Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the > current code results in link stack corruption and the resulting > performance degradation (due to branch mis-prediction). > > This patch moves us to bcl+4 by moving __kernel_datapage_offset > out of __get_datapage(). > > With this patch, running the below benchmark we get a bump in > performance on POWER8 for gettimeofday() (which uses > __get_datapage()). > > 64bit gets ~4% improvement: > Without patch: > # ./tb > time = 0.180321 > With patch: > # ./tb > time = 0.187408 > > 32bit gets ~9% improvement: > Without patch: > # ./tb > time = 0.276551 > With patch: > # ./tb > time = 0.252767 I've got the following results on POWER7 64bit without the patch: # ./tb time = 0.263337 # ./tb time = 0.251273 # ./tb time = 0.258453 # ./tb time = 0.260189 with the patch: # ./tb time = 0.241517 # ./tb time = 0.241973 # ./tb time = 0.239365 # ./tb time = 0.240109 > > Testcase tb.c (stolen from Anton) > /* gcc -O2 tb.c -o tb */ > #include <sys/time.h> > #include <stdio.h> > > int main() > { > int i; > > struct timeval tv_start, tv_end; > > gettimeofday(&tv_start, NULL); > > for(i = 0; i < 10000000; i++) { > gettimeofday(&tv_end, NULL); > } > > printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_usec > - tv_start.tv_usec) * 1e-6); > > return 0; > } > > Signed-off-by: Michael Neuling <mikey@neuling.org> > Reported-by: Aaron Sawdey <sawdey@us.ibm.com> > > diff --git a/arch/powerpc/kernel/vdso32/datapage.S > b/arch/powerpc/kernel/vdso32/datapage.S > index dc21e89..59cf5f4 100644 > --- a/arch/powerpc/kernel/vdso32/datapage.S > +++ b/arch/powerpc/kernel/vdso32/datapage.S > @@ -16,6 +16,10 @@ > #include <asm/vdso.h> > > .text > + .global __kernel_datapage_offset; > +__kernel_datapage_offset: > + .long 0 > + > V_FUNCTION_BEGIN(__get_datapage) > .cfi_startproc > /* We don't want that exposed or overridable as we want other objects > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) > mflr r0 > .cfi_register lr,r0 > > - bcl 20,31,1f > - .global __kernel_datapage_offset; > -__kernel_datapage_offset: > - .long 0 > -1: > + bcl 20,31,data_page_branch > +data_page_branch: > mflr r3 > mtlr r0 > + addi r3, r3, __kernel_datapage_offset-data_page_branch > lwz r0,0(r3) > add r3,r0,r3 > blr > diff --git a/arch/powerpc/kernel/vdso64/datapage.S > b/arch/powerpc/kernel/vdso64/datapage.S > index 79796de..2f01c4a 100644 > --- a/arch/powerpc/kernel/vdso64/datapage.S > +++ b/arch/powerpc/kernel/vdso64/datapage.S > @@ -16,6 +16,10 @@ > #include <asm/vdso.h> > > .text > +.global __kernel_datapage_offset; > +__kernel_datapage_offset: > + .long 0 > + > V_FUNCTION_BEGIN(__get_datapage) > .cfi_startproc > /* We don't want that exposed or overridable as we want other objects > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) > mflr r0 > .cfi_register lr,r0 > > - bcl 20,31,1f > - .global __kernel_datapage_offset; > -__kernel_datapage_offset: > - .long 0 > -1: > + bcl 20,31,data_page_branch > +data_page_branch: > mflr r3 > mtlr r0 > + addi r3, r3, __kernel_datapage_offset-data_page_branch > lwz r0,0(r3) > add r3,r0,r3 > blr > > > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() 2015-09-23 18:21 ` Denis Kirjanov @ 2015-09-23 18:35 ` Aaron Sawdey [not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com> 2015-09-23 23:33 ` Michael Neuling 2 siblings, 0 replies; 8+ messages in thread From: Aaron Sawdey @ 2015-09-23 18:35 UTC (permalink / raw) To: Denis Kirjanov Cc: Anton Blanchard, benh, linuxppc-dev, Michael Ellerman, Michael Neuling, Steve Munroe [-- Attachment #1.1: Type: text/plain, Size: 4835 bytes --] One thing that occurred to me was that since the __kernel_datapage_offset is located immediately before the function, the offset is small and you could get rid of the addi and just fold it into the lwz. Aaron Sawdey, Ph.D. sawdey@us.ibm.com 050-2/C113 (507) 253-7520 TL: 553-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain From: Denis Kirjanov <kda@linux-powerpc.org> To: Michael Neuling <mikey@neuling.org> Cc: Michael Ellerman <michael@ellerman.id.au>, benh <benh@kernel.crashing.org>, Aaron Sawdey/Rochester/IBM@IBMUS, linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Anton Blanchard <anton@samba.org>, Steve Munroe/Rochester/IBM@IBMUS Date: 09/23/2015 01:22 PM Subject: Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to > determine the loaded address of the VDSO. The current version of these > attempt to use the special bcl variant which avoids pushing to the > link stack. > > Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the > current code results in link stack corruption and the resulting > performance degradation (due to branch mis-prediction). > > This patch moves us to bcl+4 by moving __kernel_datapage_offset > out of __get_datapage(). > > With this patch, running the below benchmark we get a bump in > performance on POWER8 for gettimeofday() (which uses > __get_datapage()). > > 64bit gets ~4% improvement: > Without patch: > # ./tb > time = 0.180321 > With patch: > # ./tb > time = 0.187408 > > 32bit gets ~9% improvement: > Without patch: > # ./tb > time = 0.276551 > With patch: > # ./tb > time = 0.252767 I've got the following results on POWER7 64bit without the patch: # ./tb time = 0.263337 # ./tb time = 0.251273 # ./tb time = 0.258453 # ./tb time = 0.260189 with the patch: # ./tb time = 0.241517 # ./tb time = 0.241973 # ./tb time = 0.239365 # ./tb time = 0.240109 > > Testcase tb.c (stolen from Anton) > /* gcc -O2 tb.c -o tb */ > #include <sys/time.h> > #include <stdio.h> > > int main() > { > int i; > > struct timeval tv_start, tv_end; > > gettimeofday(&tv_start, NULL); > > for(i = 0; i < 10000000; i++) { > gettimeofday(&tv_end, NULL); > } > > printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_usec > - tv_start.tv_usec) * 1e-6); > > return 0; > } > > Signed-off-by: Michael Neuling <mikey@neuling.org> > Reported-by: Aaron Sawdey <sawdey@us.ibm.com> > > diff --git a/arch/powerpc/kernel/vdso32/datapage.S > b/arch/powerpc/kernel/vdso32/datapage.S > index dc21e89..59cf5f4 100644 > --- a/arch/powerpc/kernel/vdso32/datapage.S > +++ b/arch/powerpc/kernel/vdso32/datapage.S > @@ -16,6 +16,10 @@ > #include <asm/vdso.h> > > .text > + .global __kernel_datapage_offset; > +__kernel_datapage_offset: > + .long 0 > + > V_FUNCTION_BEGIN(__get_datapage) > .cfi_startproc > /* We don't want that exposed or overridable as we want other objects > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) > mflr r0 > .cfi_register lr,r0 > > - bcl 20,31,1f > - .global __kernel_datapage_offset; > -__kernel_datapage_offset: > - .long 0 > -1: > + bcl 20,31,data_page_branch > +data_page_branch: > mflr r3 > mtlr r0 > + addi r3, r3, __kernel_datapage_offset-data_page_branch > lwz r0,0(r3) > add r3,r0,r3 > blr > diff --git a/arch/powerpc/kernel/vdso64/datapage.S > b/arch/powerpc/kernel/vdso64/datapage.S > index 79796de..2f01c4a 100644 > --- a/arch/powerpc/kernel/vdso64/datapage.S > +++ b/arch/powerpc/kernel/vdso64/datapage.S > @@ -16,6 +16,10 @@ > #include <asm/vdso.h> > > .text > +.global __kernel_datapage_offset; > +__kernel_datapage_offset: > + .long 0 > + > V_FUNCTION_BEGIN(__get_datapage) > .cfi_startproc > /* We don't want that exposed or overridable as we want other objects > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) > mflr r0 > .cfi_register lr,r0 > > - bcl 20,31,1f > - .global __kernel_datapage_offset; > -__kernel_datapage_offset: > - .long 0 > -1: > + bcl 20,31,data_page_branch > +data_page_branch: > mflr r3 > mtlr r0 > + addi r3, r3, __kernel_datapage_offset-data_page_branch > lwz r0,0(r3) > add r3,r0,r3 > blr > > > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev [-- Attachment #1.2: Type: text/html, Size: 7752 bytes --] [-- Attachment #2: graycol.gif --] [-- Type: image/gif, Size: 105 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com>]
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() [not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com> @ 2015-09-23 23:28 ` Michael Neuling 0 siblings, 0 replies; 8+ messages in thread From: Michael Neuling @ 2015-09-23 23:28 UTC (permalink / raw) To: Aaron Sawdey Cc: Denis Kirjanov, Anton Blanchard, benh, linuxppc-dev, Michael Ellerman, Steve Munroe On Wed, 2015-09-23 at 13:35 -0500, Aaron Sawdey wrote: > One thing that occurred to me was that since the > __kernel_datapage_offset is located immediately before the function, > the offset is small and you could get rid of the addi and just fold it > into the lwz. r3 is reused at the end to add to r0, so if r3 doesn't have the right offset for __kernel_datapage_offset it doesn't work. Mikey >=20 >=20 > Aaron Sawdey, Ph.D. sawdey@us.ibm.com > 050-2/C113 (507) 253-7520 TL: 553-7520 home: 507/263-0782 > IBM Linux Technology Center - PPC Toolchain >=20 > Inactive hide details for Denis Kirjanov ---09/23/2015 01:22:39 > PM---On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > TDenis > Kirjanov ---09/23/2015 01:22:39 PM---On 9/23/15, Michael Neuling > <mikey@neuling.org> wrote: > The 32 and 64 bit variants of > __get_datapag >=20 > From: Denis Kirjanov <kda@linux-powerpc.org> > To: Michael Neuling <mikey@neuling.org> > Cc: Michael Ellerman <michael@ellerman.id.au>, benh > <benh@kernel.crashing.org>, Aaron Sawdey/Rochester/IBM@IBMUS, > linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Anton Blanchard > <anton@samba.org>, Steve Munroe/Rochester/IBM@IBMUS > Date: 09/23/2015 01:22 PM > Subject: Re: [PATCH] powerpc/vdso: Avoid link stack corruption in > __get_datapage() >=20 >=20 >=20 > ______________________________________________________________________ >=20 >=20 >=20 > On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > > The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to > > determine the loaded address of the VDSO. The current version of > these > > attempt to use the special bcl variant which avoids pushing to the > > link stack. > > > > Unfortunately it uses bcl+8 rather than the required bcl+4. Hence > the > > current code results in link stack corruption and the resulting > > performance degradation (due to branch mis-prediction). > > > > This patch moves us to bcl+4 by moving __kernel_datapage_offset > > out of __get_datapage(). > > > > With this patch, running the below benchmark we get a bump in > > performance on POWER8 for gettimeofday() (which uses > > __get_datapage()). > > > > 64bit gets ~4% improvement: > > Without patch: > > # ./tb > > time =3D 0.180321 > > With patch: > > # ./tb > > time =3D 0.187408 > > > > 32bit gets ~9% improvement: > > Without patch: > > # ./tb > > time =3D 0.276551 > > With patch: > > # ./tb > > time =3D 0.252767 >=20 > I've got the following results on POWER7 64bit > without the patch: > # ./tb > time =3D 0.263337 > # ./tb > time =3D 0.251273 > # ./tb > time =3D 0.258453 > # ./tb > time =3D 0.260189 >=20 > with the patch: > # ./tb > time =3D 0.241517 > # ./tb > time =3D 0.241973 > # ./tb > time =3D 0.239365 > # ./tb > time =3D 0.240109 >=20 >=20 >=20 > > > > Testcase tb.c (stolen from Anton) > > /* gcc -O2 tb.c -o tb */ > > #include <sys/time.h> > > #include <stdio.h> > > > > int main() > > { > > int i; > > > > struct timeval tv_start, tv_end; > > > > gettimeofday(&tv_start, NULL); > > > > for(i =3D 0; i < 10000000; i++) { > > gettimeofday(&tv_end, NULL); > > } > > > > printf("time =3D %.6f\n", tv_end.tv_sec - tv_start.tv_sec + > (tv_end.tv_usec > > - tv_start.tv_usec) * 1e-6); > > > > return 0; > > } > > > > Signed-off-by: Michael Neuling <mikey@neuling.org> > > Reported-by: Aaron Sawdey <sawdey@us.ibm.com> > > > > diff --git a/arch/powerpc/kernel/vdso32/datapage.S > > b/arch/powerpc/kernel/vdso32/datapage.S > > index dc21e89..59cf5f4 100644 > > --- a/arch/powerpc/kernel/vdso32/datapage.S > > +++ b/arch/powerpc/kernel/vdso32/datapage.S > > @@ -16,6 +16,10 @@ > > #include <asm/vdso.h> > > > > .text > > + .global __kernel_datapage_offset; > > +__kernel_datapage_offset: > > + .long 0 > > + > > V_FUNCTION_BEGIN(__get_datapage) > > .cfi_startproc > > /* We don't want that exposed or overridable as we want other > objects > > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) > > mflr r0 > > .cfi_register lr,r0 > > > > - bcl 20,31,1f > > - .global __kernel_datapage_offset; > > -__kernel_datapage_offset: > > - .long 0 > > -1: > > + bcl 20,31,data_page_branch > > +data_page_branch: > > mflr r3 > > mtlr r0 > > + addi r3, r3, __kernel_datapage_offset-data_page_branch > > lwz r0,0(r3) > > add r3,r0,r3 > > blr > > diff --git a/arch/powerpc/kernel/vdso64/datapage.S > > b/arch/powerpc/kernel/vdso64/datapage.S > > index 79796de..2f01c4a 100644 > > --- a/arch/powerpc/kernel/vdso64/datapage.S > > +++ b/arch/powerpc/kernel/vdso64/datapage.S > > @@ -16,6 +16,10 @@ > > #include <asm/vdso.h> > > > > .text > > +.global __kernel_datapage_offset; > > +__kernel_datapage_offset: > > + .long 0 > > + > > V_FUNCTION_BEGIN(__get_datapage) > > .cfi_startproc > > /* We don't want that exposed or overridable as we want other > objects > > @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage) > > mflr r0 > > .cfi_register lr,r0 > > > > - bcl 20,31,1f > > - .global __kernel_datapage_offset; > > -__kernel_datapage_offset: > > - .long 0 > > -1: > > + bcl 20,31,data_page_branch > > +data_page_branch: > > mflr r3 > > mtlr r0 > > + addi r3, r3, __kernel_datapage_offset-data_page_branch > > lwz r0,0(r3) > > add r3,r0,r3 > > blr > > > > > > _______________________________________________ > > Linuxppc-dev mailing list > > Linuxppc-dev@lists.ozlabs.org > > https://lists.ozlabs.org/listinfo/linuxppc-dev >=20 >=20 >=20 >=20 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() 2015-09-23 18:21 ` Denis Kirjanov 2015-09-23 18:35 ` Aaron Sawdey [not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com> @ 2015-09-23 23:33 ` Michael Neuling 2 siblings, 0 replies; 8+ messages in thread From: Michael Neuling @ 2015-09-23 23:33 UTC (permalink / raw) To: Denis Kirjanov Cc: Michael Ellerman, benh, Aaron Sawdey, linuxppc-dev, Anton Blanchard, Steve Munroe > I've got the following results on POWER7 64bit > without the patch: > # ./tb > time =3D 0.263337 > # ./tb > time =3D 0.251273 > # ./tb > time =3D 0.258453 > # ./tb > time =3D 0.260189 >=20 > with the patch: > # ./tb > time =3D 0.241517 > # ./tb > time =3D 0.241973 > # ./tb > time =3D 0.239365 > # ./tb > time =3D 0.240109 Nice, I was expecting a bump there too but didn't test.. Looks like about 7% to me. =20 Thanks, Mikey ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() 2015-09-23 6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling 2015-09-23 18:21 ` Denis Kirjanov @ 2015-09-23 22:23 ` Michael Ellerman 2015-09-24 10:10 ` Denis Kirjanov 1 sibling, 1 reply; 8+ messages in thread From: Michael Ellerman @ 2015-09-23 22:23 UTC (permalink / raw) To: Michael Neuling, benh Cc: linuxppc-dev, Anton Blanchard, Aaron Sawdey, Steve Munroe On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org> wrote: >The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to >determine the loaded address of the VDSO. The current version of these >attempt to use the special bcl variant which avoids pushing to the >link stack. > >Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the >current code results in link stack corruption and the resulting >performance degradation (due to branch mis-prediction). > >This patch moves us to bcl+4 by moving __kernel_datapage_offset >out of __get_datapage(). > >With this patch, running the below benchmark we get a bump in >performance on POWER8 for gettimeofday() (which uses >__get_datapage()). > >64bit gets ~4% improvement: > Without patch: > # ./tb > time = 0.180321 > With patch: > # ./tb > time = 0.187408 > >32bit gets ~9% improvement: > Without patch: > # ./tb > time = 0.276551 > With patch: > # ./tb > time = 0.252767 > >Testcase tb.c (stolen from Anton) > /* gcc -O2 tb.c -o tb */ > #include <sys/time.h> > #include <stdio.h> > > int main() > { > int i; > > struct timeval tv_start, tv_end; > > gettimeofday(&tv_start, NULL); > > for(i = 0; i < 10000000; i++) { > gettimeofday(&tv_end, NULL); > } > > printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + >(tv_end.tv_usec - tv_start.tv_usec) * 1e-6); > > return 0; > } You know where test cases are supposed to go. I know it's not a pass/fail test, but it's still useful. If it's in the tree it will get run as part of automated test runs and we will have a record of the result over time. cheers -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() 2015-09-23 22:23 ` Michael Ellerman @ 2015-09-24 10:10 ` Denis Kirjanov 2015-09-25 0:05 ` Michael Ellerman 0 siblings, 1 reply; 8+ messages in thread From: Denis Kirjanov @ 2015-09-24 10:10 UTC (permalink / raw) To: Michael Ellerman Cc: Michael Neuling, benh, Aaron Sawdey, linuxppc-dev, Anton Blanchard, Steve Munroe On 9/24/15, Michael Ellerman <michael@ellerman.id.au> wrote: > > > On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org> > wrote: >>The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to >>determine the loaded address of the VDSO. The current version of these >>attempt to use the special bcl variant which avoids pushing to the >>link stack. >> >>Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the >>current code results in link stack corruption and the resulting >>performance degradation (due to branch mis-prediction). >> >>This patch moves us to bcl+4 by moving __kernel_datapage_offset >>out of __get_datapage(). >> >>With this patch, running the below benchmark we get a bump in >>performance on POWER8 for gettimeofday() (which uses >>__get_datapage()). >> >>64bit gets ~4% improvement: >> Without patch: >> # ./tb >> time = 0.180321 >> With patch: >> # ./tb >> time = 0.187408 >> >>32bit gets ~9% improvement: >> Without patch: >> # ./tb >> time = 0.276551 >> With patch: >> # ./tb >> time = 0.252767 >> >>Testcase tb.c (stolen from Anton) >> /* gcc -O2 tb.c -o tb */ >> #include <sys/time.h> >> #include <stdio.h> >> >> int main() >> { >> int i; >> >> struct timeval tv_start, tv_end; >> >> gettimeofday(&tv_start, NULL); >> >> for(i = 0; i < 10000000; i++) { >> gettimeofday(&tv_end, NULL); >> } >> >> printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + >>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6); >> >> return 0; >> } > > You know where test cases are supposed to go. > > I know it's not a pass/fail test, but it's still useful. If it's in the tree > it will get run as part of automated test runs and we will have a record of > the result over time. I can send a patch for it. > > cheers > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() 2015-09-24 10:10 ` Denis Kirjanov @ 2015-09-25 0:05 ` Michael Ellerman 0 siblings, 0 replies; 8+ messages in thread From: Michael Ellerman @ 2015-09-25 0:05 UTC (permalink / raw) To: Denis Kirjanov Cc: Michael Neuling, Steve Munroe, Anton Blanchard, Aaron Sawdey, linuxppc-dev On Thu, 2015-09-24 at 13:10 +0300, Denis Kirjanov wrote: > On 9/24/15, Michael Ellerman <michael@ellerman.id.au> wrote: > > On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mikey@neuling.org> > > wrote: > >> > >>Testcase tb.c (stolen from Anton) > >> /* gcc -O2 tb.c -o tb */ > >> #include <sys/time.h> > >> #include <stdio.h> > >> > >> int main() > >> { > >> int i; > >> > >> struct timeval tv_start, tv_end; > >> > >> gettimeofday(&tv_start, NULL); > >> > >> for(i = 0; i < 10000000; i++) { > >> gettimeofday(&tv_end, NULL); > >> } > >> > >> printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + > >>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6); > >> > >> return 0; > >> } > > > > You know where test cases are supposed to go. > > > > I know it's not a pass/fail test, but it's still useful. If it's in the tree > > it will get run as part of automated test runs and we will have a record of > > the result over time. > > I can send a patch for it. Mikey was working on it I think. cheers ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-09-25 0:05 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-09-23 6:05 [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling 2015-09-23 18:21 ` Denis Kirjanov 2015-09-23 18:35 ` Aaron Sawdey [not found] ` <201509231835.t8NIZr0Q029029@d01av04.pok.ibm.com> 2015-09-23 23:28 ` Michael Neuling 2015-09-23 23:33 ` Michael Neuling 2015-09-23 22:23 ` Michael Ellerman 2015-09-24 10:10 ` Denis Kirjanov 2015-09-25 0:05 ` Michael Ellerman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).