linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Michael Neuling <mikey@neuling.org>
To: mpe@ellerman.id.au, benh@kernel.crashing.org
Cc: mikey@neuling.org, anton@samba.org, linuxppc-dev@ozlabs.org,
	Aaron Sawdey <sawdey@us.ibm.com>,
	Denis Kirjanov <kda@linux-powerpc.org>,
	Steve Munroe <sjmunroe@us.ibm.com>
Subject: [PATCH v2 2/2] powerpc/vdso: Avoid link stack corruption in __get_datapage()
Date: Fri, 25 Sep 2015 14:01:40 +1000	[thread overview]
Message-ID: <1443153700-15395-3-git-send-email-mikey@neuling.org> (raw)
In-Reply-To: <1443153700-15395-1-git-send-email-mikey@neuling.org>

powerpc has a link register (lr) used for calling functions. We "bl
<func>" to call a function, and "blr" to return back to the call site.

The lr is only a single register, so if we call another function from
inside this function (ie. nested calls), software must save away the
lr on the software stack before calling the new function. Before
returning (ie. before the "blr"), the lr is restored by software from
the software stack.

This makes branch prediction quite difficult for the processor as it
will only know the branch target just before the "blr".

To help with this, modern powerpc processors keep a (non-architected)
hardware stack of lr called a "link stack". When a "bl <func>" is
run, the lr is pushed onto this stack. When a "blr" is called, the
branch predictor pops the lr value from the top of the link stack, and
uses it to predict the branch target. Hence the processor pipeline
knows a lot earlier the branch target.

This works great but there are some cases where you call "bl" but
without a matching "blr". Once such case is when trying to determine
the program counter (which can't be read directly). Here you "bl+4;
mflr" to get the program counter. If you do this, the link stack will
get out of sync with reality, causing the branch predictor to
mis-predict subsequent function returns.

To avoid this, modern micro-architectures have a special case of bl.
Using the form "bcl 20,31,+4", ensures the processor doesn't push to
the link stack.

The 32 and 64 bit variants of __get_datapage() use a "bl; mflr" to
determine the loaded address of the VDSO. The current versions of
these attempt to use this special bl variant.

Unfortunately they use +8 rather than the required +4. Hence the
current code results in the link stack getting out of sync with
reality and hence the resulting performance degradation.

This patch moves it to bcl+4 by moving __kernel_datapage_offset out of
__get_datapage().

With this patch, running a gettimeofday() (which uses
__get_datapage()) microbenchmark we get a decent bump in performance
on POWER7/8.

For the benchmark in tools/testing/selftests/powerpc/benchmarks/gettimeofday.c
  POWER8:
    64bit gets ~4% improvement
    32bit gets ~9% improvement
  POWER7:
    64bit gets ~7% improvement

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Aaron Sawdey <sawdey@us.ibm.com>

---
Giant-commit-log-essay-suggested-by: mpe
---
 arch/powerpc/kernel/vdso32/datapage.S | 12 +++++++-----
 arch/powerpc/kernel/vdso64/datapage.S | 12 +++++++-----
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vdso32/datapage.S
index dc21e89..59cf5f4 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -16,6 +16,10 @@
 #include <asm/vdso.h>
 
 	.text
+	.global	__kernel_datapage_offset;
+__kernel_datapage_offset:
+	.long	0
+
 V_FUNCTION_BEGIN(__get_datapage)
   .cfi_startproc
 	/* We don't want that exposed or overridable as we want other objects
@@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
 	mflr	r0
   .cfi_register lr,r0
 
-	bcl	20,31,1f
-	.global	__kernel_datapage_offset;
-__kernel_datapage_offset:
-	.long	0
-1:
+	bcl	20,31,data_page_branch
+data_page_branch:
 	mflr	r3
 	mtlr	r0
+	addi	r3, r3, __kernel_datapage_offset-data_page_branch
 	lwz	r0,0(r3)
 	add	r3,r0,r3
 	blr
diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vdso64/datapage.S
index 79796de..2f01c4a 100644
--- a/arch/powerpc/kernel/vdso64/datapage.S
+++ b/arch/powerpc/kernel/vdso64/datapage.S
@@ -16,6 +16,10 @@
 #include <asm/vdso.h>
 
 	.text
+.global	__kernel_datapage_offset;
+__kernel_datapage_offset:
+	.long	0
+
 V_FUNCTION_BEGIN(__get_datapage)
   .cfi_startproc
 	/* We don't want that exposed or overridable as we want other objects
@@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)
 	mflr	r0
   .cfi_register lr,r0
 
-	bcl	20,31,1f
-	.global	__kernel_datapage_offset;
-__kernel_datapage_offset:
-	.long	0
-1:
+	bcl	20,31,data_page_branch
+data_page_branch:
 	mflr	r3
 	mtlr	r0
+	addi	r3, r3, __kernel_datapage_offset-data_page_branch
 	lwz	r0,0(r3)
 	add	r3,r0,r3
 	blr
-- 
2.1.4

  parent reply	other threads:[~2015-09-25  4:01 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-25  4:01 [PATCH v2 0/2] powerpc: powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Neuling
2015-09-25  4:01 ` [PATCH v2 1/2] powerpc/selftest: Add gettimeofday() benchmark Michael Neuling
2015-09-25  7:39   ` Arnd Bergmann
2015-09-25  9:28     ` Denis Kirjanov
2015-09-25  9:37       ` Gabriel Paubert
2015-09-25 12:47         ` Denis Kirjanov
2015-09-28  8:58         ` Michael Neuling
2015-09-28  9:56           ` Denis Kirjanov
2015-09-30  1:49             ` Michael Ellerman
2015-10-02  7:47   ` [v2,1/2] " Michael Ellerman
2015-09-25  4:01 ` Michael Neuling [this message]
2015-10-02  7:47   ` [v2, 2/2] powerpc/vdso: Avoid link stack corruption in __get_datapage() Michael Ellerman
2015-10-02  8:24     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1443153700-15395-3-git-send-email-mikey@neuling.org \
    --to=mikey@neuling.org \
    --cc=anton@samba.org \
    --cc=benh@kernel.crashing.org \
    --cc=kda@linux-powerpc.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=sawdey@us.ibm.com \
    --cc=sjmunroe@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).