linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Austin <jonathan.austin@arm.com>
To: "André Hentschel" <nerv@dawncrow.de>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	Russell King - ARM Linux <linux@arm.linux.org.uk>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Will Deacon <Will.Deacon@arm.com>
Subject: Re: arm: Only load TLS values when needed
Date: Tue, 16 Jul 2013 18:31:23 +0100	[thread overview]
Message-ID: <51E5836B.1010904@arm.com> (raw)
In-Reply-To: <51E42E11.1010903@dawncrow.de>

Hi André,

On 15/07/13 18:14, André Hentschel wrote:
> From: André Hentschel <nerv@dawncrow.de>
>
> This patch intents to reduce loading instructions when the resulting value is not used.
> It's a follow up on a4780adeefd042482f624f5e0d577bf9cdcbb760
>

Have you done any benchmarking to see that this has any real impact? Or 
tested on a !Vv6k system? It looks possible that the only case where 
this will perform better is where we're using switch_tls_none or 
switch_tls_software (both rare cases, I think) and there's some change 
it will make things worse in other cases?

One of the reasons for Russell's suggestion of placing the ldrd (which 
became the two ldr instructions you've removed from __switch_to, in 
order to maintain building for older cores) where it is was in order to 
reduce the chance of pipeline stalls.

As I've pointed out below, there is some risk that changing that has 
implications for the v6 only case below (and the v6k case is now more 
prone to stalls with !CONFIG_CPU_USE_DOMAINS, but newer cores should 
have more advanced scheduling to avoid such issues anyway...)

> Signed-off-by: André Hentschel <nerv@dawncrow.de>
>
> ---
> This patch is against Linux 3.11-rc1 (ad81f0545ef01ea651886dddac4bef6cec930092)
>
> Thanks to everyone who helped me with a4780adeefd042482f624f5e0d577bf9cdcbb760
>
> diff --git a/arch/arm/include/asm/tls.h b/arch/arm/include/asm/tls.h
> index 83259b8..3742722 100644
> --- a/arch/arm/include/asm/tls.h
> +++ b/arch/arm/include/asm/tls.h
> @@ -3,29 +3,32 @@
>
>   #ifdef __ASSEMBLY__
>   #include <asm/asm-offsets.h>
> -	.macro switch_tls_none, base, tp, tpuser, tmp1, tmp2
> +	.macro switch_tls_none, prev, next, tp, tpuser, tmp1, tmp2
>   	.endm
>
> -	.macro switch_tls_v6k, base, tp, tpuser, tmp1, tmp2
> +	.macro switch_tls_v6k, prev, next, tp, tpuser, tmp1, tmp2
> +	ldrd	\tp, \tpuser, [\next, #TI_TP_VALUE]	@ get the next TLS and user r/w register
>   	mrc	p15, 0, \tmp2, c13, c0, 2	@ get the user r/w register
>   	mcr	p15, 0, \tp, c13, c0, 3		@ set TLS register
>   	mcr	p15, 0, \tpuser, c13, c0, 2	@ and the user r/w register
> -	str	\tmp2, [\base, #TI_TP_VALUE + 4] @ save it
> +	str	\tmp2, [\prev, #TI_TP_VALUE + 4] @ save it
>   	.endm
>
> -	.macro switch_tls_v6, base, tp, tpuser, tmp1, tmp2
> +	.macro switch_tls_v6, prev, next, tp, tpuser, tmp1, tmp2
>   	ldr	\tmp1, =elf_hwcap
>   	ldr	\tmp1, [\tmp1, #0]
>   	mov	\tmp2, #0xffff0fff
> +	ldr	\tp, [\next, #TI_TP_VALUE]	@ get the next TLS register
>   	tst	\tmp1, #HWCAP_TLS		@ hardware TLS available?
>   	streq	\tp, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
> -	mrcne	p15, 0, \tmp2, c13, c0, 2	@ get the user r/w register
> +	mrcne	p15, 0, \tmp2, c13, c0, 2	@ get the previous user r/w register
> +	ldrne	\tpuser, [\next, #TI_TP_VALUE + 4]	@ get the next user r/w register
>   	mcrne	p15, 0, \tp, c13, c0, 3		@ yes, set TLS register
>   	mcrne	p15, 0, \tpuser, c13, c0, 2	@ set user r/w register

Now we've only got one instruction between the store and the load and 
risk stalling the pipeline...

Dave M cautiously says "The ancient advice was that one instruction was 
enough" but this is very core dependent... I wonder if anyone has a good 
idea about whether this is an issue here...?

> -	strne	\tmp2, [\base, #TI_TP_VALUE + 4] @ save it
> +	strne	\tmp2, [\prev, #TI_TP_VALUE + 4] @ save it
>   	.endm
>
> -	.macro switch_tls_software, base, tp, tpuser, tmp1, tmp2
> +	.macro switch_tls_software, prev, next, tp, tpuser, tmp1, tmp2
>   	mov	\tmp1, #0xffff0fff
>   	str	\tp, [\tmp1, #-15]		@ set TLS value at 0xffff0ff0
>   	.endm
> diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> index a39cfc2a1..1484b59 100644
> --- a/arch/arm/kernel/entry-armv.S
> +++ b/arch/arm/kernel/entry-armv.S
> @@ -689,12 +689,10 @@ ENTRY(__switch_to)
>    THUMB(	stmia	ip!, {r4 - sl, fp}	   )	@ Store most regs on stack
>    THUMB(	str	sp, [ip], #4		   )
>    THUMB(	str	lr, [ip], #4		   )
> -	ldr	r4, [r2, #TI_TP_VALUE]
> -	ldr	r5, [r2, #TI_TP_VALUE + 4]
>   #ifdef CONFIG_CPU_USE_DOMAINS
>   	ldr	r6, [r2, #TI_CPU_DOMAIN]
>   #endif
> -	switch_tls r1, r4, r5, r3, r7
> +	switch_tls r1, r2, r4, r5, r3, r7
>   #if defined(CONFIG_CC_STACKPROTECTOR) && !defined(CONFIG_SMP)
>   	ldr	r7, [r2, #TI_TASK]
>   	ldr	r8, =__stack_chk_guard
>

Jonny

  reply	other threads:[~2013-07-16 17:31 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-15 17:14 arm: Only load TLS values when needed André Hentschel
2013-07-16 17:31 ` Jonathan Austin [this message]
2013-07-16 19:27   ` André Hentschel
2013-07-17 11:10     ` Jonathan Austin
2013-07-17 19:49       ` André Hentschel
     [not found]         ` <520B8F37.4040609@dawncrow.de>
     [not found]           ` <520BAE58.3060600@arm.com>
2013-08-14 21:21             ` André Hentschel
2013-08-15 17:29               ` Jonathan Austin
2013-08-15 18:27                 ` André Hentschel
2013-08-15 18:27                   ` André Hentschel
2013-08-26 19:19                   ` André Hentschel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51E5836B.1010904@arm.com \
    --to=jonathan.austin@arm.com \
    --cc=Will.Deacon@arm.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=nerv@dawncrow.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).