linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: slash.tmp@free.fr (Mason)
To: linux-arm-kernel@lists.infradead.org
Subject: read_cpuid_id() in arch/arm/kernel/setup.c
Date: Fri, 13 Mar 2015 17:39:23 +0100	[thread overview]
Message-ID: <550312BB.8020902@free.fr> (raw)
In-Reply-To: <20150313161953.GS8656@n2100.arm.linux.org.uk>

On 13/03/2015 17:19, Russell King - ARM Linux wrote:
> On Fri, Mar 13, 2015 at 05:03:52PM +0100, Mason wrote:
>> Hello everyone,
>>
>> As far as I can tell, read_cpuid_id() resolves to read_cpuid(CPUID_ID)
>> which resolves to mrc 15, 0, rN, cr0, cr0, {0}
>>
>> Consider this:
>>
>> /*
>>   * The CPU ID never changes at run time, so we might as well tell the
>>   * compiler that it's constant.  Use this function to read the CPU ID
>>   * rather than directly reading processor_id or read_cpuid() directly.
>>   */
>> static inline unsigned int __attribute_const__ read_cpuid_id(void)
>> {
>> 	return read_cpuid(CPUID_ID);
>> }
>>
>> Despite the comment and attribute, my compiler(*) still reloads the
>> value every time.
>>
>> (*) gcc version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11)
>>
>> e.g.
>>
>> static int __get_cpu_architecture(void)
>> {
>> 	int cpu_arch;
>>
>> 	unsigned int id = read_cpuid_id();
>> 	if ((id & 0x0008f000) == 0) {
>> 		cpu_arch = CPU_ARCH_UNKNOWN;
>> 	} else if ((id & 0x0008f000) == 0x00007000) {
>> 		cpu_arch = (id & (1 << 23)) ? CPU_ARCH_ARMv4T : CPU_ARCH_ARMv3;
>> 	} else if ((id & 0x00080000) == 0x00000000) {
>> 		cpu_arch = (id >> 16) & 7;
>> 		if (cpu_arch)
>> 			cpu_arch += CPU_ARCH_ARMv3;
>> 	} else if ((id & 0x000f0000) == 0x000f0000) {
>>
>> resolves to
>>
>> c01fec74:       ee10cf10        mrc     15, 0, ip, cr0, cr0, {0}
>> c01fec78:       e21cca8f        ands    ip, ip, #585728 ; 0x8f000
>> c01fec7c:       e34c3023        movt    r3, #49187      ; 0xc023
>> c01fec80:       e5837008        str     r7, [r3, #8]
>> c01fec84:       e50b304c        str     r3, [fp, #-76]  ; 0x4c
>> c01fec88:       0a000022        beq     c01fed18 <setup_arch+0xe4>
>> c01fec8c:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01fec90:       e2033a8f        and     r3, r3, #585728 ; 0x8f000
>> c01fec94:       e3530a07        cmp     r3, #28672      ; 0x7000
>> c01fec98:       1a000004        bne     c01fecb0 <setup_arch+0x7c>
>> c01fec9c:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01feca0:       e3130502        tst     r3, #8388608    ; 0x800000
>> c01feca4:       13a0c003        movne   ip, #3
>> c01feca8:       03a0c001        moveq   ip, #1
>> c01fecac:       ea000019        b       c01fed18 <setup_arch+0xe4>
>> c01fecb0:       ee103f10        mrc     15, 0, r3, cr0, cr0, {0}
>> c01fecb4:       e3130702        tst     r3, #524288     ; 0x80000
>>
>>
>> So I thought it would be nice to give the poor compiler a break,
>> and just stuff the result in a local variable:
>
> NAK.
>
> Your compiler behaviour is different to mine (stock gcc 4.7.4):
>
>   4f8:   e1a0c00d        mov     ip, sp
>   4fc:   e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
>   500:   e24cb004        sub     fp, ip, #4
>   504:   e24dd00c        sub     sp, sp, #12
>   508:   ee105f10        mrc     15, 0, r5, cr0, cr0, {0} <== load id
>   50c:   e1a07000        mov     r7, r0
>   510:   e1a00005        mov     r0, r5			<== r5 = id
>   514:   ebfffffe        bl      0 <lookup_processor_type>
>   518:   e2504000        subs    r4, r0, #0
>   51c:   1a000003        bne     530 <setup_arch+0x38>
>   520:   e59f063c        ldr     r0, [pc, #1596] ; b64 <setup_arch+0x66c>
>   524:   e1a01005        mov     r1, r5
>   528:   ebfffffe        bl      0 <printk>
>   52c:   eafffffe        b       52c <setup_arch+0x34>
>   530:   e5948020        ldr     r8, [r4, #32]
>   534:   e2153a8f        ands    r3, r5, #585728 ; 0x8f000 <== r5 = id
>   538:   e59f2628        ldr     r2, [pc, #1576] ; b68 <setup_arch+0x670>
>   538:   e59f2628        ldr     r2, [pc, #1576] ; b68 <setup_arch+0x670>
>   53c:   e5828000        str     r8, [r2]
>   540:   0a00001f        beq     5c4 <setup_arch+0xcc>
>   544:   e3530a07        cmp     r3, #28672      ; 0x7000
>   548:   1a000003        bne     55c <setup_arch+0x64>
>   54c:   e3150502        tst     r5, #8388608    ; 0x800000 <== r5 = id
>   550:   03a03001        moveq   r3, #1
>   554:   13a03003        movne   r3, #3
>   558:   ea000019        b       5c4 <setup_arch+0xcc>
>   55c:   e3150702        tst     r5, #524288     ; 0x80000 <== r5 = id
>   560:   1a000003        bne     574 <setup_arch+0x7c>
>   564:   e7e23855        ubfx    r3, r5, #16, #3		<== r5 = id
>   568:   e3530000        cmp     r3, #0
>   56c:   12833001        addne   r3, r3, #1
>   570:   ea000013        b       5c4 <setup_arch+0xcc>
>   574:   e205580f        and     r5, r5, #983040 ; 0xf0000 <== r5 = id
>   578:   e355080f        cmp     r5, #983040     ; 0xf0000
>   57c:   13a03000        movne   r3, #0
>   580:   1a00000f        bne     5c4 <setup_arch+0xcc>
> ...
>
> The point here is that the compiler is free to optimise the code as it
> sees fit - if it decides that the register pressure from having the
> value cached in a register is too high, it can decide to spill the
> cached value, and reload it from CP15 as and when it needs to.  That
> is an advantage.

Good point. I hadn't thought of that.

Do you know the latency of an mrc instruction? (compared to a mov)

> It seems that GCC 4.7.4 optimises better than Linaro's 4.9.3.  In fact,
> it looks like Linaro's 4.9.3 is rather buggy as far as optimisation
> goes.
>
> Later compilers aren't always better.

I did NOT expect that. Compiler optimizations passes are so fragile.

Anyway, here's another proposed nano-improvement ;-)
(This one is code factorization)

--- setup.c	2015-03-03 18:04:59.000000000 +0100
+++ setup.bar.c	2015-03-13 17:29:23.800668429 +0100
@@ -246,12 +246,9 @@
  		if (cpu_arch)
  			cpu_arch += CPU_ARCH_ARMv3;
  	} else if ((read_cpuid_id() & 0x000f0000) == 0x000f0000) {
-		unsigned int mmfr0;
-
  		/* Revised CPUID format. Read the Memory Model Feature
  		 * Register 0 and check for VMSAv7 or PMSAv7 */
-		asm("mrc	p15, 0, %0, c0, c1, 4"
-		    : "=r" (mmfr0));
+		unsigned int mmfr0 = read_cpuid_ext(CPUID_EXT_MMFR0);
  		if ((mmfr0 & 0x0000000f) >= 0x00000003 ||
  		    (mmfr0 & 0x000000f0) >= 0x00000030)
  			cpu_arch = CPU_ARCH_ARMv7;


This one looks good, doesn't it? :-)

Regards.

  reply	other threads:[~2015-03-13 16:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-13 16:03 read_cpuid_id() in arch/arm/kernel/setup.c Mason
2015-03-13 16:19 ` Russell King - ARM Linux
2015-03-13 16:39   ` Mason [this message]
2015-03-13 16:45     ` Russell King - ARM Linux
2015-03-13 17:06       ` Mason
2015-03-15 17:40       ` Mason
2015-03-16  8:44         ` Mason
2015-03-16 16:54           ` Paul Walmsley
2015-03-16 22:17             ` Mason
2015-03-16 23:30               ` Mason
2015-03-13 16:56 ` Mark Rutland
2015-03-13 17:02   ` Russell King - ARM Linux
2015-03-13 18:26     ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=550312BB.8020902@free.fr \
    --to=slash.tmp@free.fr \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).