Re: [RFC Patch]Use ar.kr2 for smp_processor_id


From: Zou Nan hai <nanhai.zou@intel.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [RFC Patch]Use ar.kr2 for smp_processor_id
Date: Thu, 08 Feb 2007 07:14:54 +0000
Message-ID: <1170918894.3230.32.camel@linux-znh>
In-Reply-To: <1170905324.3230.7.camel@linux-znh>

On Thu, 2007-02-08 at 16:40, Keith Owens wrote:
> Zou Nan hai (on 08 Feb 2007 13:11:49 +0800) wrote:
> >On Thu, 2007-02-08 at 14:55, Keith Owens wrote:
> >> Keith Owens (on Thu, 08 Feb 2007 17:37:54 +1100) wrote:
> >> Correction: ar.k3 contains the physical address of the per-cpu data
> >> area, virtual access to per-cpu data goes via the cpu local TLB and
> >> does not rely on an ar.k<n> variable.  ar.k3 is used in the MCA
> >> assembler handler, see GET_THIS_PADDR in include/asm-ia64/mca_asm.h
> >> and arch/ia64/kernel/mca_asm.S.
> >> 
> >
> > Since MCA is a slow path, I think putting smp_processor_id in ar.kr3
> > is a gain.
> >
> > We could even optimize get_cpu_var based on this...
> 
> (1) Somebody else (not me) gets to fix up and test the MCA handler
>     assembler code - lots of luck.
> 
> (2) smp_processor_id() in the IA64 kernel is accessed via struct
>     thread_info.cpu.  That maps to a simple memory access with code
>     like this:
> 
>        adds r14=3252,r13
>        ;;
>        ld4 r15=[r14]
> 
>     The stop bits usually get amortized away with other code.
>     thread_info.cpu will normally be cached in L1 so reading
>     smp_processor_id() is relatively fast.
> 
> (3) Reading smp_processor_id() from ar.k3 in the kernel is 10 times
>     slower than the existing kernel code.  See the timing program
>     below.
> 
> (4) If the justification for storing cpu number in ar.k<n> is to speed
>     up user space, how can user space tell if the current kernel stores
>     the physical address of the per-cpu data in k3 or if it stores the
>     cpu number in k3?  Detecting which variant of the kernel is running
>     will slow down user space.
> 
> 
> Timing results on 'modprobe measure'
> 
> init_measure: empty_loop 2000007 cpu_loop 3000011 k3_loop 11999992
> 
> module measure.c
> 
> -----------------------------------------------------------------------
> 
> #include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/preempt.h>
> #include <asm/kregs.h>
> #include <asm/timex.h>
> 
> MODULE_LICENSE("GPL");
> 
> #define LOOPS 1000000
> 
> static int __init init_measure(void)
> {
>         int loop;
>         register int cpu;
>         unsigned long start, end, empty_loop, cpu_loop, k3_loop;
>         printk("%s: start\n", __FUNCTION__);
>         preempt_disable();
> 
>         local_irq_disable();
>         start = get_cycles();
>         barrier();
>         for (loop = 0; loop < LOOPS; ++loop) {
>                 /* ensure that all loops are the same size (2 bundles) */
>                 asm volatile ("nop 0; nop 0; nop 0;");
>                 barrier();
>         };
>         end = get_cycles();
>         barrier();
>         local_irq_enable();
>         empty_loop = end - start;
> 
>         local_irq_disable();
>         start = get_cycles();
>         barrier();
>         for (loop = 0; loop < LOOPS; ++loop) {
>                 /* hand code the read of smp_processor_id() to stop gcc
>                  * moving the address calculation outside the loop
>                  */
>                 asm volatile ("adds r14=%0,r13"
>                               ";;"
>                               "ld4 r15=[r14]"
>                               : :
>                               "i" (IA64_TASK_SIZE +
>                                    offsetof(struct thread_info, cpu)) :
>                               "r14", "r15" );
>                 barrier();
>         };
>         end = get_cycles();
>         barrier();
>         local_irq_enable();
>         cpu_loop = end - start;
> 
>         local_irq_disable();
>         start = get_cycles();
>         barrier();
>         for (loop = 0; loop < LOOPS; ++loop) {
>                 cpu = ia64_get_kr(IA64_KR_PER_CPU_DATA);
>                 barrier();
>         };
>         end = get_cycles();
>         barrier();
>         local_irq_enable();
>         k3_loop = end - start;
> 
>         preempt_enable();
>         printk("%s: empty_loop %ld cpu_loop %ld k3_loop %ld\n",
>                __FUNCTION__, empty_loop, cpu_loop, k3_loop);
>         return 0;
> }
> 
> static void __exit exit_measure(void)
> {
>         printk("%s: start\n", __FUNCTION__);
>         printk("%s: end\n", __FUNCTION__);
> }
> 
> module_init(init_measure)
> module_exit(exit_measure)
> 
> -----------------------------------------------------------------------
> 
> objdump of the interesting bits (the three loops):
> 
> empty loop:
> 
>   40:   09 08 00 50 00 21       [MMI]       mov r1=r40
>   46:   00 00 00 02 00 e0                   nop.m 0x0
>   4c:   81 6c 64 84                         adds r15=3272,r13;;
>   50:   0a 18 00 1e 10 10       [MMI]       ld4 r3=[r15];;
>   56:   20 08 0c 00 42 00                   adds r2=1,r3
>   5c:   00 00 04 00                         nop.i 0x0
>   60:   0b 00 00 00 01 00       [MMI]       nop.m 0x0;;
>   66:   00 10 3c 20 23 00                   st4 [r15]=r2
>   6c:   00 00 04 00                         nop.i 0x0;;
>   70:   0b 00 00 02 07 00       [MMI]       rsm 0x4000;;
>   76:   50 02 b0 44 08 00                   mov.m r37=ar.itc
>   7c:   00 00 04 00                         nop.i 0x0;;
>   80:   0b 70 fc 78 84 24       [MMI]       mov r14=999999;;
>   86:   00 00 00 02 00 00                   nop.m 0x0
>   8c:   e0 08 aa 00                         mov.i ar.lc=r14;;
>   90:   01 00 00 00 01 00       [MII]       nop.m 0x0
>   96:   00 00 00 02 00 00                   nop.i 0x0
>   9c:   00 00 04 00                         nop.i 0x0;;
>   a0:   10 00 00 00 01 00       [MIB]       nop.m 0x0
>   a6:   00 00 00 02 00 a0                   nop.i 0x0
>   ac:   f0 ff ff 48                         br.cloop.sptk.few 90 <init_module+0x90>
>   b0:   0b 20 01 58 22 04       [MMI]       mov.m r36=ar.itc;;
>   b6:   00 00 04 0c 00 00                   ssm 0x4000
>   bc:   00 00 04 00                         nop.i 0x0;;
>   c0:   0b 00 00 00 30 00       [MMI]       srlz.d;;
> 
> Read smp_processor_id:
> 
>   c6:   00 00 04 0e 00 00                   rsm 0x4000
>   cc:   00 00 04 00                         nop.i 0x0;;
>   d0:   01 18 01 58 22 04       [MII]       mov.m r35=ar.itc
>   d6:   00 00 00 02 00 00                   nop.i 0x0
>   dc:   00 00 04 00                         nop.i 0x0;;
>   e0:   0a 40 fc 78 84 24       [MMI]       mov r8=999999;;
>   e6:   00 00 00 02 00 00                   nop.m 0x0
>   ec:   80 08 aa 00                         mov.i ar.lc=r8
>   f0:   0b 70 d0 1a 19 21       [MMI]       adds r14=3252,r13;;
>   f6:   f0 00 38 20 20 00                   ld4 r15=[r14]
>   fc:   00 00 04 00                         nop.i 0x0;;
>  100:   10 00 00 00 01 00       [MIB]       nop.m 0x0
>  106:   00 00 00 02 00 a0                   nop.i 0x0
>  10c:   f0 ff ff 48                         br.cloop.sptk.few f0 <init_module+0xf0>
>  110:   0b 10 01 58 22 04       [MMI]       mov.m r34=ar.itc;;
>  116:   00 00 04 0c 00 00                   ssm 0x4000
>  11c:   00 00 04 00                         nop.i 0x0;;
>  120:   0b 00 00 00 30 00       [MMI]       srlz.d;;
> 
> Read ar.k3:
> 
>  126:   00 00 04 0e 00 00                   rsm 0x4000
>  12c:   00 00 04 00                         nop.i 0x0;;
>  130:   01 08 01 58 22 04       [MII]       mov.m r33=ar.itc
>  136:   00 00 00 02 00 00                   nop.i 0x0
>  13c:   00 00 04 00                         nop.i 0x0;;
>  140:   0a 48 fc 78 84 24       [MMI]       mov r9=999999;;
>  146:   00 00 00 02 00 00                   nop.m 0x0
>  14c:   90 08 aa 00                         mov.i ar.lc=r9
>  150:   01 70 00 06 22 04       [MII]       mov.m r14=ar.k3
>  156:   00 00 00 02 00 00                   nop.i 0x0
>  15c:   00 00 04 00                         nop.i 0x0;;
>  160:   10 00 00 00 01 00       [MIB]       nop.m 0x0
>  166:   00 00 00 02 00 a0                   nop.i 0x0
>  16c:   f0 ff ff 48                         br.cloop.sptk.few 150 <init_module+0x150>
>  170:   0b 00 01 58 22 04       [MMI]       mov.m r32=ar.itc;;
>  176:   00 00 04 0c 00 00                   ssm 0x4000
>  17c:   00 00 04 00                         nop.i 0x0;;
>  180:   01 00 00 00 30 00       [MII]       srlz.d
> 
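
To make the "10 times slower" figure in point (3) easier to check, here is a rough sketch that subtracts the empty-loop baseline from the quoted 'modprobe measure' totals and divides by LOOPS. It is ordinary user-space C, not part of the module; the helper name extra_cycles_per_iter() is made up for this illustration, and only the four numbers come from the quoted output.

```c
#include <assert.h>

/* Figures copied from the quoted 'modprobe measure' output. */
#define LOOPS 1000000UL
static const unsigned long empty_loop = 2000007UL;
static const unsigned long cpu_loop   = 3000011UL;
static const unsigned long k3_loop    = 11999992UL;

/*
 * Hypothetical helper (not in the original module): cycles spent per
 * iteration over and above the empty-loop baseline.
 */
static double extra_cycles_per_iter(unsigned long loop_cycles)
{
        assert(loop_cycles >= empty_loop);
        return (double)(loop_cycles - empty_loop) / (double)LOOPS;
}
```

Calling extra_cycles_per_iter(cpu_loop) and extra_cycles_per_iter(k3_loop) gives roughly 1 and roughly 10 extra cycles per iteration, which is where the factor of about 10 comes from once the loop overhead is removed.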
 
OK,
  I think caching the getcpu result in a static variable will cause heavy
bouncing of the cache line that holds it if multiple CPUs call getcpu very
frequently.

  So would implementing it in an fsys call that reads
current_thread_info()->cpu be better?
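
The cache line concern can be illustrated with a small user-space sketch. This is only a model under stated assumptions, not kernel or fsys code: ASSUMED_LINE_SIZE (128 bytes), MAX_CPUS, and the names shared_cache, cpu_slot, slot_for() and same_cache_line() are all made up for the illustration. It contrasts one shared static, which every CPU would write, with per-CPU slots padded so that no two of them share a line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_LINE_SIZE 128   /* assumption: cache line size in bytes */
#define MAX_CPUS          4     /* hypothetical CPU count for the sketch */

/* One shared static pair: both entries sit in the same cache line. */
static _Alignas(ASSUMED_LINE_SIZE) int shared_cache[2];

/* Per-CPU slot padded out to a full (assumed) cache line. */
struct cpu_slot {
        _Alignas(ASSUMED_LINE_SIZE) int cached_cpu;
};

static struct cpu_slot slots[MAX_CPUS];

/* Returns nonzero if both addresses fall in the same assumed line. */
static int same_cache_line(const void *a, const void *b)
{
        return ((uintptr_t)a / ASSUMED_LINE_SIZE) ==
               ((uintptr_t)b / ASSUMED_LINE_SIZE);
}

/* Hypothetical accessor: the slot that a given CPU would write to. */
static int *slot_for(unsigned int cpu)
{
        assert(cpu < MAX_CPUS);
        return &slots[cpu].cached_cpu;
}
```

In this model, writes from different CPUs to shared_cache would contend for one line, while slot_for(0) and slot_for(1) land in different lines. Reading current_thread_info()->cpu avoids the shared write altogether, since each task reads its own thread_info.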

Thanks
Zou Nan hai


Thread overview: 13+ messages
2007-02-08  3:28 [RFC Patch]Use ar.kr2 for smp_processor_id Zou Nan hai
2007-02-08  4:27 ` Zou Nan hai
2007-02-08  4:59 ` Zou Nan hai
2007-02-08  5:11 ` Zou Nan hai
2007-02-08  6:04 ` Keith Owens
2007-02-08  6:37 ` Keith Owens
2007-02-08  6:55 ` Keith Owens
2007-02-08  7:14 ` Zou Nan hai [this message]
2007-02-08  7:38 ` Zou Nan hai
2007-02-08  8:28 ` peterc
2007-02-08  8:40 ` Keith Owens
2007-02-08 18:03 ` Luck, Tony
2007-02-08 23:59 ` Keith Owens
