From: Zou Nan hai <nanhai.zou@intel.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [RFC Patch]Use ar.kr2 for smp_processor_id
Date: Thu, 08 Feb 2007 07:14:54 +0000
Message-ID: <1170918894.3230.32.camel@linux-znh>
In-Reply-To: <1170905324.3230.7.camel@linux-znh>
On Thu, 2007-02-08 at 16:40, Keith Owens wrote:
> Zou Nan hai (on 08 Feb 2007 13:11:49 +0800) wrote:
> >On Thu, 2007-02-08 at 14:55, Keith Owens wrote:
> >> Keith Owens (on Thu, 08 Feb 2007 17:37:54 +1100) wrote:
> >> Correction: ar.k3 contains the physical address of the per-cpu data
> >> area, virtual access to per-cpu data goes via the cpu local TLB and
> >> does not rely on an ar.k<n> variable. ar.k3 is used in the MCA
> >> assembler handler, see GET_THIS_PADDR in include/asm-ia64/mca_asm.h
> >> and arch/ia64/kernel/mca_asm.S.
> >>
> >
> > Since MCA is a slow path, I think putting smp_processor_id in
> > ar.kr3 is a gain.
> >
> > We could even optimize get_cpu_var based on this...
>
> (1) Somebody else (not me) gets to fix up and test the MCA handler
> assembler code - lots of luck.
>
> (2) smp_processor_id() in the IA64 kernel is accessed via struct
> thread_info.cpu. That maps to a simple memory access with code
> like this:
>
> adds r14=3252,r13
> ;;
> ld4 r15=[r14]
>
> The stop bits usually get amortized away with other code.
> thread_info.cpu will normally be cached in L1 so reading
> smp_processor_id() is relatively fast.
>
> (3) Reading smp_processor_id() from ar.k3 in the kernel is 10 times
> slower than the existing kernel code. See the timing program
> below.
>
> (4) If the justification for storing cpu number in ar.k<n> is to speed
> up user space, how can user space tell if the current kernel stores
> the physical address of the per-cpu data in k3 or if it stores the
> cpu number in k3? Detecting which variant of the kernel is running
> will slow down user space.
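
To make point (2) concrete: the immediate in the adds instruction is IA64_TASK_SIZE plus offsetof(struct thread_info, cpu), added to the task pointer held in r13. The following is only a user-space sketch of that address arithmetic. Every mock_* name and every size in it is a hypothetical stand-in, not the kernel's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct task_struct (the real one is far larger). */
struct mock_task {
	char opaque[64];
};

/* Hypothetical stand-in for struct thread_info; only 'cpu' matters here. */
struct mock_thread_info {
	void *task;
	void *exec_domain;
	uint32_t flags;
	uint32_t cpu;
};

/* Assumption: thread_info sits directly after the task structure. */
struct mock_task_and_info {
	struct mock_task task;
	struct mock_thread_info info;
};

#define MOCK_TASK_SIZE sizeof(struct mock_task)

/*
 * Mirrors "adds r14=<offset>,r13 ;; ld4 r15=[r14]":
 * one constant add to the task pointer, then one 4-byte load.
 */
static uint32_t mock_smp_processor_id(const void *current_task)
{
	const char *base = current_task;
	size_t off = MOCK_TASK_SIZE + offsetof(struct mock_thread_info, cpu);
	uint32_t cpu;

	memcpy(&cpu, base + off, sizeof(cpu));
	return cpu;
}
```

The point of the sketch is that the lookup needs no shared state at all: a constant offset and a load from memory that belongs to the running task.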
>
>
> Timing results on 'modprobe measure'
>
> init_measure: empty_loop 2000007 cpu_loop 3000011 k3_loop 11999992
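
For reference, the "10 times slower" figure in (3) follows from these three totals once the empty-loop overhead is subtracted. A small sketch of that arithmetic is below. It assumes the totals are ITC ticks (the objdump reads ar.itc), and the helper name net_ticks_per_iter is made up for illustration.

```c
#include <assert.h>

#define LOOPS 1000000UL	/* same iteration count as the module */

/*
 * Ticks per iteration attributable to the measured operation:
 * subtract the empty-loop total, then divide by the iteration count.
 */
static double net_ticks_per_iter(unsigned long loop_total,
				 unsigned long empty_total)
{
	return (double)(loop_total - empty_total) / (double)LOOPS;
}
```

With the totals above, the thread_info.cpu read costs about 1 tick per iteration beyond the empty loop and the ar.k3 read about 10, which is where the factor of ten comes from.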
>
> module measure.c
>
> -----------------------------------------------------------------------
>
> #include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/preempt.h>
> #include <asm/kregs.h>
> #include <asm/timex.h>
>
> MODULE_LICENSE("GPL");
>
> #define LOOPS 1000000
>
> static int __init init_measure(void)
> {
> 	int loop;
> 	register int cpu;
> 	unsigned long start, end, empty_loop, cpu_loop, k3_loop;
> 	printk("%s: start\n", __FUNCTION__);
> 	preempt_disable();
>
> 	local_irq_disable();
> 	start = get_cycles();
> 	barrier();
> 	for (loop = 0; loop < LOOPS; ++loop) {
> 		/* ensure that all loops are the same size (2 bundles) */
> 		asm volatile ("nop 0; nop 0; nop 0;");
> 		barrier();
> 	};
> 	end = get_cycles();
> 	barrier();
> 	local_irq_enable();
> 	empty_loop = end - start;
>
> 	local_irq_disable();
> 	start = get_cycles();
> 	barrier();
> 	for (loop = 0; loop < LOOPS; ++loop) {
> 		/* hand code the read of smp_processor_id() to stop gcc moving
> 		 * the address calculation outside the loop
> 		 */
> 		asm volatile ("adds r14=%0,r13"
> 			      ";;"
> 			      "ld4 r15=[r14]"
> 			      : :
> 			      "i" (IA64_TASK_SIZE + offsetof(struct thread_info, cpu)) :
> 			      "r14", "r15" );
> 		barrier();
> 	};
> 	end = get_cycles();
> 	barrier();
> 	local_irq_enable();
> 	cpu_loop = end - start;
>
> 	local_irq_disable();
> 	start = get_cycles();
> 	barrier();
> 	for (loop = 0; loop < LOOPS; ++loop) {
> 		cpu = ia64_get_kr(IA64_KR_PER_CPU_DATA);
> 		barrier();
> 	};
> 	end = get_cycles();
> 	barrier();
> 	local_irq_enable();
> 	k3_loop = end - start;
>
> 	preempt_enable();
> 	printk("%s: empty_loop %ld cpu_loop %ld k3_loop %ld\n",
> 	       __FUNCTION__, empty_loop, cpu_loop, k3_loop);
> 	return 0;
> }
>
> static void __exit exit_measure(void)
> {
> printk("%s: start\n", __FUNCTION__);
> printk("%s: end\n", __FUNCTION__);
> }
>
> module_init(init_measure)
> module_exit(exit_measure)
>
> -----------------------------------------------------------------------
>
> objdump of the interesting bits (the three loops):
>
> empty loop:
>
> 40: 09 08 00 50 00 21 [MMI] mov r1=r40
> 46: 00 00 00 02 00 e0 nop.m 0x0
> 4c: 81 6c 64 84 adds r15=3272,r13;;
> 50: 0a 18 00 1e 10 10 [MMI] ld4 r3=[r15];;
> 56: 20 08 0c 00 42 00 adds r2=1,r3
> 5c: 00 00 04 00 nop.i 0x0
> 60: 0b 00 00 00 01 00 [MMI] nop.m 0x0;;
> 66: 00 10 3c 20 23 00 st4 [r15]=r2
> 6c: 00 00 04 00 nop.i 0x0;;
> 70: 0b 00 00 02 07 00 [MMI] rsm 0x4000;;
> 76: 50 02 b0 44 08 00 mov.m r37=ar.itc
> 7c: 00 00 04 00 nop.i 0x0;;
> 80: 0b 70 fc 78 84 24 [MMI] mov r14=999999;;
> 86: 00 00 00 02 00 00 nop.m 0x0
> 8c: e0 08 aa 00 mov.i ar.lc=r14;;
> 90: 01 00 00 00 01 00 [MII] nop.m 0x0
> 96: 00 00 00 02 00 00 nop.i 0x0
> 9c: 00 00 04 00 nop.i 0x0;;
> a0: 10 00 00 00 01 00 [MIB] nop.m 0x0
> a6: 00 00 00 02 00 a0 nop.i 0x0
> ac: f0 ff ff 48 br.cloop.sptk.few 90 <init_module+0x90>
> b0: 0b 20 01 58 22 04 [MMI] mov.m r36=ar.itc;;
> b6: 00 00 04 0c 00 00 ssm 0x4000
> bc: 00 00 04 00 nop.i 0x0;;
> c0: 0b 00 00 00 30 00 [MMI] srlz.d;;
>
> Read smp_processor_id:
>
> c6: 00 00 04 0e 00 00 rsm 0x4000
> cc: 00 00 04 00 nop.i 0x0;;
> d0: 01 18 01 58 22 04 [MII] mov.m r35=ar.itc
> d6: 00 00 00 02 00 00 nop.i 0x0
> dc: 00 00 04 00 nop.i 0x0;;
> e0: 0a 40 fc 78 84 24 [MMI] mov r8=999999;;
> e6: 00 00 00 02 00 00 nop.m 0x0
> ec: 80 08 aa 00 mov.i ar.lc=r8
> f0: 0b 70 d0 1a 19 21 [MMI] adds r14=3252,r13;;
> f6: f0 00 38 20 20 00 ld4 r15=[r14]
> fc: 00 00 04 00 nop.i 0x0;;
> 100: 10 00 00 00 01 00 [MIB] nop.m 0x0
> 106: 00 00 00 02 00 a0 nop.i 0x0
> 10c: f0 ff ff 48 br.cloop.sptk.few f0 <init_module+0xf0>
> 110: 0b 10 01 58 22 04 [MMI] mov.m r34=ar.itc;;
> 116: 00 00 04 0c 00 00 ssm 0x4000
> 11c: 00 00 04 00 nop.i 0x0;;
> 120: 0b 00 00 00 30 00 [MMI] srlz.d;;
>
> Read ar.k3:
>
> 126: 00 00 04 0e 00 00 rsm 0x4000
> 12c: 00 00 04 00 nop.i 0x0;;
> 130: 01 08 01 58 22 04 [MII] mov.m r33=ar.itc
> 136: 00 00 00 02 00 00 nop.i 0x0
> 13c: 00 00 04 00 nop.i 0x0;;
> 140: 0a 48 fc 78 84 24 [MMI] mov r9=999999;;
> 146: 00 00 00 02 00 00 nop.m 0x0
> 14c: 90 08 aa 00 mov.i ar.lc=r9
> 150: 01 70 00 06 22 04 [MII] mov.m r14=ar.k3
> 156: 00 00 00 02 00 00 nop.i 0x0
> 15c: 00 00 04 00 nop.i 0x0;;
> 160: 10 00 00 00 01 00 [MIB] nop.m 0x0
> 166: 00 00 00 02 00 a0 nop.i 0x0
> 16c: f0 ff ff 48 br.cloop.sptk.few 150 <init_module+0x150>
> 170: 0b 00 01 58 22 04 [MMI] mov.m r32=ar.itc;;
> 176: 00 00 04 0c 00 00 ssm 0x4000
> 17c: 00 00 04 00 nop.i 0x0;;
> 180: 01 00 00 00 30 00 [MII] srlz.d
>
OK.
I think using a static value to cache getcpu will make the cache line
that contains it bounce heavily if multiple CPUs call getcpu very
frequently.
In that case, would implementing it as current_thread_info()->cpu in an
fsys call be better?
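
The cache-line concern can be illustrated in user space. This is only a sketch of the layout difference, not kernel code: CACHE_LINE, MAX_CPUS and all the struct and function names are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128	/* assumed line size; adjust for the target CPU */
#define MAX_CPUS   4	/* hypothetical CPU count for the example */

/* Shared layout: every CPU writes this one word, so its line bounces. */
struct shared_cache {
	int last_cpu;
};

/* Padded layout: each CPU gets a whole line to itself. */
struct percpu_slot {
	int cpu;
	char pad[CACHE_LINE - sizeof(int)];
};

static _Alignas(CACHE_LINE) struct percpu_slot slots[MAX_CPUS];

/* True if two addresses fall in the same (assumed) cache line. */
static int same_line(const void *a, const void *b)
{
	return ((uintptr_t)a / CACHE_LINE) == ((uintptr_t)b / CACHE_LINE);
}
```

Reading the cpu number from per-task data such as thread_info has the same property as the padded layout: no two CPUs write to the line being read, so nothing bounces.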
Thanks
Zou Nan hai