From: Eric Dumazet <dada1@cosmosbay.com>
To: Andi Kleen <ak@suse.de>
Cc: linux-kernel@vger.kernel.org
Subject: [x86_64] Strange oprofile results on access to per_cpu data
Date: Fri, 03 Nov 2006 08:26:05 +0100
Message-ID: <454AEF0D.1090402@cosmosbay.com>
In-Reply-To: <200611030356.54074.ak@suse.de>
Hi Andi,
While doing some oprofile analysis, I got this result for ip_route_input():
one particular instruction seems to account for a lot of cycles.
The machine is a dual-core 285, 2.6 GHz.
/*
* Command line: opannotate -a event:CPU_CLK_UNHALTED /usr/src/linux-2.6.18/vmlinux
*
* Interpretation of command line:
* Output annotated assembly listing with samples
*
* CPU: AMD64 processors, speed 2600.01 MHz (estimated)
* Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10000
*/
ffffffff803e9860 <ip_route_input>: /* ip_route_input total: 543098 2.5487 */
/* relevant extract from ip_route_input() */
600 0.0028 :ffffffff803e98b3: mov $0xffffffff806375e0,%rsi
883 0.0041 :ffffffff803e98ba: mov %rax,%rdx
6 2.8e-05 :ffffffff803e98bd: mov %rsi,%rcx
2281 0.0107 :ffffffff803e98c0: cmp 0xf0(%rdx),%r12d
9767 0.0458 :ffffffff803e98c7: jne ffffffff803e98f1 <ip_route_input+0x91>
108 5.1e-04 :ffffffff803e98c9: cmp 0xf4(%rdx),%r14d
41459 0.1946 :ffffffff803e98d0: jne ffffffff803e98f1 <ip_route_input+0x91>
549 0.0026 :ffffffff803e98d2: cmp 0xec(%rdx),%ebx
88604 0.4158 :ffffffff803e98d8: jne ffffffff803e98f1 <ip_route_input+0x91>
478 0.0022 :ffffffff803e98da: mov 0xe8(%rdx),%eax
315 0.0015 :ffffffff803e98e0: test %eax,%eax
241 0.0011 :ffffffff803e98e2: jne ffffffff803e98f1 <ip_route_input+0x91>
248 0.0012 :ffffffff803e98e4: cmp 0xfc(%rdx),%r13b
2314 0.0109 :ffffffff803e98eb: je ffffffff803ea3b3
################ BEGIN
370 0.0017 :ffffffff803e98f1: mov %gs:0x8,%rax
222769 1.0454 :ffffffff803e98fa: incl 0x38(%rcx,%rax,1)
################ END
6 2.8e-05 :ffffffff803e98fe: mov (%rdx),%rdx
833 0.0039 :ffffffff803e9901: test %rdx,%rdx
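To put the numbers in the listing in perspective, here is a small sketch of the arithmetic. The three constants are copied from the opannotate output; the helper names (share_of_function, samples_to_cycles) are made up for illustration. With these figures the incl line gets about 41% of the samples of ip_route_input, which corresponds to roughly 2.2 billion unhalted cycles at one sample per 10000 events.

```c
/* Figures copied from the opannotate output above. */
#define FUNC_SAMPLES 543098UL	/* ip_route_input total */
#define INCL_SAMPLES 222769UL	/* samples on the incl at ffffffff803e98fa */
#define EVENT_COUNT  10000UL	/* CPU_CLK_UNHALTED events per sample */

/* Fraction of a function's samples attributed to one instruction. */
static double share_of_function(unsigned long insn, unsigned long func)
{
	return (double)insn / (double)func;
}

/* Approximate number of unhalted cycles represented by a sample count. */
static unsigned long long samples_to_cycles(unsigned long samples)
{
	return (unsigned long long)samples * EVENT_COUNT;
}
```

Calling share_of_function(INCL_SAMPLES, FUNC_SAMPLES) gives the per-function share, and samples_to_cycles(INCL_SAMPLES) gives the cycle estimate.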
__raw_get_cpu_var(rt_cache_stat).field++ appears to be very expensive
(about 18000 RT_CACHE_STAT_INC(in_hlist_search) are done per second, which is not an
impressive count in fact).
Are segment prefixes that expensive?
Or is it only the first access to %gs:8 that is doing extra checks?
(because the other RT_CACHE_STAT_INC() calls in the same function don't have this cost)
Or is it the loading of %rcx (done at ffffffff803e98bd) that is stalling?
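For reference, the two instructions between BEGIN and END do the per-CPU access in two dependent steps: load this CPU's data offset from the PDA, then do a read-modify-write at symbol address + offset + field offset. Below is a userspace model of that address computation, only a sketch: fake_pda, stat_area, NR_FAKE_CPUS and the function names are hypothetical, and the struct layout is my assumption of how struct rt_cache_stat looks in 2.6.18 (it is at least consistent with the 0x38 displacement in the disassembly).

```c
#include <stddef.h>
#include <stdint.h>

#define NR_FAKE_CPUS 2	/* hypothetical: models the dual-core machine */

/* Assumed layout, modeled on struct rt_cache_stat (all unsigned int). */
struct rt_cache_stat_model {
	unsigned int in_hit, in_slow_tot, in_slow_mc, in_no_route, in_brd;
	unsigned int in_martian_dst, in_martian_src;
	unsigned int out_hit, out_slow_tot, out_slow_mc;
	unsigned int gc_total, gc_ignored, gc_goal_miss, gc_dst_overflow;
	unsigned int in_hlist_search, out_hlist_search;
};

/* One copy per CPU; copy 0 plays the role of the link-time symbol. */
static struct rt_cache_stat_model stat_area[NR_FAKE_CPUS];

/* Stand-in for the PDA field read by "mov %gs:0x8,%rax". */
struct fake_pda {
	unsigned long data_offset;
};
static struct fake_pda fake_pda[NR_FAKE_CPUS];

/* Give each CPU the byte offset of its own copy relative to copy 0. */
static void fake_setup(void)
{
	for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		fake_pda[cpu].data_offset =
			(unsigned long)cpu * sizeof(stat_area[0]);
}

/* Model of RT_CACHE_STAT_INC(in_hlist_search) for a given CPU. */
static void in_hlist_search_inc(int cpu)
{
	/* mov $sym,%rsi ; mov %rsi,%rcx */
	uintptr_t base = (uintptr_t)&stat_area[0];
	/* mov %gs:0x8,%rax */
	unsigned long off = fake_pda[cpu].data_offset;
	/* incl 0x38(%rcx,%rax,1): the address depends on both loads */
	unsigned int *p = (unsigned int *)(base + off +
		offsetof(struct rt_cache_stat_model, in_hlist_search));

	(*p)++;
}
```

The model says nothing about cost; it only shows that the increment cannot issue before the offset load has completed.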
I was wondering if avoiding a dependency would help:
As we don't have TLS support in the kernel yet, I was considering trying (just for
experimentation) to stick a struct rt_cache_stat in the PDA, since it avoids one step.
#if defined(RT_CACHE_STAT_IN_PDA)
/* Experimental: the stats live inside struct x8664_pda itself. */
# define RT_CACHE_STAT_INC(field) add_pda(rt_cache_stat.field, 1)
# define addr_of_rt_cache_stat(cpu) (&cpu_pda(cpu)->rt_cache_stat)
#else
static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
# define RT_CACHE_STAT_INC(field) (__raw_get_cpu_var(rt_cache_stat).field++)
# define addr_of_rt_cache_stat(cpu) (&per_cpu(rt_cache_stat, cpu))
#endif
so that RT_CACHE_STAT_INC(field) would map to
addl $1,%gs:OFFSET /* no register needed */
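A userspace sketch of what the PDA variant amounts to, under the assumption that the stats are simply a member of the per-CPU PDA structure. Everything here is hypothetical naming (pda_model, pda_area, this_pda, the MODEL_ macros); this_pda stands in for the %gs base, and the stats struct is cut down to two fields. The point is that the field address is one base plus a compile-time constant, with no separately loaded offset.

```c
/* Cut-down stand-in for struct rt_cache_stat. */
struct stat_model {
	unsigned int in_hit;
	unsigned int in_hlist_search;
};

/* Stand-in for the PDA with the stats embedded in it. */
struct pda_model {
	unsigned long data_offset;
	struct stat_model rt_cache_stat;
};

#define NR_MODEL_CPUS 2
static struct pda_model pda_area[NR_MODEL_CPUS];

/* Plays the role of the %gs base for the "current" CPU. */
static struct pda_model *this_pda = &pda_area[0];

/* Model of add_pda(): one read-modify-write at base + constant offset. */
#define MODEL_ADD_PDA(field, val) (this_pda->field += (val))

#define MODEL_RT_CACHE_STAT_INC(field) \
	MODEL_ADD_PDA(rt_cache_stat.field, 1)
#define model_addr_of_rt_cache_stat(cpu) \
	(&pda_area[(cpu)].rt_cache_stat)
```

After pointing this_pda at a CPU's entry, MODEL_RT_CACHE_STAT_INC(in_hlist_search) bumps that CPU's counter, and model_addr_of_rt_cache_stat(cpu) is what a reader summing the counters across CPUs would use.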
Thank you
Eric