From: Eric Dumazet <dada1@cosmosbay.com>
To: Andi Kleen <ak@suse.de>
Cc: linux-kernel@vger.kernel.org
Subject: [x86_64] Strange oprofile results on access to per_cpu data
Date: Fri, 03 Nov 2006 08:26:05 +0100
Message-ID: <454AEF0D.1090402@cosmosbay.com>
In-Reply-To: <200611030356.54074.ak@suse.de>

Hi Andi

While doing some oprofile analysis, I got the result below for ip_route_input():
one particular instruction seems to account for a lot of cycles.

The machine is a dual-core Opteron 285, 2.6 GHz.

/*
 * Command line: opannotate -a event:CPU_CLK_UNHALTED /usr/src/linux-2.6.18/vmlinux
 *
 * Interpretation of command line:
 * Output annotated assembly listing with samples
 *
 * CPU: AMD64 processors, speed 2600.01 MHz (estimated)
 * Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10000
 */

ffffffff803e9860 <ip_route_input>: /* ip_route_input total: 543098  2.5487 */

/* relevant extract from ip_route_input() */
    600  0.0028 :ffffffff803e98b3:       mov    $0xffffffff806375e0,%rsi
    883  0.0041 :ffffffff803e98ba:       mov    %rax,%rdx
      6 2.8e-05 :ffffffff803e98bd:       mov    %rsi,%rcx
   2281  0.0107 :ffffffff803e98c0:       cmp    0xf0(%rdx),%r12d
   9767  0.0458 :ffffffff803e98c7:       jne    ffffffff803e98f1 <ip_route_input+0x91>
    108 5.1e-04 :ffffffff803e98c9:       cmp    0xf4(%rdx),%r14d
  41459  0.1946 :ffffffff803e98d0:       jne    ffffffff803e98f1 <ip_route_input+0x91>
    549  0.0026 :ffffffff803e98d2:       cmp    0xec(%rdx),%ebx
  88604  0.4158 :ffffffff803e98d8:       jne    ffffffff803e98f1 <ip_route_input+0x91>
    478  0.0022 :ffffffff803e98da:       mov    0xe8(%rdx),%eax
    315  0.0015 :ffffffff803e98e0:       test   %eax,%eax
    241  0.0011 :ffffffff803e98e2:       jne    ffffffff803e98f1 <ip_route_input+0x91>
    248  0.0012 :ffffffff803e98e4:       cmp    0xfc(%rdx),%r13b
   2314  0.0109 :ffffffff803e98eb:       je     ffffffff803ea3b3
################ BEGIN
    370  0.0017 :ffffffff803e98f1:       mov    %gs:0x8,%rax
 222769  1.0454 :ffffffff803e98fa:       incl   0x38(%rcx,%rax,1)
################ END
      6 2.8e-05 :ffffffff803e98fe:       mov    (%rdx),%rdx
    833  0.0039 :ffffffff803e9901:       test   %rdx,%rdx

__raw_get_cpu_var(rt_cache_stat).field++ appears to be very expensive: the
incl alone accounts for 222769 of the 543098 samples in ip_route_input(),
about 41%.

(About 18000 RT_CACHE_STAT_INC(in_hlist_search) calls are done per second,
which is not an impressive count.)
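
For reference, the sample arithmetic can be written out as a small userspace C sketch. The helper names sample_share() and estimated_cycles() are made up for this illustration and are not part of oprofile. The cycle figure is only a statistical estimate, assuming each sample stands for roughly "count" (10000) CPU_CLK_UNHALTED events.

```c
#include <assert.h>

/* Fraction of a function's samples that landed on one instruction. */
double sample_share(unsigned long insn_samples, unsigned long func_samples)
{
    assert(func_samples > 0);
    return (double)insn_samples / (double)func_samples;
}

/*
 * Rough cycle estimate (assumption: one sample is taken about every
 * `count` events, so samples * count approximates the events seen).
 */
unsigned long long estimated_cycles(unsigned long samples, unsigned long count)
{
    return (unsigned long long)samples * count;
}
```

With the figures from the listing, sample_share(222769, 543098) is about 0.41 and estimated_cycles(222769, 10000) is 2227690000 cycles over the whole profiling run. The run length is not given here, so this does not yield a per-increment cost.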

Are segment prefixes that expensive?
Or is it only the first access to %gs:8 that does extra checks?
(The other RT_CACHE_STAT_INC() calls in the same function don't have this cost.)
Or is it the loading of %rcx (done at ffffffff803e98bd) that is stalling?

I was wondering whether avoiding a dependency would help.

As we don't have TLS support in the kernel yet, I was considering (just as an
experiment) putting a struct rt_cache_stat in the PDA, since that avoids one
step: the load from %gs:0x8 that has to complete before the increment.

#if defined(RT_CACHE_STAT_IN_PDA)
/* Experimental: the counters live directly in the PDA. */
# define RT_CACHE_STAT_INC(field) add_pda(rt_cache_stat.field, 1)
# define addr_of_rt_cache_stat(cpu) (&cpu_pda(cpu)->rt_cache_stat)
#else
/* Current scheme: the counters live in the per-cpu data area. */
static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
# define RT_CACHE_STAT_INC(field) (__raw_get_cpu_var(rt_cache_stat).field++)
# define addr_of_rt_cache_stat(cpu) (&per_cpu(rt_cache_stat, cpu))
#endif

so that RT_CACHE_STAT_INC(field) would map to

    addl $1,%gs:OFFSET  /* no register needed */
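
To make the difference between the two layouts concrete, here is a minimal userspace C model. It is only a sketch: every name in it (rt_cache_stat_model, pda_model_a, pda_model_b, percpu_copies, and the functions) is hypothetical and none of it is kernel code. It does not use %gs and measures nothing; it only shows that the first layout needs a loaded offset before it can form the counter address, while the second reaches the counter at a fixed offset inside the per-CPU structure.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2   /* assumption: two CPUs, as on the test machine */

/* Hypothetical stand-in for the statistics structure. */
struct rt_cache_stat_model {
    unsigned int in_hit;
    unsigned int in_hlist_search;
};

/* Layout A: the PDA only holds an offset to this CPU's data area. */
struct pda_model_a {
    size_t data_offset;   /* plays the role of the value read from %gs:0x8 */
};

/* Layout B: the statistics are embedded in the PDA itself. */
struct pda_model_b {
    struct rt_cache_stat_model rt_cache_stat;
};

struct rt_cache_stat_model percpu_copies[MODEL_NR_CPUS];
struct pda_model_a pda_a[MODEL_NR_CPUS];
struct pda_model_b pda_b[MODEL_NR_CPUS];

/* Record, for each CPU, the byte offset of its copy from the first copy. */
void model_setup(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        pda_a[cpu].data_offset = (size_t)cpu * sizeof(percpu_copies[0]);
}

/* Layout A: load the offset, add it to the base address, then increment. */
void stat_inc_via_offset(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    size_t off = pda_a[cpu].data_offset;             /* dependent load */
    struct rt_cache_stat_model *s =
        (struct rt_cache_stat_model *)((char *)percpu_copies + off);
    s->in_hlist_search++;
}

/* Layout B: the counter sits at a constant offset inside the PDA. */
void stat_inc_in_pda(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    pda_b[cpu].rt_cache_stat.in_hlist_search++;
}
```

After model_setup(), calling stat_inc_via_offset(1) twice leaves percpu_copies[1].in_hlist_search at 2 and the copy for CPU 0 untouched, and stat_inc_in_pda(0) bumps only pda_b[0]. Whether removing the extra load changes the profile on real hardware is exactly the open question.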

Thank you
Eric
