Re: [PATCH 1/3] x86: Move msr accesses out of line


From: Peter Zijlstra <peterz@infradead.org>
To: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] x86: Move msr accesses out of line
Date: Wed, 25 Feb 2015 13:27:01 +0100
Message-ID: <20150225122701.GK5029@twins.programming.kicks-ass.net>
In-Reply-To: <20150223174340.GD27767@tassilo.jf.intel.com>

On Mon, Feb 23, 2015 at 09:43:40AM -0800, Andi Kleen wrote:
> On Mon, Feb 23, 2015 at 06:04:36PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 20, 2015 at 05:38:55PM -0800, Andi Kleen wrote:
> > 
> > > This patch moves the MSR functions out of line. A MSR access is typically
> > > 40-100 cycles or even slower, a call is a few cycles at best, so the
> > > additional function call is not really significant.
> > 
> > If I look at the below PDF a CALL+PUSH EBP+MOV RSP,RBP+ ... +POP+RET
> > ends up being 5+1.5+0.5+ .. + 1.5+8 = 16.5 + .. cycles.
> 
> You cannot just add up the latency cycles. The CPU runs all of this 
> in parallel. 
> 
> Latency cycles would only be interesting if these instructions were
> on the critical path for computing the result, which they are not. 
> 
> It should be a few cycles overhead.

I thought that since CALL touches RSP, PUSH touches RSP, MOV RSP,
(obviously) touches RSP, POP touches RSP and, well, RET does too, there
were strong dependencies between the instructions and there would be
little room to parallelize things.

I'm glad you so patiently educated me on the wonders of modern
architectures and how it can indeed do all this in parallel.

Still, I wondered, so I ran me a little test. Note that I used a
serializing instruction (LOCK XCHG) because WRMSR is too.

I see a ~14 cycle difference between the inline and noinline version.

If I substitute the LOCK XCHG with XADD, the difference drops to 1.5
cycles, so clearly there is some magic happening, but serializing
instructions wreck it.

Can anybody explain how such RSP deps get magicked away?

---

root@ivb-ep:~# cat call.c

#define __always_inline         inline __attribute__((always_inline))
#define  noinline                       __attribute__((noinline))

static int
#ifdef FOO
noinline
#else
__always_inline
#endif
xchg(int *ptr, int val)
{
        asm volatile ("LOCK xchgl %0, %1\n"
                        : "+r" (val), "+m" (*(ptr))
                        : : "memory", "cc");
        return val;
}

void main(void)
{
        int val = 0, old;

        for (int i = 0; i < 1000000000; i++)
                old = xchg(&val, i);
}

root@ivb-ep:~# gcc -std=gnu99 -O3 -fno-omit-frame-pointer -DFOO -o call call.c
root@ivb-ep:~# objdump -D call | awk '/<[^>]*>:/ {p=0} /<main>:/ {p=1} /<xchg>:/ {p=1} { if (p) print $0 }'
00000000004003e0 <main>:
  4003e0:       55                      push   %rbp
  4003e1:       48 89 e5                mov    %rsp,%rbp
  4003e4:       53                      push   %rbx
  4003e5:       31 db                   xor    %ebx,%ebx
  4003e7:       48 83 ec 18             sub    $0x18,%rsp
  4003eb:       c7 45 e0 00 00 00 00    movl   $0x0,-0x20(%rbp)
  4003f2:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  4003f8:       48 8d 7d e0             lea    -0x20(%rbp),%rdi
  4003fc:       89 de                   mov    %ebx,%esi
  4003fe:       83 c3 01                add    $0x1,%ebx
  400401:       e8 fa 00 00 00          callq  400500 <xchg>
  400406:       81 fb 00 ca 9a 3b       cmp    $0x3b9aca00,%ebx
  40040c:       75 ea                   jne    4003f8 <main+0x18>
  40040e:       48 83 c4 18             add    $0x18,%rsp
  400412:       5b                      pop    %rbx
  400413:       5d                      pop    %rbp
  400414:       c3                      retq   

0000000000400500 <xchg>:
  400500:       55                      push   %rbp
  400501:       89 f0                   mov    %esi,%eax
  400503:       48 89 e5                mov    %rsp,%rbp
  400506:       f0 87 07                lock xchg %eax,(%rdi)
  400509:       5d                      pop    %rbp
  40050a:       c3                      retq   
  40050b:       90                      nop
  40050c:       90                      nop
  40050d:       90                      nop
  40050e:       90                      nop
  40050f:       90                      nop

root@ivb-ep:~# gcc -std=gnu99 -O3 -fno-omit-frame-pointer -o call-inline call.c
root@ivb-ep:~# objdump -D call-inline | awk '/<[^>]*>:/ {p=0} /<main>:/ {p=1} /<xchg>:/ {p=1} { if (p) print $0 }'
00000000004003e0 <main>:
  4003e0:       55                      push   %rbp
  4003e1:       31 c0                   xor    %eax,%eax
  4003e3:       48 89 e5                mov    %rsp,%rbp
  4003e6:       c7 45 f0 00 00 00 00    movl   $0x0,-0x10(%rbp)
  4003ed:       0f 1f 00                nopl   (%rax)
  4003f0:       89 c2                   mov    %eax,%edx
  4003f2:       f0 87 55 f0             lock xchg %edx,-0x10(%rbp)
  4003f6:       83 c0 01                add    $0x1,%eax
  4003f9:       3d 00 ca 9a 3b          cmp    $0x3b9aca00,%eax
  4003fe:       75 f0                   jne    4003f0 <main+0x10>
  400400:       5d                      pop    %rbp
  400401:       c3                      retq   

root@ivb-ep:~# perf stat -e "cycles:u" ./call

 Performance counter stats for './call':

    36,309,274,162      cycles:u                 

      10.561819310 seconds time elapsed

root@ivb-ep:~# perf stat -e "cycles:u" ./call-inline 

 Performance counter stats for './call-inline':

    22,004,045,745      cycles:u                 

       6.498271508 seconds time elapsed

