Re: x86 - cpu_relax - why nop vs. pause?


From: Michael Breuer <mbreuer@majjas.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Subject: Re: x86 - cpu_relax - why nop vs. pause?
Date: Sun, 07 Feb 2010 15:08:14 -0500
Message-ID: <4B6F1DAE.6020407@majjas.com>
In-Reply-To: <1265566470.6280.10.camel@marge.simson.net>

On 2/7/2010 1:14 PM, Mike Galbraith wrote:
> On Sun, 2010-02-07 at 12:28 -0500, Michael Breuer wrote:
>    
>> I did search and noticed some old discussions. Looking at both Intel and
>> AMD documentation, it would seem that PAUSE is the preferred instruction
>> within a spin lock. Further, both Intel and AMD specifications state
>> that the instruction is backward compatible with older x86 processors.
>>
>> For fun, I changed nop to pause on my core i7 920 (smt enabled) and I'm
>> seeing about a 5-10% performance improvement on 2.6.33 rc7. Perf top
>> shows time spent in spin_lock under load drops from an average of around
>> 35% to about 25%.
>>
>> Thoughts?
>>      
> /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
>
> 00000000004004fc<rep_nop>:
>    4004fc:       55                      push   %rbp
>    4004fd:       48 89 e5                mov    %rsp,%rbp
>    400500:       f3 90                   pause
>    400502:       c9                      leaveq
>    400503:       c3                      retq
>
> 0000000000400504<pause>:
>    400504:       55                      push   %rbp
>    400505:       48 89 e5                mov    %rsp,%rbp
>    400508:       f3 90                   pause
>    40050a:       c9                      leaveq
>    40050b:       c3                      retq
>
> foo.c
>
> static inline void rep_nop(void)
> {
>          asm volatile("rep; nop" ::: "memory");
> }
>
> static inline void pause(void)
> {
>          asm volatile("pause" ::: "memory");
> }
>
> void main(void)
> {
> 	rep_nop();
> 	pause();
> }
>
>    
Interesting, and this got me thinking... and testing... I think there's 
an optimization issue with gcc:

First of all - a bit of background on how I got here:

After reading the Intel documentation, I tried replacing "rep; nop" with 
a bare "pause" (in theory exactly what's shown above). The system hung 
on boot. I then replaced only the nop with pause ("rep; pause") and the 
system booted. Using the above example, the opcode becomes f3 f3 90 
(rep pause) vs. f3 90 (rep nop).
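
To make the byte patterns concrete, here is a small sketch that classifies 
the encodings mentioned above. The helper name is_pause_encoding and the 
array names are made up for illustration; this is not a real x86 decoder. 
It only encodes the observation, consistent with the objdump output in 
this message, that one or more f3 prefixes followed by 90 is shown as 
"pause", while a lone 90 is a plain nop.

```c
#include <stddef.h>

/* Byte sequences discussed in this message (names are illustrative). */
static const unsigned char ENC_NOP[]       = { 0x90 };             /* nop */
static const unsigned char ENC_REP_NOP[]   = { 0xf3, 0x90 };       /* rep; nop == pause */
static const unsigned char ENC_REP_PAUSE[] = { 0xf3, 0xf3, 0x90 }; /* rep; pause */

/*
 * Hypothetical helper: return 1 if the buffer is exactly one or more
 * F3 (REP) prefix bytes followed by a single 0x90, otherwise 0.
 */
static int is_pause_encoding(const unsigned char *code, size_t len)
{
    size_t i = 0;

    /* Skip every leading F3 prefix byte. */
    while (i < len && code[i] == 0xf3)
        i++;

    /* Need at least one prefix, then exactly one remaining byte: 0x90. */
    return i >= 1 && i + 1 == len && code[i] == 0x90;
}
```

Under this check ENC_REP_NOP and ENC_REP_PAUSE both count as pause and 
ENC_NOP does not, which matches the disassembly shown for the cases below.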

Given the above compiler test case, this seemed odd, to say the least. 
So I played a bit more with gcc. It seems that the optimizer (-O3) is 
handling the *three* cases differently (objdump output below).

Base code for all three cases (the only change is the asm volatile line, 
as shown for each case):

static inline void pause(void)
{
         asm volatile("pause" ::: "memory");
}

void main(void)
{
     pause();
}

Case1 - asm volatile("pause" ::: "memory");
0000000000400480 <main>:
   400480:    f3 90                    pause
   400482:    c3                       retq
   400483:    90                       nop

Case2 - asm volatile("rep;nop" ::: "memory") Note: this didn't inline!

0000000000400474 <pause>:
   400474:    55                       push   %rbp
   400475:    48 89 e5                 mov    %rsp,%rbp
   400478:    f3 90                    pause
   40047a:    c9                       leaveq
   40047b:    c3                       retq

000000000040047c <main>:
   40047c:    55                       push   %rbp
   40047d:    48 89 e5                 mov    %rsp,%rbp
   400480:    e8 ef ff ff ff           callq  400474 <pause>
   400485:    c9                       leaveq
   400486:    c3                       retq
   400487:    90                       nop
   400488:    90                       nop
   400489:    90                       nop
   40048a:    90                       nop
   40048b:    90                       nop
   40048c:    90                       nop
   40048d:    90                       nop
   40048e:    90                       nop
   40048f:    90                       nop

Case3 - asm volatile("rep;pause" ::: "memory")
0000000000400480 <main>:
   400480:    f3 f3 90                 pause
   400483:    c3                       retq
   400484:    90                       nop
_______
Note the difference between opcodes case 1 and case 3, and the mess made 
by the compiler in case 2.
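
One way to take the assembler's mnemonic handling out of the comparison 
is to emit the bytes directly. The following is only a sketch under 
stated assumptions: relax_raw and spin_until_set are hypothetical names 
(this is not the kernel's cpu_relax()), it assumes GCC or Clang inline 
asm syntax, and on non-x86 targets it falls back to a plain compiler 
barrier.

```c
#include <stdatomic.h>

/* Hypothetical helper: emit the f3 90 bytes that objdump shows as "pause". */
static inline void relax_raw(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__(".byte 0xf3, 0x90" ::: "memory");
#else
    /* Assumption: no equivalent hint here, so use a compiler barrier only. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/*
 * Busy-wait until *flag becomes non-zero or max_spins iterations have
 * elapsed. Returns the number of iterations actually spun.
 */
static unsigned long spin_until_set(atomic_int *flag, unsigned long max_spins)
{
    unsigned long spins = 0;

    while (!atomic_load_explicit(flag, memory_order_acquire) &&
           spins < max_spins) {
        relax_raw();
        spins++;
    }
    return spins;
}
```

The bound on the loop is there only so the sketch cannot hang; a real 
spin lock would not have one.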

As to benchmarks - I've checked a few things, nothing formal or 
long-running... but the results are striking at first glance:

1) At idle, perf top shows time spent in _raw_spin_lock dropping from 
~35% to ~25%.
2) Running a media transcode (single core - HandBrakeCLI): frame rate 
increased by about 5-10%.
3) During file-intensive operations (#2, above, or copying large files - 
ext4 on software raid6) - latencytop shows a decrease in the time to 
write a page to disk from about 120ms to about 90ms.
