From: Eric Dumazet <dada1@cosmosbay.com>
To: Chuck Ebbert <76306.1226@compuserve.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
linux@horizon.com, Linus Torvalds <torvalds@osdl.org>,
Kirill Korotaev <dev@sw.ru>
Subject: Re: i386 spinlock fairness: bizarre test results
Date: Tue, 11 Oct 2005 11:42:45 +0200
Message-ID: <434B8915.3040706@cosmosbay.com>
In-Reply-To: <200510110007_MC3-1-AC4C-97EA@compuserve.com>
Chuck Ebbert wrote:
> After seeing Kirill's message about spinlocks I decided to do my own
> testing with the userspace program below; the results were very strange.
>
> When using the 'mov' instruction to do the unlock I was able to reproduce
> hogging of the spinlock by a single CPU even on Pentium II under some
> conditions, while using 'xchg' always allowed the other CPU to get the
> lock:
>
>
> parent CPU 1, child CPU 0, using mov instruction for unlock
> parent did: 34534 of 10000001 iterations
> CPU clocks: 2063591864
>
> parent CPU 1, child CPU 0, using xchg instruction for unlock
> parent did: 5169760 of 10000001 iterations
> CPU clocks: 2164689784
>
Hi Chuck

That's interesting, because on my machines I get the opposite results.

On a Xeon 2.0 GHz with Hyper-Threading, I get better results (in the sense of
fairness) with the MOV version:
parent CPU 1, child CPU 0, using mov instruction for unlock
parent did: 5291346 of 10000001 iterations
CPU clocks: 3.437.053.244
parent CPU 1, child CPU 0, using xchg instruction for unlock
parent did: 7732349 of 10000001 iterations
CPU clocks: 3.408.285.308
On a dual Opteron 248 (2.2 GHz) machine, I get poor fairness regardless of
MOV/XCHG (but fewer cycles per iteration). There is a lot of variation in the
results (maybe because of NUMA effects?), but XCHG gives slightly better
fairness:
parent CPU 1, child CPU 0, using mov instruction for unlock
parent did: 256810 of 10000001 iterations
CPU clocks: 772.838.640
parent CPU 1, child CPU 0, using xchg instruction for unlock
parent did: 438280 of 10000001 iterations
CPU clocks: 1.115.653.346
parent CPU 1, child CPU 0, using xchg instruction for unlock
parent did: 574501 of 10000001 iterations
CPU clocks: 1.200.129.428
On a dual-core Opteron 275 (2.2 GHz), with both threads running on the same
physical CPU, XCHG is faster but unfair:
parent CPU 1, child CPU 0, using mov instruction for unlock
parent did: 4822270 of 10000001 iterations
CPU clocks: 738.438.383
parent CPU 1, child CPU 0, using xchg instruction for unlock
parent did: 9702075 of 10000001 iterations
CPU clocks: 561.457.724
The results are totally different if the affinity is changed so that the
threads run on different physical CPUs (XCHG is slower):
parent CPU 0, child CPU 2, using mov instruction for unlock
parent did: 1081522 of 10000001 iterations
CPU clocks: 508.611.273
parent CPU 0, child CPU 2, using xchg instruction for unlock
parent did: 4310427 of 10000000 iterations
CPU clocks: 1.074.170.246
For reference, if only one thread is running, the MOV version is also faster
on all three platforms:
[Xeon 2GHz]
one thread, using mov instruction for unlock
10000000 iterations
CPU clocks: 1.278.879.528
[Xeon 2GHz]
one thread, using xchg instruction for unlock
10000000 iterations
CPU clocks: 2.486.912.752
Of course the Opterons are faster :) (fewer cycles per iteration)
[Opteron 248]
one thread, using mov instruction for unlock
10000000 iterations
CPU clocks: 212.514.637
[Opteron 248]
one thread, using xchg instruction for unlock
10000000 iterations
CPU clocks: 383.306.420
[Opteron 275]
one thread, using mov instruction for unlock
10000000 iterations
CPU clocks: 208.472.009
[Opteron 275]
one thread, using xchg instruction for unlock
10000000 iterations
CPU clocks: 417.502.675
In conclusion, I would say that for uncontended locks, MOV is faster. For
contended locks, XCHG might be faster on some platforms (dual-core Opterons,
but only when both threads run on the same physical CPU).
Eric