From: Waiman Long <waiman.long@hp.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
virtualization@lists.linux-foundation.org,
Andi Kleen <andi@firstfloor.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Michel Lespinasse <walken@google.com>,
Alok Kataria <akataria@vmware.com>,
linux-arch@vger.kernel.org, x86@kernel.org,
Ingo Molnar <mingo@redhat.com>,
Scott J Norton <scott.norton@hp.com>,
xen-devel@lists.xenproject.org,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Alexander Fyodorov <halcy@yandex.ru>,
Rik van Riel <riel@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Daniel J Blueman <daniel@numascale.com>,
Oleg Nesterov <oleg@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Chris Wright <chrisw@sous-sol.org>,
George Spelvin <linux@horizon.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Date: Tue, 04 Mar 2014 10:27:03 -0500 [thread overview]
Message-ID: <5315F0C7.8090909@hp.com> (raw)
In-Reply-To: <20140303174305.GK9987@twins.programming.kicks-ass.net>
On 03/03/2014 12:43 PM, Peter Zijlstra wrote:
> Hi,
>
> Here are some numbers for my version -- also attached is the test code.
> I found that booting big machines is tediously slow so I lifted the
> whole lot to userspace.
>
> I measure the cycles spent in arch_spin_lock() + arch_spin_unlock().
>
> The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node
> (2 socket) Intel Westmere-EP.
>
> AMD (ticket)                  AMD (qspinlock + pending + opt)
>
> Local:                        Local:
>
>  1:    324.425530              1:    324.102142
>  2:  17141.324050              2:    620.185930
>  3:  52212.232343              3:  25242.574661
>  4:  93136.458314              4:  47982.037866
>  6: 167967.455965              6:  95345.011864
>  8: 245402.534869              8: 142412.451438
>
> 2 - nodes:                    2 - nodes:
>
>  2:  12763.640956              2:   1879.460823
>  4:  94423.027123              4:  48278.719130
>  6: 167903.698361              6:  96747.767310
>  8: 257243.508294              8: 144672.846317
>
> 4 - nodes:                    4 - nodes:
>
>  4:  82408.853603              4:  49820.323075
>  8: 260492.952355              8: 143538.264724
> 16: 630099.031148             16: 337796.553795
>
>
>
> Intel (ticket)                Intel (qspinlock + pending + opt)
>
> Local:                        Local:
>
>  1:     19.002249              1:     29.002844
>  2:   5093.275530              2:   1282.209519
>  3:  22300.859761              3:  22127.477388
>  4:  44929.922325              4:  44493.881832
>  6:  86338.755247              6:  86360.083940
>
> 2 - nodes:                    2 - nodes:
>
>  2:   1509.193824              2:   1209.090219
>  4:  48154.495998              4:  48547.242379
>  8: 137946.787244              8: 141381.498125
>
> ---
>
> There are a few curious facts I found (assuming my test code is sane).
>
> - Intel seems to be an order of magnitude faster on uncontended LOCKed
> ops compared to AMD
>
> - On Intel the uncontended qspinlock fast path (cmpxchg) seems slower
> than the uncontended ticket xadd -- although both are plenty fast
> when compared to AMD.
>
> - In general, replacing cmpxchg loops with unconditional atomic ops
> doesn't seem to matter a whole lot when the thing is contended.
>
> Below is the (rather messy) qspinlock slow path code (the only thing
> that really differs between our versions).
>
> I'll try and slot your version in tomorrow.
>
> ---
>
It is curious to see that the qspinlock code offers a big benefit on AMD
machines, but not so much on Intel. Anyway, I am working on a revised
version of the patch that includes some of your comments. I will also
try to see if I can get an AMD machine to run tests on.
-Longman
Thread overview: 60+ messages
2014-02-26 15:14 [PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support Waiman Long
2014-02-26 15:14 ` [PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation Waiman Long
2014-02-26 16:22 ` Peter Zijlstra
2014-02-27 20:25 ` Waiman Long
2014-02-26 16:24 ` Peter Zijlstra
2014-02-27 20:25 ` Waiman Long
2014-02-26 15:14 ` [PATCH v5 2/8] qspinlock, x86: Enable x86-64 to use queue spinlock Waiman Long
2014-02-26 15:14 ` [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks Waiman Long
2014-02-26 16:20 ` Peter Zijlstra
2014-02-27 20:42 ` Waiman Long
2014-02-28 9:29 ` Peter Zijlstra
2014-02-28 16:25 ` Linus Torvalds
2014-02-28 17:37 ` Peter Zijlstra
2014-02-28 16:38 ` Waiman Long
2014-02-28 17:56 ` Peter Zijlstra
2014-03-03 17:43 ` Peter Zijlstra
2014-03-04 15:27 ` Waiman Long [this message]
2014-03-04 16:58 ` Peter Zijlstra
2014-03-04 18:09 ` Peter Zijlstra
2014-03-04 17:48 ` Waiman Long
2014-03-04 22:40 ` Peter Zijlstra
2014-03-05 20:59 ` Peter Zijlstra
2014-02-26 15:14 ` [PATCH RFC v5 4/8] pvqspinlock, x86: Allow unfair spinlock in a real PV environment Waiman Long
2014-02-26 17:07 ` Konrad Rzeszutek Wilk
2014-02-28 17:06 ` Waiman Long
2014-03-03 10:55 ` Paolo Bonzini
2014-03-04 15:15 ` Waiman Long
2014-03-04 15:23 ` Paolo Bonzini
2014-03-04 15:39 ` David Vrabel
2014-03-04 17:50 ` Raghavendra K T
2014-02-27 12:28 ` David Vrabel
2014-02-27 19:40 ` Waiman Long
2014-02-26 15:14 ` [PATCH RFC v5 5/8] pvqspinlock, x86: Enable unfair queue spinlock in a KVM guest Waiman Long
2014-02-26 17:08 ` Konrad Rzeszutek Wilk
2014-02-28 17:08 ` Waiman Long
2014-02-27 9:41 ` Paolo Bonzini
2014-02-27 19:05 ` Waiman Long
2014-02-27 10:40 ` Raghavendra K T
2014-02-27 19:12 ` Waiman Long
2014-02-26 15:14 ` [PATCH RFC v5 6/8] pvqspinlock, x86: Rename paravirt_ticketlocks_enabled Waiman Long
2014-02-26 15:14 ` [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support Waiman Long
2014-02-26 17:54 ` Konrad Rzeszutek Wilk
2014-02-27 12:11 ` David Vrabel
2014-02-27 13:11 ` Paolo Bonzini
2014-02-27 14:18 ` David Vrabel
2014-02-27 14:45 ` Paolo Bonzini
2014-02-27 15:22 ` Raghavendra K T
2014-02-27 15:50 ` Paolo Bonzini
2014-03-03 11:06 ` [Xen-devel] " David Vrabel
2014-02-27 20:50 ` Waiman Long
2014-02-27 19:42 ` Waiman Long
2014-02-26 15:14 ` [PATCH RFC v5 8/8] pvqspinlock, x86: Enable KVM to use qspinlock's PV support Waiman Long
2014-02-27 9:31 ` Paolo Bonzini
2014-02-27 18:36 ` Waiman Long
2014-02-26 17:00 ` [PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with " Konrad Rzeszutek Wilk
2014-02-28 16:56 ` Waiman Long
2014-02-26 22:26 ` Paul E. McKenney
-- strict thread matches above, loose matches on Subject: below --
2014-02-27 4:32 Waiman Long
2014-02-27 4:32 ` [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks Waiman Long
2014-03-02 13:16 ` Oleg Nesterov
2014-03-04 14:54 ` Waiman Long