From: Nick Piggin <npiggin@suse.de>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	Nikita Danilov <nikita@clusterfs.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ravikiran G Thirumalai <kiran@scalex86.org>
Subject: Re: [rfc][patch] queued spinlocks (i386)
Date: Fri, 30 Mar 2007 03:59:02 +0200
Message-ID: <20070330015902.GC19407@wotan.suse.de>
In-Reply-To: <Pine.LNX.4.64.0703291723140.1199@alien.or.mcafeemobile.com>

On Thu, Mar 29, 2007 at 05:27:24PM -0700, Davide Libenzi wrote:
> On Thu, 29 Mar 2007, Nick Piggin wrote:
> 
> > On Thu, Mar 29, 2007 at 03:36:52AM +0200, Nick Piggin wrote:
> > > In most cases, no. For the uncontended case they should be about the
> > > same. They have the same spinning behaviour. However there is a little
> > > window where they might be a bit slower I think... actually perhaps I'm
> > > wrong!
> > > 
> > > Currently if you have 4 CPUs spinning and the lock is released, all 4
> > > CPU cachelines will be invalidated, then they will be loaded again, and
> > > found to be 0, so they all try to atomic_dec_return the counter, each
> > > one invalidating others' cachelines. 1 gets through.
> > > 
> > > With my queued locks, all 4 cachelines are invalidated and loaded, but
> > > only one will be allowed to proceed, and there are 0 atomic operations
> > > or stores of any kind.
> > > 
> > > So I take that back: our current spinlocks have a worse thundering herd
> > > behaviour under contention than my queued ones. So I'll definitely
> > > push the patch through.
> > 
> > OK, it isn't a big difference, but a user-space test is showing slightly
> > (~2%) improvement in the contended case on a 16 core Opteron.
> > 
> > There is a case where the present spinlocks are almost twice as fast on
> > this machine (in terms of aggregate throughput), and that is when a lock
> > is taken right after it is released. This is because the same CPU will
> > often be able to retake the lock without transitioning the cache. This is
> > going to be a rare case for us, and would suggest suboptimal code anyway 
> > (ie. the lock should just be kept rather than dropped and retaken).
> > 
> > Actually, one situation where it comes up is when we drop and retake a
> > lock that needs_lockbreak. Of course, the queued lock behaviour is
> > desired in that case anyway.
> > 
> > However single-thread performance is presently a bit down. OTOH, the
> > assembly generated by gcc looks like it could be improved upon (even by
> > me :P).
> > 
> > This is what I've got so far. Should work for i386 and x86_64. Any
> > enhancements or results from other CPUs would be interesting.
> 
> I slightly modified it to use cycles:
> 
> http://www.xmailserver.org/qspins.c

Slightly more than slightly ;)

You want to have a delay _outside_ the critical section as well, for
multi-thread tests, otherwise the releasing CPU often just retakes
the lock (in the unqueued lock case). As I said, most kernel code
should _not_ be dropping and retaking locks.
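
To illustrate the point about the delay, here is a minimal user-space
sketch of a benchmark loop that spins both inside and outside the
critical section. It is not the test program from this thread and not
qspins.c: the names bench_args, spin_delay, bench_worker and run_bench
are hypothetical, and a plain C11 test-and-set lock stands in for the
unqueued spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical test-and-set lock standing in for the unqueued spinlock. */
static atomic_flag bench_lock = ATOMIC_FLAG_INIT;
static unsigned long shared_counter;    /* protected by bench_lock */

struct bench_args {
    unsigned long iters;          /* lock/unlock pairs per thread */
    unsigned long inside_delay;   /* busy-loop count while holding the lock */
    unsigned long outside_delay;  /* busy-loop count after dropping the lock */
};

/* Crude busy-wait; volatile keeps the compiler from deleting the loop. */
static void spin_delay(unsigned long n)
{
    for (volatile unsigned long i = 0; i < n; i++)
        ;
}

static void *bench_worker(void *p)
{
    const struct bench_args *a = p;

    for (unsigned long i = 0; i < a->iters; i++) {
        while (atomic_flag_test_and_set_explicit(&bench_lock,
                                                 memory_order_acquire))
            ;                             /* spin until the flag was clear */
        shared_counter++;                 /* the critical section */
        spin_delay(a->inside_delay);
        atomic_flag_clear_explicit(&bench_lock, memory_order_release);

        /* Delay outside the critical section, so the releasing thread
         * does not immediately come back and retake the lock. */
        spin_delay(a->outside_delay);
    }
    return NULL;
}

/* Runs nthreads workers and returns the total number of acquisitions. */
static unsigned long run_bench(int nthreads, struct bench_args a)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    shared_counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, bench_worker, &a) != 0)
            break;                        /* join only what was started */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

Calling run_bench(4, (struct bench_args){ 1000, 5, 20 }) should return
4000 if all four threads start. Timing the call with outside_delay set
to 0 and then to a nonzero value is one way to see how much of a
result comes from the same thread retaking the lock.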

> Here (Dual Opteron 252) queued locks (ticklocks) are about 10% slower in 
> both cases. This is really a microbench, and assembly matter a lot. I did 
> not have time to look at the generated one yet, but optimizing branches 
> can help in those cases.
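
For reference, the queued ("ticket") behaviour discussed in the quoted
text can be sketched in user space with C11 atomics. This is an
illustrative sketch only, not the i386 patch from this thread: the
struct layout, the function names and the memory orderings are
assumptions. Each locker does one atomic fetch-and-add to take a
ticket; after that, waiters only load the owner field, so a release
lets exactly one waiter proceed without further atomic operations or
stores from the others.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket lock; field and function names are illustrative. */
struct ticket_lock {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently allowed to hold the lock */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* The only atomic read-modify-write on the acquire path. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    /* Waiters spin with plain loads until their ticket comes up. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Only the holder writes owner, so load-then-store is sufficient. */
    unsigned int cur = atomic_load_explicit(&l->owner,
                                            memory_order_relaxed);
    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}

/* True while some thread holds the lock or is queued for it. */
static bool ticket_lock_is_locked(struct ticket_lock *l)
{
    return atomic_load_explicit(&l->next, memory_order_relaxed) !=
           atomic_load_explicit(&l->owner, memory_order_relaxed);
}
```

Because tickets are served in order, a thread that releases and
immediately re-acquires goes to the back of the queue whenever anyone
else is waiting. That is the fairness property mentioned above for the
needs_lockbreak case, and it is also why a drop-and-retake microbench
favours the unqueued lock.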


