public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ravikiran G Thirumalai <kiran@scalex86.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [rfc][patch] queued spinlocks (i386)
Date: Fri, 23 Mar 2007 10:59:46 +0100	[thread overview]
Message-ID: <20070323095946.GC11577@wotan.suse.de> (raw)
In-Reply-To: <20070323104017.cc1ea4fe.dada1@cosmosbay.com>

On Fri, Mar 23, 2007 at 10:40:17AM +0100, Eric Dumazet wrote:
> On Fri, 23 Mar 2007 09:59:11 +0100
> Nick Piggin <npiggin@suse.de> wrote:
> 
> > 
> > Implement queued spinlocks for i386. This shouldn't increase the size of
> > the spinlock structure, while still able to handle 2^16 CPUs.
> > 
> > Not completely implemented with assembly yet, to make the algorithm a bit
> > clearer.
> > 
> > The queued spinlock has 2 fields, a head and a tail, which are indexes
> > into a FIFO of waiting CPUs. To take a spinlock, a CPU performs an
> > "atomic_inc_return" on the head index, and keeps the returned value as
> > a ticket. The CPU then spins until the tail index is equal to that
> > ticket.
> > 
> > To unlock a spinlock, the tail index is incremented (this can be non
> > atomic, because only the lock owner will modify tail).
> > 
> > Implementation inefficiencies aside, this change should have little
> > effect on performance for uncontended locks, but will have quite a
> > large cost for highly contended locks [O(N) cacheline transfers vs
> > O(1) per lock aquisition, where N is the number of CPUs contending].
> > The benefit is is that contended locks will not cause any starvation.
> > 
> > Just an idea. Big NUMA hardware seems to have fairness logic that
> > prevents starvation for the regular spinlock logic. But it might be
> > interesting for -rt kernel or systems with starvation issues. 
> 
> It's a very nice idea Nick.
> 
> You also have for free the number or cpus that are before you.
> 
> On big SMP/NUMA, we could use this information to call a special lock_cpu_relax() function to avoid cacheline transferts.
> 
> 	asm volatile(LOCK_PREFIX "xaddw %0, %1\n\t"
> 		     : "+r" (pos), "+m" (lock->qhead) : : "memory");
> 	for (;;) {
> 		unsigned short nwait = pos - lock->qtail;
> 		if (likely(nwait == 0))
> 			break;
> 		lock_cpu_relax(lock, nwait);
> 	}
> 
> lock_cpu_relax(raw_spinlock_t *lock, unsigned int nwait)
> {
> unsigned int cycles = nwait * lock->min_cycles_per_round;
> busy_loop(cycles);
> }

Yeah that's true, it would give you a lot more information for use in a
lock backoff system, so that might counter the lesser efficiency of queued
locks under contention.

You could perhaps even wait using monitor/mwait on the lock cacheline
if there are more than X CPUs in front of you. I'm not sure if the cost of
mwait is entirely within the core that executes it, or whether it slows
down cache coherency as well (which would make that idea a showstopper).


  reply	other threads:[~2007-03-23  9:59 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-23  8:59 [rfc][patch] queued spinlocks (i386) Nick Piggin
2007-03-23  9:40 ` Eric Dumazet
2007-03-23  9:59   ` Nick Piggin [this message]
2007-03-23 19:27   ` Ravikiran G Thirumalai
2007-03-23 10:04 ` Ingo Molnar
2007-03-23 10:10   ` Nick Piggin
2007-03-23 16:48     ` Parag Warudkar
2007-03-23 18:15     ` Davide Libenzi
2007-03-23 10:32   ` Nick Piggin
2007-03-23 10:40     ` Eric Dumazet
2007-03-23 11:02     ` William Lee Irwin III
2007-03-24 15:55     ` Nikita Danilov
2007-03-24 17:29       ` Ingo Molnar
2007-03-24 18:49         ` Nikita Danilov
2007-03-28  6:43         ` Nick Piggin
2007-03-28 19:26           ` Davide Libenzi
2007-03-28 22:00             ` Davide Libenzi
2007-03-29  1:36               ` Nick Piggin
2007-03-29  7:16                 ` Nick Piggin
2007-03-30  0:27                   ` Davide Libenzi
2007-03-30  1:59                     ` Nick Piggin
2007-03-30  2:43                       ` Davide Libenzi
2007-03-29  1:24             ` Nick Piggin
2007-03-24 21:41     ` Andrew Morton
2007-03-28  6:56       ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070323095946.GC11577@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=dada1@cosmosbay.com \
    --cc=kiran@scalex86.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox