From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Lameter <clameter@sgi.com>,
	Petr Tesarik <ptesarik@suse.cz>, Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	Nick Piggin <nickpiggin@yahoo.com.au>
Subject: Re: Spinlocks: Factor out GENERIC_LOCKBREAK in order to avoid spinning with irqs disabled
Date: Mon, 07 Jul 2008 13:14:39 -0700
Message-ID: <4872792F.9080609@goop.org>
In-Reply-To: <20080707154651.35d1004c@cuia.bos.redhat.com>

Rik van Riel wrote:
> Alternatively, the guest could tell the host which vcpus
> are next in line for a ticket spinlock, or a vcpu that gets
> scheduled but is not supposed to grab the lock yet can give
> some CPU time to the vcpu that should get the lock next.
>   

Those are possible, but would require either 1) hypervisor changes, 
or 2) guest changes no less extensive than the ones I had to make anyway.

Thomas's proposal was to modify the scheduler to try to avoid 
preempting vcpus while they're in kernel mode.  That's nice because it 
requires no guest changes, and it seems at least somewhat successful at 
mitigating the problem.  But it can't completely solve the problem, and 
you end up with a bunch of heuristics in the hypervisor to decide which 
vcpu to preempt.

The other point, of course, is that ticket locks are massive overkill 
for the problem they're trying to solve.  It's one thing to introduce an 
element of fairness into spinlocks; it's another to impose strict FIFO 
ordering.  It would be enough to make the locks "polite" by preventing a 
new lock-holder from taking the lock while it's under contention.  
Something like:

	union lock {
		unsigned short word;	/* lock and count together */
		struct { unsigned char lock, count; };	/* lock: 0 = free, 1 = held; count: waiters */
	};

spin_lock:		# ebx - lock pointer
	movw	$0x0001, %ax	# add 1 to lock byte, 0 to count
	xaddw	%ax, (%ebx)	# attempt to take lock; %ax = old word
	testw	%ax,%ax		# was it free and uncontended?
	jnz	slow

taken:	ret

# slow path
slow:	lock decb (%ebx)	# undo the speculative lock-byte
				# increment from the xadd above, in
				# case the lock was free but contended
	lock incb 1(%ebx)	# announce ourselves as a waiter

1:	rep;nop
	cmpb	$0,(%ebx)
	jnz	1b		# wait for unlocked
	
	movb	$1,%al		# attempt to take lock (count already increased)
	xchgb	%al,(%ebx)
	testb	%al,%al
	jnz	1b

	lock decb 1(%ebx)	# drop count
	jmp	taken

spin_unlock:
	movb	$0,(%ebx)
	ret


The uncontended fastpath is similar to the pre-ticket locks, but it 
refuses to take the lock if there are other waiters, even if the lock is 
not currently held.  This prevents a rapid lock-unlock cycle on one 
CPU from starving another CPU, which I understand was the original 
problem ticket locks were trying to solve.

But it also means that all the contended spinners get the lock in 
whatever order the system decides to give it to them, rather than 
imposing a strict order.
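
For comparison, here's a minimal C sketch of the same "polite" lock 
using C11 atomics.  The names (polite_lock, polite_spin_lock, and so 
on) are purely illustrative, and the mixed-size atomic accesses 
through the union mirror the little-endian x86 assembly above rather 
than anything strict C11 guarantees:

	/*
	 * Hypothetical C11 sketch of the "polite" lock above.
	 */
	#include <stdatomic.h>
	#include <stdint.h>

	union polite_lock {
		_Atomic uint16_t word;		/* lock and count together */
		struct {
			_Atomic uint8_t lock;	/* 0 = free, 1 = held */
			_Atomic uint8_t count;	/* contended waiters */
		};
	};

	static void polite_spin_lock(union polite_lock *l)
	{
		/* Fast path: add 1 to the lock byte, 0 to the count.
		 * If the whole word was 0 (free and uncontended), we
		 * now own the lock. */
		if (atomic_fetch_add_explicit(&l->word, 1,
					      memory_order_acquire) == 0)
			return;

		/* Undo the speculative lock-byte increment (the
		 * "lock decb" in the assembly), then queue up as a
		 * polite waiter. */
		atomic_fetch_sub_explicit(&l->lock, 1, memory_order_release);
		atomic_fetch_add_explicit(&l->count, 1, memory_order_relaxed);

		for (;;) {
			while (atomic_load_explicit(&l->lock,
						    memory_order_relaxed))
				;	/* cpu_relax()/rep;nop goes here */
			if (atomic_exchange_explicit(&l->lock, 1,
						     memory_order_acquire) == 0)
				break;	/* got it */
		}

		atomic_fetch_sub_explicit(&l->count, 1, memory_order_relaxed);
	}

	static void polite_spin_unlock(union polite_lock *l)
	{
		atomic_store_explicit(&l->lock, 0, memory_order_release);
	}

The fetch-sub after a failed fast path is the one subtlety: because 
the xadd-style fast path speculatively sets the lock byte, a CPU that 
finds the lock free but contended must give the byte back before it 
starts waiting, or every spinner (itself included) would wait forever 
for a release that never comes.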

> I believe the IBM PPC64 people have done some work to implement
> just that.
>   

Do you have any references?

    J
