From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Juergen Gross <juergen.gross@fujitsu-siemens.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [Patch 2 of 2]: PV-domain SMP performance Linux-part
Date: Fri, 16 Jan 2009 09:41:39 -0800
Message-ID: <4970C6D3.2080206@goop.org>
In-Reply-To: <C59609B4.21084%keir.fraser@eu.citrix.com>

Keir Fraser wrote:
> On 16/01/2009 09:36, "Juergen Gross" <juergen.gross@fujitsu-siemens.com>
> wrote:
>
>   
>>> The approach taken in Linux is not merely 'yield on spinlock' by the way, it
>>> is 'block on event channel on spinlock' essentially turning a contended
>>> spinlock into a sleeping mutex. I think that is quite different behaviour
>>> from merely yielding, and expecting the scheduler to do something sensible
>>> with your yield request.
>>>       
>> Could you explain this a little bit more in detail, please?
>>     
>
> Jeremy Fitzhardinge did the implementation for Linux, so I'm cc'ing him in
> case he remembers more details than me.
>
> Basically each CPU allocates itself an IPI event channel at start of day.
> When a CPU attempts to acquire a spinlock it spins a short while (perhaps a
> few microseconds?) and then adds itself to a bitmap stored in the lock
> structure (I think, or it might be a linked list of sleepers?). It then
> calls SCHEDOP_poll listing its IPI evtchn as its wakeup requirement. When
> the lock holder releases the lock it checks for sleepers and if it sees one
> then it pings one of them (or is it all of them?) on its event channel, thus
> waking it to take the lock.
>   

Yes, that's more or less right.  Each lock has a count of how many cpus 
are waiting for the lock; if it's non-zero on unlock, the unlocker kicks 
all the waiting cpus via IPI.  There's a per-cpu variable of "lock I am 
waiting for"; the kicker looks at each cpu's entry and kicks it if it's 
waiting for the lock being unlocked.
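
As a rough illustration of the unlock side described above, here is a 
single-threaded sketch in plain C.  It is not the kernel code: NR_CPUS, 
struct byte_lock, waiting_on[], kicked[] and kick_cpu() are all names 
made up for this example, and the "IPI" is just a flag in an array.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4                      /* hypothetical cpu count for the sketch */

struct byte_lock {
    volatile unsigned char locked;     /* 0 = free, 1 = held */
    unsigned int waiters;              /* how many cpus are blocked on this lock */
};

/* Per-cpu "lock I am waiting for"; NULL means the cpu is not waiting. */
static struct byte_lock *waiting_on[NR_CPUS];

/* Stand-in for a pending event on each cpu's IPI event channel. */
static bool kicked[NR_CPUS];

static void kick_cpu(int cpu)
{
    kicked[cpu] = true;                /* a real guest would send an IPI here */
}

/* Release the lock and kick every cpu that is waiting for it.
 * Returns the number of cpus kicked. */
static int unlock_and_kick(struct byte_lock *lock)
{
    int n = 0;

    lock->locked = 0;
    if (lock->waiters == 0)
        return 0;                      /* common case: nobody to wake */

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (waiting_on[cpu] == lock) { /* only cpus blocked on *this* lock */
            kick_cpu(cpu);
            n++;
        }
    }
    return n;
}
```

The waiter count keeps the uncontended unlock cheap: the per-cpu scan 
only happens when some cpu has actually given up spinning.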

The locking side does the expected "spin for a while, then block on 
timeout".  The timeout is settable if you have the appropriate debugfs 
option enabled (which also produces quite a lot of detailed stats about 
locking behaviour).  The IPI is never delivered as an event BTW; the 
locker uses the event poll hypercall to block until the event is pending 
(this hypercall had some performance problems until relatively recent 
versions of Xen; I'm not sure which releases have the fix).
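
The locking side could be sketched along these lines, again as a 
user-space simulation rather than the real implementation.  
SPIN_TIMEOUT, byte_lock_acquire() and block_until_kicked() are 
hypothetical names; block_until_kicked() stands in for the event poll 
hypercall and, for the purposes of the simulation only, pretends the 
holder released the lock while this cpu was asleep.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS      4                 /* hypothetical cpu count */
#define SPIN_TIMEOUT 1024              /* hypothetical spin budget (iterations) */

struct byte_lock {
    atomic_uchar locked;               /* 0 = free, 1 = held */
    atomic_uint  waiters;              /* cpus currently in the blocking path */
};

static struct byte_lock *waiting_on[NR_CPUS];  /* per-cpu "lock I am waiting for" */

/* Simulation state: which lock the imaginary holder drops while we "sleep". */
static struct byte_lock *sim_holder_lock;
static unsigned int sim_block_calls;

/* Stand-in for blocking in the event poll hypercall until kicked. */
static void block_until_kicked(int cpu)
{
    (void)cpu;
    sim_block_calls++;
    if (sim_holder_lock)
        atomic_store(&sim_holder_lock->locked, 0);
}

static bool try_lock(struct byte_lock *l)
{
    /* Byte lock: whoever swaps in 1 and sees 0 wins; no fairness. */
    return atomic_exchange(&l->locked, 1) == 0;
}

static bool spin_for_lock(struct byte_lock *l, unsigned int budget)
{
    for (unsigned int i = 0; i < budget; i++)
        if (try_lock(l))
            return true;
    return false;                      /* spin timed out */
}

static void byte_lock_acquire(struct byte_lock *l, int cpu)
{
    for (;;) {
        if (spin_for_lock(l, SPIN_TIMEOUT))
            return;

        /* Spin timed out: announce ourselves, then block. */
        waiting_on[cpu] = l;
        atomic_fetch_add(&l->waiters, 1);

        /* Re-check after registering so an unlock that raced with the
         * registration is not missed. */
        bool got = try_lock(l);
        if (!got)
            block_until_kicked(cpu);

        atomic_fetch_sub(&l->waiters, 1);
        waiting_on[cpu] = NULL;

        if (got)
            return;
        /* Kicked: go around and compete for the lock again. */
    }
}
```

Note that a kicked cpu does not own the lock on wakeup; it simply goes 
back to competing for it, which is why the unlocker can afford to kick 
every waiter.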

The lock itself is a simple byte spinlock, with no fairness guarantees; 
I'm assuming (hoping) that the pathological cases that ticket locks were 
introduced to solve will be mitigated by the timeout/blocking path 
(and/or will be less likely in a virtual environment anyway).

I measured a small performance improvement within the domain with this 
patch (kernbench-type workload), but an overall 10% reduction in 
system-wide CPU use with multiple competing domains.

    J

