From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: [Patch 2 of 2]: PV-domain SMP performance Linux-part
Date: Fri, 16 Jan 2009 09:41:39 -0800
Message-ID: <4970C6D3.2080206@goop.org>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: George Dunlap , Juergen Gross , "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

Keir Fraser wrote:
> On 16/01/2009 09:36, "Juergen Gross"
> wrote:
>
>
>>> The approach taken in Linux is not merely 'yield on spinlock' by the way, it
>>> is 'block on event channel on spinlock' essentially turning a contended
>>> spinlock into a sleeping mutex. I think that is quite different behaviour
>>> from merely yielding, and expecting the scheduler to do something sensible
>>> with your yield request.
>>>
>> Could you explain this a little bit more in detail, please?
>>
>
> Jeremy Fitzhardinge did the implementation for Linux, so I'm cc'ing him in
> case he remembers more details than me.
>
> Basically each CPU allocates itself an IPI event channel at start of day.
> When a CPU attempts to acquire a spinlock it spins a short while (perhaps a
> few microseconds?) and then adds itself to a bitmap stored in the lock
> structure (I think, or it might be a linked list of sleepers?). It then
> calls SCHEDOP_poll listing its IPI evtchn as its wakeup requirement. When
> the lock holder releases the lock it checks for sleepers and if it sees one
> then it pings one of them (or is it all of them?) on its event channel, thus
> waking it to take the lock.
>

Yes, that's more or less right. Each lock has a count of how many cpus
are waiting for the lock; if it's non-zero on unlock, the unlocker kicks
all the waiting cpus via IPI.
There's a per-cpu variable of "lock I am waiting for"; the kicker looks
at each cpu's entry and kicks it if it's waiting for the lock being
unlocked.

The locking side does the expected "spin for a while, then block on
timeout". The timeout is settable if you have the appropriate debugfs
option enabled (which also produces quite a lot of detailed stats about
locking behaviour).

The IPI is never delivered as an event BTW; the locker uses the event
poll hypercall to block until the event is pending (this hypercall had
some performance problems until relatively recent versions of Xen; I'm
not sure which release versions have the fix).

The lock itself is a simple byte spinlock, with no fairness guarantees;
I'm assuming (hoping) that the pathological cases that ticket locks were
introduced to solve will be mitigated by the timeout/blocking path
(and/or are less likely in a virtual environment anyway).

I measured a small performance improvement within the domain with this
patch (kernbench-type workload), but an overall 10% reduction in
system-wide CPU use with multiple competing domains.

    J
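
To make the bookkeeping described above concrete, here is a rough
user-space sketch in C11. It is not the kernel code: every name in it
(pv_spinlock, waiting_for, event_pending, NR_CPUS, the pv_* functions)
is made up for illustration, the event-poll hypercall is replaced by a
plain per-cpu flag, and the debugfs-settable spin timeout is reduced to
a simple try count. It only shows the byte lock, the waiter count, the
per-cpu "lock I am waiting for" slot, and the unlocker kicking every cpu
whose slot matches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical fixed cpu count for this sketch */

struct pv_spinlock {
    atomic_uchar lock;     /* simple byte lock: 0 = free, 1 = held */
    atomic_uint  waiters;  /* how many cpus are blocked on this lock */
};

/* Per-cpu "lock I am waiting for"; NULL while the cpu is not blocked. */
static _Atomic(struct pv_spinlock *) waiting_for[NR_CPUS];

/* Stand-in for "the cpu's IPI event channel is pending" (assumption:
 * the real code blocks in the event poll hypercall instead). */
static atomic_bool event_pending[NR_CPUS];

/* One attempt at the byte lock; no fairness, whoever swaps first wins. */
bool pv_trylock(struct pv_spinlock *l)
{
    return atomic_exchange(&l->lock, 1) == 0;
}

/* Fast path: spin for a bounded number of tries before giving up. */
bool pv_spin(struct pv_spinlock *l, unsigned spin_limit)
{
    for (unsigned i = 0; i < spin_limit; i++)
        if (pv_trylock(l))
            return true;
    return false;
}

/* Slow path, step 1: advertise that this cpu is about to block on l.
 * A real waiter would retry the lock once more and then poll. */
void pv_prepare_to_block(struct pv_spinlock *l, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_store(&event_pending[cpu], false);
    atomic_store(&waiting_for[cpu], l);
    atomic_fetch_add(&l->waiters, 1);
}

/* Slow path, step 2: after being kicked, withdraw the advertisement. */
void pv_finish_block(struct pv_spinlock *l, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_sub(&l->waiters, 1);
    atomic_store(&waiting_for[cpu], (struct pv_spinlock *)NULL);
}

/* Release the lock; if anyone is counted as waiting, kick every cpu
 * whose per-cpu slot names this lock. Returns the number kicked. */
int pv_unlock(struct pv_spinlock *l)
{
    int kicked = 0;

    atomic_store(&l->lock, 0);
    if (atomic_load(&l->waiters) != 0) {
        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            if (atomic_load(&waiting_for[cpu]) == l) {
                atomic_store(&event_pending[cpu], true);
                kicked++;
            }
        }
    }
    return kicked;
}
```

In this sketch a waiter would call pv_spin() first, fall back to
pv_prepare_to_block() when the spin budget runs out, wait for its
event_pending flag, then call pv_finish_block() and go back to spinning.
Since all waiters on a lock are kicked and the byte lock has no ordering,
they simply race for it again, which matches the "no fairness guarantees"
caveat above.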