Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead


From: Juergen Gross <jgross@suse.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org, hpa@zytor.com,
	tglx@linutronix.de, mingo@redhat.com,
	xen-devel@lists.xensource.com, konrad.wilk@oracle.com,
	david.vrabel@citrix.com, boris.ostrovsky@oracle.com,
	chrisw@sous-sol.org, akataria@vmware.com, rusty@rustcorp.com.au,
	virtualization@lists.linux-foundation.org, gleb@kernel.org,
	pbonzini@redhat.com, kvm@vger.kernel.org
Subject: Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead
Date: Mon, 18 May 2015 10:11:46 +0200
Message-ID: <55599EC2.8090203@suse.com>
In-Reply-To: <20150517053036.GB16607@gmail.com>

On 05/17/2015 07:30 AM, Ingo Molnar wrote:
>
> * Juergen Gross <jgross@suse.com> wrote:
>
>> On 05/05/2015 07:21 PM, Jeremy Fitzhardinge wrote:
>>> On 05/03/2015 10:55 PM, Juergen Gross wrote:
>>>> I did a small measurement of the pure locking functions on bare metal
>>>> without and with my patches.
>>>>
>>>> spin_lock() for the first time (lock and code not in cache) dropped from
>>>> about 600 to 500 cycles.
>>>>
>>>> spin_unlock() for first time dropped from 145 to 87 cycles.
>>>>
>>>> spin_lock() in a loop dropped from 48 to 45 cycles.
>>>>
>>>> spin_unlock() in the same loop dropped from 24 to 22 cycles.
>>>
>>> Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
>>> main difference will be whether the branch predictor is warmed up rather
>>> than if the lock itself is in dcache, but it's much more likely that the
>>> lock code is in icache if the code is lock intensive, making the cold case
>>> moot. But that's pure speculation.
>>>
>>> Could you see any differences in workloads beyond microbenchmarks?
>>>
>>> Not that it's my call at all, but I think we'd need to see some concrete
>>> improvements in real workloads before adding the complexity of more pvops.
>>
>> I did another test on a larger machine:
>>
>> 25 kernel builds (time make -j 32) on a 32 core machine. Before each
>> build "make clean" was called, the first result after boot was omitted
>> to avoid disk cache warmup effects.
>>
>> System time without my patches: 861.5664 +/- 3.3665 s
>>                 with my patches: 852.2269 +/- 3.6629 s
>
> So how does the profile look like in the guest, before/after the PV
> spinlock patches? I'm a bit surprised to see so much spinlock
> overhead.

I did another test in Xen dom0:

System time without my patches: 2903 +/- 2 s
                with my patches: 2904 +/- 2 s

BTW, this was what I expected: There should be no significant change in
system time, as the only real difference between both variants in a
guest is an additional 2-byte nop in the inlined unlock function call,
another one in the lock call and one jmp instruction less in the lock
call.

What I didn't expect was the huge performance difference between native
and guest. The configuration used (32 cores with hyperthreads enabled)
is surely one reason for the difference, but it still seems to be too
much. I double-checked the results on bare metal and they are still
more or less the same (I did only one kernel build, resulting in 862
seconds of system time). There seems to be a lot of room for
improvement, but that is another story.

Regarding spinlock overhead: I think the reason I saw about 1% less
system time with my patches was mainly fewer cache misses.
Inlining of the unlock function avoided an additional instruction cache
miss for the unlock function. KT Raghavendra did some benchmarks with
only small user programs and high kernel load which showed nearly no
effect at all.
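
As a quick check of the "about 1%" figure, the relative change can be
computed from the mean system times quoted above. This is only a sketch:
the helper name relative_change is made up for illustration, and it
looks at the means alone, ignoring the +/- spreads.

```python
def relative_change(before, after):
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before

# Mean system times in seconds, taken from the measurements in this thread.
native = relative_change(861.5664, 852.2269)  # bare metal, without vs. with patches
dom0 = relative_change(2903.0, 2904.0)        # Xen dom0, without vs. with patches

print(f"bare metal: {native:+.2%}")
print(f"Xen dom0:   {dom0:+.2%}")
```

The bare-metal change comes out at roughly one percent, while the dom0
change is far smaller than the quoted +/- 2 s spread.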

Additionally I've compared the two kernels using bloat-o-meter:

add/remove: 11/13 grow/shrink: 654/603 up/down: 6046/-31754 (-25708)

with some hot path functions going down in size quite nicely, e.g.:

__raw_spin_unlock_irq                        336      90    -246
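
For reading the summary line above: the number in parentheses is the net
size change in bytes, i.e. the sum of the "up" and "down" totals. A
minimal sketch that parses the line and checks this, assuming the format
shown above (the function name parse_bloat_summary is hypothetical and
not part of the bloat-o-meter script):

```python
import re

SUMMARY = "add/remove: 11/13 grow/shrink: 654/603 up/down: 6046/-31754 (-25708)"

# Field names are chosen here for illustration only.
FIELDS = ("added", "removed", "grown", "shrunk", "up", "down", "total")

PATTERN = re.compile(
    r"add/remove: (\d+)/(\d+) "
    r"grow/shrink: (\d+)/(\d+) "
    r"up/down: (\d+)/(-?\d+) "
    r"\((-?\d+)\)"
)

def parse_bloat_summary(line):
    """Parse a summary line of the form shown above into a dict of ints."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a recognised summary line: %r" % line)
    return dict(zip(FIELDS, (int(g) for g in match.groups())))

stats = parse_bloat_summary(SUMMARY)
net = stats["up"] + stats["down"]  # bytes grown plus (negative) bytes shrunk
print(net, stats["total"])
```

The same arithmetic applies to the per-function row: old size 336, new
size 90, delta -246.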

Juergen

Thread overview: 17+ messages
2015-04-30 10:53 [PATCH 0/6] x86: reduce paravirtualized spinlock overhead Juergen Gross
2015-04-30 10:53 ` [PATCH 1/6] x86: use macro instead of "0" for setting TICKET_SLOWPATH_FLAG Juergen Gross
2015-04-30 10:53 ` [PATCH 2/6] x86: move decision about clearing slowpath flag into arch_spin_lock() Juergen Gross
2015-04-30 10:54 ` [PATCH 3/6] x86: introduce new pvops function clear_slowpath Juergen Gross
2015-04-30 10:54 ` [PATCH 4/6] x86: introduce new pvops function spin_unlock Juergen Gross
2015-04-30 10:54 ` [PATCH 5/6] x86: switch config from UNINLINE_SPIN_UNLOCK to INLINE_SPIN_UNLOCK Juergen Gross
2015-04-30 10:54 ` [PATCH 6/6] x86: remove no longer needed paravirt_ticketlocks_enabled Juergen Gross
2015-04-30 16:39 ` [PATCH 0/6] x86: reduce paravirtualized spinlock overhead Jeremy Fitzhardinge
2015-05-04  5:55   ` Juergen Gross
2015-05-05 17:21     ` Jeremy Fitzhardinge
2015-05-06 11:55       ` Juergen Gross
2015-05-17  5:30         ` Ingo Molnar
2015-05-18  8:11           ` Juergen Gross [this message]
2015-05-15 12:16 ` Juergen Gross
2015-06-08  4:09 ` Juergen Gross
2015-06-16 14:37 ` Juergen Gross
2015-06-16 15:18   ` Thomas Gleixner