xen-devel.lists.xenproject.org archive mirror
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>, Andrew Jones <drjones@redhat.com>,
	mingo@redhat.com, ouyang@cs.pitt.edu
Cc: jeremy@goop.org, gregkh@suse.de, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, peterz@infradead.org,
	virtualization@lists.linux-foundation.org, andi@firstfloor.org,
	hpa@zytor.com, stefano.stabellini@eu.citrix.com,
	xen-devel@lists.xensource.com, x86@kernel.org,
	habanero@linux.vnet.ibm.com, riel@redhat.com,
	konrad.wilk@oracle.com, avi.kivity@gmail.com, tglx@linutronix.de,
	chegu_vinod@hp.com, linux-kernel@vger.kernel.org,
	srivatsa.vaddagiri@gmail.com, attilio.rao@citrix.com,
	pbonzini@redhat.com, torvalds@linux-foundation.org,
	stephan.diestelhorst@amd.com
Subject: Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks
Date: Tue, 09 Jul 2013 14:41:30 +0530
Message-ID: <51DBD3C2.2040807@linux.vnet.ibm.com>
In-Reply-To: <51CB2AD9.5060508@linux.vnet.ibm.com>

On 06/26/2013 11:24 PM, Raghavendra K T wrote:
> On 06/26/2013 09:41 PM, Gleb Natapov wrote:
>> On Wed, Jun 26, 2013 at 07:10:21PM +0530, Raghavendra K T wrote:
>>> On 06/26/2013 06:22 PM, Gleb Natapov wrote:
>>>> On Wed, Jun 26, 2013 at 01:37:45PM +0200, Andrew Jones wrote:
>>>>> On Wed, Jun 26, 2013 at 02:15:26PM +0530, Raghavendra K T wrote:
>>>>>> On 06/25/2013 08:20 PM, Andrew Theurer wrote:
>>>>>>> On Sun, 2013-06-02 at 00:51 +0530, Raghavendra K T wrote:
>>>>>>>> This series replaces the existing paravirtualized spinlock
>>>>>>>> mechanism
>>>>>>>> with a paravirtualized ticketlock mechanism. The series provides
>>>>>>>> implementation for both Xen and KVM.
>>>>>>>>
>>>>>>>> Changes in V9:
>>>>>>>> - Changed spin_threshold to 32k to avoid excess halt exits that are
>>>>>>>>     causing undercommit degradation (after PLE handler
>>>>>>>> improvement).
>>>>>>>> - Added  kvm_irq_delivery_to_apic (suggested by Gleb)
>>>>>>>> - Optimized halt exit path to use PLE handler
>>>>>>>>
>>>>>>>> V8 of PVspinlock was posted last year. After Avi's suggestions
>>>>>>>> to look
>>>>>>>> at PLE handler's improvements, various optimizations in PLE
>>>>>>>> handling
>>>>>>>> have been tried.
>>>>>>>
>>>>>>> Sorry for not posting this sooner.  I have tested the v9
>>>>>>> pv-ticketlock
>>>>>>> patches in 1x and 2x over-commit with 10-vcpu and 20-vcpu VMs.  I
>>>>>>> have
>>>>>>> tested these patches with and without PLE, as PLE is still not
>>>>>>> scalable
>>>>>>> with large VMs.
>>>>>>>
>>>>>>
>>>>>> Hi Andrew,
>>>>>>
>>>>>> Thanks for testing.
>>>>>>
>>>>>>> System: x3850X5, 40 cores, 80 threads
>>>>>>>
>>>>>>>
>>>>>>> 1x over-commit with 10-vCPU VMs (8 VMs) all running dbench:
>>>>>>> ----------------------------------------------------------
>>>>>>> Configuration           Total Throughput (MB/s)   Notes
>>>>>>>
>>>>>>> 3.10-default-ple_on     22945                     5% CPU in host kernel, 2% spin_lock in guests
>>>>>>> 3.10-default-ple_off    23184                     5% CPU in host kernel, 2% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_on    22895                     5% CPU in host kernel, 2% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_off   23051                     5% CPU in host kernel, 2% spin_lock in guests
>>>>>>> [all 1x results look good here]
>>>>>>
>>>>>> Yes. The 1x results look too close
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2x over-commit with 10-vCPU VMs (16 VMs) all running dbench:
>>>>>>> -----------------------------------------------------------
>>>>>>> Configuration           Total Throughput   Notes
>>>>>>>
>>>>>>> 3.10-default-ple_on      6287              55% CPU in host kernel, 17% spin_lock in guests
>>>>>>> 3.10-default-ple_off     1849              2% CPU in host kernel, 95% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_on     6691              50% CPU in host kernel, 15% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_off   16464              8% CPU in host kernel, 33% spin_lock in guests
>>>>>>
>>>>>> I see a 6.426% improvement with ple_on and a 161.87% improvement
>>>>>> with ple_off (both relative to 3.10-default-ple_on). I think this
>>>>>> is a very good sign for the patches.
>>>>>>
>>>>>>> [PLE hinders pv-ticket improvements, but even with PLE off,
>>>>>>>   we are still off from ideal throughput (somewhere >20000)]
>>>>>>>
>>>>>>
>>>>>> Okay, the ideal throughput you are referring to means getting at
>>>>>> least 80% of the 1x throughput under over-commit. Yes, we are still
>>>>>> far away from there.
>>>>>>
>>>>>>>
>>>>>>> 1x over-commit with 20-vCPU VMs (4 VMs) all running dbench:
>>>>>>> ----------------------------------------------------------
>>>>>>> Configuration           Total Throughput   Notes
>>>>>>>
>>>>>>> 3.10-default-ple_on     22736              6% CPU in host kernel, 3% spin_lock in guests
>>>>>>> 3.10-default-ple_off    23377              5% CPU in host kernel, 3% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_on    22471              6% CPU in host kernel, 3% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_off   23445              5% CPU in host kernel, 3% spin_lock in guests
>>>>>>> [1x looking fine here]
>>>>>>>
>>>>>>
>>>>>> I see ple_off is a little better here.
>>>>>>
>>>>>>>
>>>>>>> 2x over-commit with 20-vCPU VMs (8 VMs) all running dbench:
>>>>>>> ----------------------------------------------------------
>>>>>>> Configuration           Total Throughput   Notes
>>>>>>>
>>>>>>> 3.10-default-ple_on      1965              70% CPU in host kernel, 34% spin_lock in guests
>>>>>>> 3.10-default-ple_off      226              2% CPU in host kernel, 94% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_on     1942              70% CPU in host kernel, 35% spin_lock in guests
>>>>>>> 3.10-pvticket-ple_off    8003              11% CPU in host kernel, 70% spin_lock in guests
>>>>>>> [quite bad all around, but pv-tickets with PLE off the best so far.
>>>>>>>   Still quite a bit off from ideal throughput]
>>>>>>
>>>>>> This is again a remarkable improvement (307%).
>>>>>> This motivates me to add a patch to disable PLE when pvspinlock is
>>>>>> on. Probably we can add a hypercall that disables PLE in the kvm
>>>>>> init patch. The only problem I see is what happens if the guests
>>>>>> are mixed (i.e. one guest has pvspinlock support but another does
>>>>>> not, and the host supports pv).
>>>>>
>>>>> How about reintroducing the idea to create per-kvm ple_gap,ple_window
>>>>> state. We were headed down that road when considering a dynamic
>>>>> window at
>>>>> one point. Then you can just set a single guest's ple_gap to zero,
>>>>> which
>>>>> would lead to PLE being disabled for that guest. We could also revisit
>>>>> the dynamic window then.
>>>>>
>>>> Can be done, but let's understand why ple on is such a big problem.
>>>> Is it possible that ple gap and SPIN_THRESHOLD are not tuned properly?
>>>>
>>>
>>> The one obvious reason I see is the lack of commit awareness inside
>>> the guest. For under-commit there is no need to do PLE, but
>>> unfortunately we do.
>>>
>>> At least we return immediately in case of potential undercommits,
>>> but we still incur the vmexit delay.
>> But why do we? If SPIN_THRESHOLD is short enough (or the ple window
>> is long enough) to not generate a PLE exit, we will not go into the
>> PLE handler at all, no?
>>
>
> Yes, you are right. The dynamic ple window was an attempt to solve it.
>
> The problem is that reducing SPIN_THRESHOLD results in excess halt
> exits in under-commit, and increasing ple_window may sometimes be
> counterproductive as it affects other busy-wait constructs such as
> flush_tlb, AFAIK.
> So if we could have a dynamically changing SPIN_THRESHOLD too, that
> would be nice.
>
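
As a rough illustration of the trade-off in the quoted exchange, here is a
toy model in Python (not kernel code) that classifies a single lock wait by
which limit it hits first: the guest's spin threshold or the host's ple
window. The names classify_wait and cycles_per_iter, and all the numbers in
the usage lines, are assumptions made for illustration only. Real PLE
detection also depends on ple_gap and on the hardware, which this sketch
ignores.

```python
from enum import Enum


class Outcome(Enum):
    ACQUIRED = "lock acquired while spinning (no exit)"
    PLE_EXIT = "PLE exit (host PLE handler runs)"
    HALT_EXIT = "halt exit (guest gave up spinning and blocked)"


def classify_wait(wait_cycles, spin_threshold_iters, cycles_per_iter,
                  ple_window_cycles=None):
    """Classify one lock wait in a simplified model.

    wait_cycles          -- cycles until the lock would become free
    spin_threshold_iters -- guest spin iterations before it blocks
    cycles_per_iter      -- assumed cost of one spin iteration (hypothetical)
    ple_window_cycles    -- host ple window in cycles; None means PLE is off
    """
    spin_budget = spin_threshold_iters * cycles_per_iter
    ple_limit = float("inf") if ple_window_cycles is None else ple_window_cycles

    # The lock is released before either limit is reached.
    if wait_cycles < min(spin_budget, ple_limit):
        return Outcome.ACQUIRED
    # The ple window expires before the guest reaches its spin threshold.
    if ple_limit < spin_budget:
        return Outcome.PLE_EXIT
    # The guest exhausts its spin threshold first and halts.
    return Outcome.HALT_EXIT


if __name__ == "__main__":
    # Illustrative values only: 32k spin iterations, 10 cycles per iteration.
    for window in (4096, 1_000_000, None):
        print(window, classify_wait(50_000, 1 << 15, 10, window).value)
    # A short spin threshold avoids PLE exits but turns long waits into halts.
    print(classify_wait(50_000, 256, 10, 4096).value)
```

In this model, a spin budget shorter than the ple window never produces a
PLE exit, which is Gleb's point, while every wait longer than that budget
becomes a halt exit, which is the under-commit cost described above.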

Gleb, Andrew,
I tested with the global ple window change (similar to what I posted
here: https://lkml.org/lkml/2012/11/11/14),
but did not see good results. Maybe it is better to go with a per-VM
ple_window.

Gleb,
Can you elaborate a little more on what you have in mind regarding the
per-VM ple_window? (Maintaining part of it as a per-VM variable is clear
to me, but would we have to load it on every guest entry?)

I'll try that idea next.

Ingo, Gleb,

From the results perspective, Andrew Theurer's and Vinod's test results
are pro-pvspinlock.
Could you please help me understand what would make it a mergeable
candidate?
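
For reference, the improvement percentages quoted earlier in this thread
can be reproduced from Andrew Theurer's 2x over-commit dbench numbers. The
short Python sketch below does that arithmetic. The helper name
improvement_pct and the dictionary layout are my own for illustration, and
the choice of 3.10-default-ple_on as the baseline is inferred from the fact
that it is the baseline that matches the quoted figures.

```python
# 2x over-commit dbench throughput, copied from the tables quoted in this
# thread (10-vCPU VMs x 16 and 20-vCPU VMs x 8).
THROUGHPUT_2X = {
    "10-vcpu": {
        "default-ple_on": 6287,
        "default-ple_off": 1849,
        "pvticket-ple_on": 6691,
        "pvticket-ple_off": 16464,
    },
    "20-vcpu": {
        "default-ple_on": 1965,
        "default-ple_off": 226,
        "pvticket-ple_on": 1942,
        "pvticket-ple_off": 8003,
    },
}


def improvement_pct(baseline, patched):
    """Relative throughput change of `patched` over `baseline`, in percent."""
    return (patched - baseline) / baseline * 100.0


if __name__ == "__main__":
    for vm_size, results in THROUGHPUT_2X.items():
        base = results["default-ple_on"]  # assumed baseline, see note above
        for config in ("pvticket-ple_on", "pvticket-ple_off"):
            pct = improvement_pct(base, results[config])
            print(f"{vm_size} {config}: {pct:+.2f}% vs default-ple_on")
```

Measured against 3.10-default-ple_off instead, the pvticket-ple_off gain
would be far larger, so the baseline matters when comparing these numbers.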

I agree that Jiannan's Preemptable Lock idea is promising, and we could
evaluate that approach and get the better one into the kernel. I will also
carry on the discussion with Jiannan to improve that patch.
Experiments so far have been good for smaller machines, but it is not
scaling for bigger machines.

