From mboxrd@z Thu Jan 1 00:00:00 1970
From: Raghavendra K T
Subject: Re: [PATCH RFC 1/1] kvm: Use vcpu_id as pivot instead of last boosted vcpu in PLE handler
Date: Tue, 04 Sep 2012 17:27:05 +0530
Message-ID: <5045EC91.9050406@linux.vnet.ibm.com>
References: <20120829192100.22412.92575.sendpatchset@codeblue> <20120902101234.GB27250@redhat.com> <50438978.9000405@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity , Marcelo Tosatti , Srikar , "Nikunj A. Dadhania" , KVM , LKML , Srivatsa Vaddagiri
To: Rik van Riel , Gleb Natapov
Return-path:
In-Reply-To: <50438978.9000405@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 09/02/2012 09:59 PM, Rik van Riel wrote:
> On 09/02/2012 06:12 AM, Gleb Natapov wrote:
>> On Thu, Aug 30, 2012 at 12:51:01AM +0530, Raghavendra K T wrote:
>>> The idea of starting from the next vcpu (source of yield_to + 1)
>>> seems to work well for overcommitted guests, rather than using the
>>> last boosted vcpu. We can also remove a per-VM variable with this
>>> approach.
>>>
>>> Iteration for an eligible candidate after this patch starts from
>>> vcpu source+1 and ends at source-1 (after wrapping).
>>>
>>> Thanks Nikunj for his quick verification of the patch.
>>>
>>> Please let me know if this patch is interesting and makes sense.
>>>
>> This last_boosted_vcpu thing caused us trouble during an attempt to
>> implement vcpu destruction. It is good to see it removed from this POV.
>
> I like this implementation. It should achieve pretty much
> the same as my old code, but without the downsides and without
> having to keep the same amount of global state.
>

My theoretical understanding of how it would help is:

           |
           V
T0 --------- T1

Suppose 4 vcpus (v1..v4) out of 32/64 vcpus simultaneously enter the
directed yield handler. If last_boosted_vcpu = i, then v1..v4 all start
from i, and there may be some unnecessary attempts at directed yields.
We may not see such attempts with the above patch. But again, I agree
that the whole directed_yield stuff is itself very complicated, because
each vcpu can be in a different state (running / pause-loop exited while
spinning / eligible) and because of how they are located w.r.t. each
other.

Here is the result I got for ebizzy on a 32-vcpu guest on a 32-core PLE
machine, for 1x, 2x and 3x overcommit.

base    = 3.5-rc5 kernel with PLE handler improvement patches applied
patched = base + vcpuid patch

        base       stdev     patched     stdev    %improvement
1x   1955.6250   39.8961   1863.3750   37.8302     -4.71716
2x   2475.3750  165.0307   3078.8750  341.9500     24.38014
3x   2071.5556   91.5370   2112.6667   56.6171      1.98455

Note: I have to admit that I am seeing very inconsistent results while
experimenting with the 3.6-rc kernel (not specific to the vcpuid patch,
but as a whole), and I am not sure whether something is wrong in my
config or whether I should spend some time debugging. Has anybody
observed the same?