From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Lieven
Subject: Re: performance trouble
Date: Wed, 28 Mar 2012 10:38:01 +0200
Message-ID: <4F72CDE9.4050206@dlh.net>
References: <20120222163356.GE26955@nfs-rbx.ovh.net> <201203271812.12374.vrozenfe@redhat.com> <4F71E7CB.9000709@dlh.net> <201203271906.28163.vrozenfe@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Gleb Natapov, David Cure, Avi Kivity, kvm@vger.kernel.org
To: Vadim Rozenfeld
Return-path:
Received: from ssl.dlh.net ([91.198.192.8]:60463 "EHLO ssl.dlh.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752856Ab2C1IiE (ORCPT ); Wed, 28 Mar 2012 04:38:04 -0400
In-Reply-To: <201203271906.28163.vrozenfe@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 27.03.2012 19:06, Vadim Rozenfeld wrote:
> On Tuesday, March 27, 2012 06:16:11 PM Peter Lieven wrote:
>> On 27.03.2012 18:12, Vadim Rozenfeld wrote:
>>> On Tuesday, March 27, 2012 05:58:01 PM Peter Lieven wrote:
>>>> On 27.03.2012 17:44, Vadim Rozenfeld wrote:
>>>>> On Tuesday, March 27, 2012 04:06:13 PM Peter Lieven wrote:
>>>>>> On 27.03.2012 14:29, Gleb Natapov wrote:
>>>>>>> On Tue, Mar 27, 2012 at 02:28:04PM +0200, Peter Lieven wrote:
>>>>>>>> On 27.03.2012 14:26, Gleb Natapov wrote:
>>>>>>>>> On Tue, Mar 27, 2012 at 02:20:23PM +0200, Peter Lieven wrote:
>>>>>>>>>> On 27.03.2012 12:00, Gleb Natapov wrote:
>>>>>>>>>>> On Tue, Mar 27, 2012 at 11:26:29AM +0200, Peter Lieven wrote:
>>>>>>>>>>>> On 27.03.2012 11:23, Vadim Rozenfeld wrote:
>>>>>>>>>>>>> On Tuesday, March 27, 2012 10:56:05 AM Gleb Natapov wrote:
>>>>>>>>>>>>>> On Mon, Mar 26, 2012 at 10:11:43PM +0200, Vadim Rozenfeld wrote:
>>>>>>>>>>>>>>> On Monday, March 26, 2012 08:54:50 PM Peter Lieven wrote:
>>>>>>>>>>>>>>>> On 26.03.2012 20:36, Vadim Rozenfeld wrote:
>>>>>>>>>>>>>>>>> On Monday, March 26, 2012 07:52:49 PM Gleb Natapov wrote:
>>>>>>>>>>>>>>>>>> On Mon, Mar 26, 2012 at 07:46:03PM +0200,
>>>>>>>>>>>>>>>>>> Vadim Rozenfeld wrote:
>>>>>>>>>>>>>>>>>>> On Monday, March 26, 2012 07:00:32 PM Peter Lieven wrote:
>>>>>>>>>>>>>>>>>>>> On 22.03.2012 10:38, Vadim Rozenfeld wrote:
>>>>>>>>>>>>>>>>>>>>> On Thursday, March 22, 2012 10:52:42 AM Peter Lieven wrote:
>>>>>>>>>>>>>>>>>>>>>> On 22.03.2012 09:48, Vadim Rozenfeld wrote:
>>>>>>>>>>>>>>>>>>>>>>> On Thursday, March 22, 2012 09:53:45 AM Gleb Natapov wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 21, 2012 at 06:31:02PM +0100, Peter Lieven wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 21.03.2012 12:10, David Cure wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Try to add name='hypervisor'/> to cpu definition in XML
>>>>>>>>>>>>>>>>>>>>>>>>>>> and check command line.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ok I try this but I can't use to map the host cpu
>>>>>>>>>>>>>>>>>>>>>>>>>> (my libvirt is 0.9.8) so I use :
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Opteron_G3
>>>>>>>>>>>>>>>>>>>>>>>>>> name='hypervisor'/>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> (the physical server uses Opteron CPU).
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The log is here :
>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> And now with only 1 vcpu, the response time is 8.5s, great
>>>>>>>>>>>>>>>>>>>>>>>>>> improvement.
>>>>>>>>>>>>>>>>>>>>>>>>>> We keep this configuration for production: we check
>>>>>>>>>>>>>>>>>>>>>>>>>> the response time when some other users are connected.
>>>>>>>>>>>>>>>>>>>>>>>>> please keep in mind, that setting -hypervisor,
>>>>>>>>>>>>>>>>>>>>>>>>> disabling hpet and only one vcpu
>>>>>>>>>>>>>>>>>>>>>>>>> makes windows use tsc as clocksource. you have to
>>>>>>>>>>>>>>>>>>>>>>>>> make sure, that your vm is not switching between
>>>>>>>>>>>>>>>>>>>>>>>>> physical sockets on your system and that you have
>>>>>>>>>>>>>>>>>>>>>>>>> constant_tsc feature to have a stable tsc between
>>>>>>>>>>>>>>>>>>>>>>>>> the cores in the same socket. it's also likely that
>>>>>>>>>>>>>>>>>>>>>>>>> the vm will crash when live migrated.
>>>>>>>>>>>>>>>>>>>>>>>> All true. I asked to try -hypervisor only to verify
>>>>>>>>>>>>>>>>>>>>>>>> where we lose performance. Since you get good
>>>>>>>>>>>>>>>>>>>>>>>> result with it frequent access to PM timer is
>>>>>>>>>>>>>>>>>>>>>>>> probably the reason. I do not recommend using
>>>>>>>>>>>>>>>>>>>>>>>> -hypervisor for production!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> @gleb: do you know what's the state of in-kernel
>>>>>>>>>>>>>>>>>>>>>>>>> hyper-v timers?
>>>>>>>>>>>>>>>>>>>>>>>> Vadim is working on it. I'll let him answer.
>>>>>>>>>>>>>>>>>>>>>>> It would be nice to have synthetic timers supported.
>>>>>>>>>>>>>>>>>>>>>>> But, at the moment, I'm only researching this feature.
>>>>>>>>>>>>>>>>>>>>>> So it will take months at least?
>>>>>>>>>>>>>>>>>>>>> I would say weeks.
>>>>>>>>>>>>>>>>>>>> Is there a way, we could contribute and help you with
>>>>>>>>>>>>>>>>>>>> this?
>>>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>>> You are welcome to add an appropriate handler.
>>>>>>>>>>>>>>>>>> I think Vadim refers to this HV MSR
>>>>>>>>>>>>>>>>>> http://msdn.microsoft.com/en-us/library/windows/hardware/ff542633%28v=vs.85%29.aspx
>>>>>>>>>>>>>>>>> This one is pretty simple to support. Please see
>>>>>>>>>>>>>>>>> attachments for more details. I was thinking about
>>>>>>>>>>>>>>>>> synthetic timers
>>>>>>>>>>>>>>>>> http://msdn.microsoft.com/en-us/library/windows/hardware/ff542758(v=vs.85).aspx
>>>>>>>>>>>>>>>> is this what microsoft qpc uses as clocksource in hyper-v?
>>>>>>>>>>>>>>> Yes, it should be enough for Win7 / W2K8R2.
>>>>>>>>>>>>>> To clarify the thing that microsoft qpc uses is what is
>>>>>>>>>>>>>> implemented by the patch Vadim attached to his previous email.
>>>>>>>>>>>>>> But I believe that additional qemu patch is needed for Windows
>>>>>>>>>>>>>> to actually use it.
>>>>>>>>>>>>> You are right.
>>>>>>>>>>>>> bits 1 and 9 must be set to on in leaf 0x40000003 and HPET
>>>>>>>>>>>>> should be completely removed from ACPI.
>>>>>>>>>>>> could you advise how to do this and/or make a patch?
>>>>>>>>>>>>
>>>>>>>>>>>> the stuff you sent yesterday is for qemu, right? would
>>>>>>>>>>>> it be possible to use it in qemu-kvm also?
>>>>>>>>>>> No, they are for kernel.
>>>>>>>>>> i meant the qemu.diff file.
>>>>>>>>> Yes, I missed the second attachment.
>>>>>>>>>
>>>>>>>>>> if i understand correctly i have to pass -cpu host,+hv_refcnt to
>>>>>>>>>> qemu?
>>>>>>>>> Looks like it.
>>>>>>>> ok, so it would be interesting if it helps to avoid the pmtimer
>>>>>>>> reads we observed earlier. right?
>>>>>>> Yes.
>>>>>> first feedback: performance seems to be amazing. i cannot confirm that
>>>>>> it breaks hv_spinlocks, hv_vapic and hv_relaxed.
>>>>>> why did you assume this?
>>>>> I didn't mean that hv_refcnt will break any other hyper-v features.
>>>>> I just want to say that turning hv_refcnt on (as any other hv_ option)
>>>>> will crash Win8 on boot-up.
>>>> yes, i got it meanwhile ;-)
>>>>
>>>> let me know what you think should be done to further test
>>>> the refcnt implementation.
>>>>
>>>> i would suggest to return at least 0xFFFFFFFF if msr 0x40000021
>>>> is read.
>>> IIRC Win7(W2k8R2) only reads this MSR. Win8 reads and writes.
>> you mean win7 only writes, don't you?
> Oh, yes. It only writes.
> Actually it works this way: kernel allocates one page, maps it into the system
> space and writes its address to 0x40000021 MSR. But to make guest accessing
> this page, the partition reference tsc facility must be enabled, otherwise
> you can keep any garbage there without breaking your guest.

yes, but garbage is different from failing the msr read, isn't it?

another thing i came across with the refcnt. the spec says it runs at units
of 100ns. from what i see inside windows (e.g. when pinging a local device
in the network), it seems that it is 100 times too slow. i will fix this
and test further.

peter

>
>> at least you put a break in set_msr_hyperv for this msr.
>>
>> i just thought that it would be ok to return the value that
>> is defined for iTSC is not supported?
>>
>> peter
>>
>>>> peter
>>>>
>>>>> Cheers,
>>>>> Vadim.
>>>>>
>>>>>> no more pmtimer reads. i can now almost fully utilize a 1GBit
>>>>>> interface with a file transfer while there was not one
>>>>>> cpu core fully utilized as observed with pmtimer. some live migration
>>>>>> tests revealed that it did not crash even under load.
>>>>>>
>>>>>> @vadim: i think we need a proper patch for the others to test this ;-)
>>>>>>
>>>>>> what i observed: is it right, that HV_X64_MSR_TIME_REF_COUNT is
>>>>>> missing in msrs_to_save[] in x86/x86.c of the kernel module?
>>>>>>
>>>>>> thanks for your help,
>>>>>> peter
>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Gleb.
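
to make the 100ns unit mentioned above concrete, here is a minimal sketch of the conversion, assuming the host side has an elapsed time in nanoseconds. the names REF_TICK_NS, ns_to_ref_ticks and ref_ticks_to_ns are hypothetical and are not taken from the attached patches; this only illustrates the arithmetic, not how the kernel or qemu code actually computes the counter.

```c
#include <assert.h>
#include <stdint.h>

/* One reference-counter tick is 100 ns, per the spec as quoted in this thread. */
#define REF_TICK_NS 100ULL

/* Hypothetical helper: convert elapsed nanoseconds into 100 ns ticks. */
static uint64_t ns_to_ref_ticks(uint64_t elapsed_ns)
{
    return elapsed_ns / REF_TICK_NS;
}

/* Hypothetical inverse: convert 100 ns ticks back into nanoseconds. */
static uint64_t ref_ticks_to_ns(uint64_t ticks)
{
    /* Refuse inputs whose product would not fit in 64 bits. */
    assert(ticks <= UINT64_MAX / REF_TICK_NS);
    return ticks * REF_TICK_NS;
}
```

with these units one second of elapsed time corresponds to 10,000,000 ticks, so a counter that advances by only 100,000 per second would look 100 times too slow to the guest.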