From: George Dunlap
Subject: Re: Xen 4.3 development update
Date: Fri, 3 May 2013 17:41:35 +0100
Message-ID: <5183E8BF.7030706@eu.citrix.com>
In-Reply-To: <20130502154856.GO65547@ocelot.phlegethon.org>
To: Tim Deegan
Cc: Andres Lagar-Cavilla, Peter Maloney, suravee.suthikulpanit@amd.com,
    Jan Beulich, xen-devel@lists.xen.org

On 02/05/13 16:48, Tim Deegan wrote:
> At 15:21 +0200 on 29 Apr (1367248894), Peter Maloney wrote:
>> On 04/04/2013 07:05 PM, Tim Deegan wrote:
>>> Also, if there is still a bad slowdown, caused by the p2m lookups, this
>>> might help a little bit:
>>>
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index 38e87ce..7bd8646 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>>          }
>>>      }
>>>
>>> +
>>> +    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>>> +     * a fast path for LAPIC accesses, skipping the p2m lookup. */
>>> +    if ( !nestedhvm_vcpu_in_guestmode(v)
>>> +         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
>>> +    {
>>> +        if ( !handle_mmio() )
>>> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>> +        rc = 1;
>>> +        goto out;
>>> +    }
>>> +
>>>      p2m = p2m_get_hostp2m(v->domain);
>>>      mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
>>>                                P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
>> This patch (applied to 4.2.2) has a very large improvement on my box
>> (AMD FX-8150) and WinXP 32 bit.
> Hmm - I expected it to be only a mild improvement.  How about this one,
> which puts in the same shortcut in another place as well?  I don't think
> it will be much better than the last one, but it's worth a try.

So I dusted off my old perf testing scripts and added one to measure boot
performance.  The numbers below are boot times in seconds, measured from
the moment "xl create" returns until a specific python daemon running in
the VM starts responding to requests, so lower is better.  There are a
number of places where a few seconds of noise can creep in either way,
but on the whole the tests seem fairly repeatable.

I ran this with w2k3eesp2 and with winxpsp3, using some of the
auto-install test images made for XenServer regression testing.  All of
them use a flat file disk backend with qemu-traditional.
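For reference, the timing harness amounts to roughly the sketch below.
This is only an illustration, not the actual script; the config path,
guest address and daemon port here are made up.

# Rough sketch of the boot-time measurement (illustration only; the
# real scripts differ, and VM_CONFIG / GUEST_ADDR are hypothetical).
import socket
import subprocess
import time

VM_CONFIG = "/etc/xen/winxp.cfg"      # hypothetical guest config
GUEST_ADDR = ("192.168.1.10", 8000)   # hypothetical in-guest daemon

subprocess.check_call(["xl", "create", VM_CONFIG])
start = time.time()                   # clock starts once "xl create" returns

while True:
    try:
        sock = socket.create_connection(GUEST_ADDR, timeout=1)
        sock.close()
        break                         # daemon answered: boot counts as done
    except socket.error:
        time.sleep(1)

print("boot time: %.0f seconds" % (time.time() - start))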
Results are in order of commits:

Xen 4.1:
  w2k3:  43 34 34 33 34
  winxp: 110 111 111 110 112

Xen 4.2:
  w2k3:  34 44 45 45 45
  winxp: 203 221 210 211 200

Xen-unstable w/ RTC fix:
  w2k3:  43 44 44 45 44
  winxp: 268 275 265 276 265

Xen-unstable with RTC fix + this "fast lapic" patch:
  w2k3:  43 45 44 45 45
  winxp: 224 232 232 232 232

So w2k3 boots fairly quickly anyway; it takes roughly a 30% slow-down
(34s to 45s) going from 4.1 to 4.2, and shows no discernible change
after that.  winxp boots fairly slowly to begin with; its boot time
nearly doubles going from 4.1 to 4.2, and gets even worse on
xen-unstable.  The patch is a measurable improvement, but still nowhere
near 4.1, or even 4.2.

On the whole, however, I'm not sure that boot time by itself is a
blocker.  If the problem really is primarily the "eager TPR" issue for
Windows XP, then I'm not terribly motivated either: the Citrix PV
drivers patch Windows XP so that the routine is lazy (like w2k3's);
there is hardware available which allows the TPR to be virtualized; and
there are plenty of Windows-based OSes available which do not have this
problem.

I'll be doing some more workload-based benchmarks (probably starting
with the Windows DDK example build) to see what other issues turn up.

 -George