From: George Dunlap
Subject: Re: Xen 4.3 development update
Date: Thu, 25 Apr 2013 16:50:19 +0100
Message-ID: <517950BB.8040108@eu.citrix.com>
To: Tim Deegan
Cc: Andres Lagar-Cavilla, Suravee Suthikulpanit, Jan Beulich, "xen-devel@lists.xen.org"

On 04/25/2013 04:46 PM, Tim Deegan wrote:
> At 16:20 +0100 on 25 Apr (1366906804), George Dunlap wrote:
>> On Thu, Apr 4, 2013 at 4:23 PM, Tim Deegan wrote:
>>> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
>>>> On Apr 3, 2013, at 6:53 AM, George Dunlap wrote:
>>>>
>>>>> On 03/04/13 08:27, Jan Beulich wrote:
>>>>>>>>> On 02.04.13 at 18:34, Tim Deegan wrote:
>>>>>>> This is a separate problem. IIRC the AMD XP perf issue is caused by the
>>>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>>>> patches. XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>>>> Ah, okay, sorry for mixing this up. But how is this a regression
>>>>>> then?
>>>>>
>>>>> My sense, when I looked at this back whenever, was that there was much
>>>>> more to this. The XP IRQL updating is a problem, but it's made terribly
>>>>> worse by the changeset in question. It seemed to me like the kind of
>>>>> thing that would be caused by TLB or caches suddenly becoming much less
>>>>> effective.
>>>>
>>>> The commit in question does not add p2m mutations, so it doesn't nuke
>>>> the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is
>>>> the problem. Later in the 4.2 cycle we changed the common case to use an
>>>> rwlock. Does the same perf degradation occur with tip of 4.2?
>>>>
>>>
>>> Yes, 4.2 is definitely slower. A compile test on a 4-vcpu VM that takes
>>> about 12 minutes before this locking change takes more than 20 minutes
>>> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
>>> to test something else).
>>
>> Tim,
>>
>> Can you go into a bit more detail about what you compiled, and on what
>> kind of OS?
>
> I was compiling on Win XP sp3, 32-bit, 1vcpu, 4G ram. The compile was
> the Windows DDK sample code.
>
> As I think I mentioned later, all my measurements are extremely suspect
> as I was relying on guest wallclock time, and the 'before' case was
> before the XP wallclock time was fixed. :(
>
>> The VM was a Debian Wheezy VM, stock kernel (3.2), PVHVM mode, 1G of
>> RAM, 4 vcpus, LVM-backed 8G disk.
>
> I suspect the TPR access patterns of XP are not seen on linux; it's been
> known for long enough now that it's super-slow on emulated platforms and
> AFAIK it was only ever Windows that used the TPR so aggressively anyway.

Right. IIRC w2k3 sp2 has the "lazy tpr" feature, so if I can get
consistent results with that one then we can say... well, we can at
least say it's not easy to reproduce. :-)

 -George