From: George Dunlap
Subject: Re: Xen 4.3 development update
Date: Wed, 3 Apr 2013 11:51:20 +0100
Message-ID: <515C09A8.8010104@eu.citrix.com>
In-Reply-To: <515B6E4B.8080108@amd.com>
References: <515B186F02000078000CA1A7@nat28.tlf.novell.com> <20130402163440.GB17022@ocelot.phlegethon.org> <515B101A.7050501@amd.com> <515B6E4B.8080108@amd.com>
To: Suravee Suthikulpanit
Cc: "Tim (Xen.org)", Andres Lagar-Cavilla, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 03/04/13 00:48, Suravee Suthikulpanit wrote:
> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>> On 02.04.13 at 16:07, George Dunlap wrote:
>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>   owner: ?
>>>>>   Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>> other day. Suravee, is that correct?
>>> This is a separate problem. IIRC the AMD XP perf issue is caused by the
>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>> patches. XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>> _lot_ of vmexits for IRQL reads and writes.
>> Are there any tools or good ways to count the number of VMEXITs in Xen?
>>
> Tim/Jan,
>
> I have used the iperf benchmark to compare network performance
> (bandwidth) between the two versions of the hypervisor:
> 1. good: 24769:730f6ed72d70
> 2. bad:  24770:7f79475d3de7
>
> In the "bad" case, I am seeing that the network bandwidth has dropped
> by about 13-15%.
>
> However, when I use the xentrace utility to count VMEXITs, I actually
> see about 25% more VMEXITs in the good case. This is inconsistent with
> Tim's statement above.

I was going to say: what I remember from my little bit of investigation
back in November is that it had all the earmarks of micro-architectural
"drag", which happens when the TLB or the caches can't be effective.

Suravee, in the xenalyze output, a micro-architectural "drag" looks like:
 * fewer VMEXITs, but
 * more time spent in each VMEXIT

If you post the results of "xenalyze --svm-mode -s" for both traces, I
can tell you what I see.

 -George
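
P.S. In case it's useful, a rough sketch of how the traces could be
collected and summarized; the file names here are just examples, and if
your xentrace doesn't accept "all" as an event mask, substitute an
explicit numeric mask:

  # While iperf is running against the guest, capture a trace on each
  # hypervisor version; stop with ^C once the benchmark finishes.
  # -D discards any stale records already sitting in the trace buffers.
  xentrace -D -e all /tmp/iperf-good.trace

  # Summarize the trace, interpreting VMEXITs as SVM (AMD) exits;
  # -s asks xenalyze for the summary output.
  xenalyze --svm-mode -s /tmp/iperf-good.trace > iperf-good.summary

Comparing the per-VMEXIT counts and the time attributed to each exit
type in the two summaries should show whether the bad case really is
"fewer but slower" exits.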