On 4/3/2013 5:51 AM, George Dunlap wrote:
> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>> On 02.04.13 at 16:07, George Dunlap wrote:
>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>> owner: ?
>>>>>> Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>> other day. Suravee, is that correct?
>>>> This is a separate problem. IIRC the AMD XP perf issue is caused
>>>> by the emulation of LAPIC TPR accesses slowing down with Andres's
>>>> p2m locking patches. XP doesn't have 'lazy IRQL' or support for
>>>> CR8, so it takes a _lot_ of vmexits for IRQL reads and writes.
>>> Is there any tools or good ways to count the number of VMexit in Xen?
>>>
>> Tim/Jan,
>>
>> I have used iperf benchmark to compare network performance (bandwidth)
>> between the two versions of the hypervisor:
>> 1. good: 24769:730f6ed72d70
>> 2. bad: 24770:7f79475d3de7
>>
>> In the "bad" case, I am seeing that the network bandwidth has dropped
>> about 13-15%.
>>
>> However, when I uses the xentrace utility to trace the number of VMEXIT,
>> I actually see about 25% more number of VMEXIT in the good case. This
>> is inconsistent with the statement that Tim mentioned above.
>
> I was going to say, what I remember from my little bit of
> investigation back in November, was that it had all the earmarks of
> micro-architectural "drag", which happens when the TLB or the caches
> can't be effective.
>
> Suvaree, if you look at xenalyze, a microarchitectural "drag" looks like:
> * fewer VMEXITs, but
> * time for each vmexit takes longer
>
> If you post the results of "xenalyze --svm-mode -s" for both traces, I
> can tell you what I see.
>
> -George
>

George,

Here are the two sets of data from xenalyze.

Suravee