On 4/3/2013 5:51 AM, George Dunlap wrote:
> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com>
>>>>>>>> wrote:
>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>     owner: ?
>>>>>>     Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>> other day. Suravee, is that correct?
>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>> _lot_ of vmexits for IRQL reads and writes.
>>> Are there any tools or good ways to count the number of VMEXITs in Xen?
>>>
>> Tim/Jan,
>>
>> I used the iperf benchmark to compare network performance (bandwidth)
>> between the two versions of the hypervisor:
>> 1. good: 24769:730f6ed72d70
>> 2. bad: 24770:7f79475d3de7
>>
>> In the "bad" case, I am seeing that the network bandwidth has dropped
>> by about 13-15%.
>>
>> However, when I use the xentrace utility to trace the number of VMEXITs,
>> I actually see about 25% more VMEXITs in the good case.  This
>> is inconsistent with Tim's statement above.
>
> I was going to say, what I remember from my little bit of 
> investigation back in November, was that it had all the earmarks of 
> micro-architectural "drag", which happens when the TLB or the caches 
> can't be effective.
>
> Suravee, if you look in xenalyze, a microarchitectural "drag" looks like:
> * fewer VMEXITs, but
> * each vmexit takes longer
>
> If you post the results of "xenalyze --svm-mode -s" for both traces, I 
> can tell you what I see.
>
>  -George
>
George,

Here are the two sets of data from xenalyze.

Suravee
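
For reference, the "drag" pattern George describes above (fewer VMEXITs,
but each one takes longer) can be checked mechanically once the totals are
read off the two xenalyze summaries. This is only a rough sketch: the
ExitStats structure, the looks_like_drag helper and the numbers are
hypothetical placeholders, not xenalyze output and not measurements from
these traces.

```python
from dataclasses import dataclass


@dataclass
class ExitStats:
    """Aggregate VMEXIT figures for one trace (hypothetical structure)."""
    exits: int         # total number of VMEXITs observed in the trace
    total_cycles: int  # total cycles spent handling those VMEXITs

    @property
    def cycles_per_exit(self) -> float:
        # Average cost of one VMEXIT; 0.0 for an empty trace.
        return self.total_cycles / self.exits if self.exits else 0.0


def looks_like_drag(good: ExitStats, bad: ExitStats) -> bool:
    """True when the bad trace has fewer exits but each one costs more."""
    return (bad.exits < good.exits
            and bad.cycles_per_exit > good.cycles_per_exit)


# Placeholder numbers for illustration only, NOT data from this thread.
good = ExitStats(exits=1_000_000, total_cycles=2_000_000_000)
bad = ExitStats(exits=800_000, total_cycles=2_400_000_000)
print(looks_like_drag(good, bad))
```

The real counts and cycle totals would have to be taken by hand from the
"xenalyze --svm-mode -s" summaries for the good and bad changesets.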