On 4/3/2013 5:51 AM, George Dunlap wrote:
> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>     owner: ?
>>>>>>     Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>> other day. Suravee, is that correct?
>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>> _lot_ of vmexits for IRQL reads and writes.
>>> Are there any tools or good ways to count the number of VMEXITs in Xen?
>>>
>> Tim/Jan,
>>
>> I have used iperf benchmark to compare network performance (bandwidth)
>> between the two versions of the hypervisor:
>> 1. good: 24769:730f6ed72d70
>> 2. bad: 24770:7f79475d3de7
>>
>> In the "bad" case, I am seeing that the network bandwidth has dropped
>> by about 13-15%.
>>
>> However, when I use the xentrace utility to count the VMEXITs, I
>> actually see about 25% more VMEXITs in the good case.  This is
>> inconsistent with what Tim described above.
>
> I was going to say, what I remember from my little bit of 
> investigation back in November, was that it had all the earmarks of 
> micro-architectural "drag", which happens when the TLB or the caches 
> can't be effective.
>
> Suravee, if you look at xenalyze, a microarchitectural "drag" looks like:
> * fewer VMEXITs, but
> * each vmexit takes longer
>
> If you post the results of "xenalyze --svm-mode -s" for both traces, I 
> can tell you what I see.
>
>  -George
>

Here is another set of xenalyze output, showing only the VMEXIT
statistics.  In this run, I pinned all four VCPUs and pinned my
application process to VCPU 3.

NOTE: This measurement is without the RTC bug.

BAD:
-- v3 --
  Runstates:
    running:       1  4.51s 10815429411 {10815429411|10815429411|10815429411}
  cpu affinity:       1 10816540697 {10816540697|10816540697|10816540697}
    [7]:       1 10816540697 {10816540697|10816540697|10816540697}
Exit reasons:
  VMEXIT_CR0_READ           633  0.00s  0.00%  1503 cyc { 1092| 1299| 2647}
  VMEXIT_CR4_READ             3  0.00s  0.00%  1831 cyc { 1309| 1659| 2526}
  VMEXIT_CR0_WRITE          305  0.00s  0.00%  1660 cyc { 1158| 1461| 2507}
  VMEXIT_CR4_WRITE            6  0.00s  0.00% 19771 cyc { 1738| 5031|79600}
  VMEXIT_EXCEPTION_NM         1  0.00s  0.00%  2272 cyc { 2272| 2272| 2272}
  VMEXIT_INTR                28  0.00s  0.00%  3374 cyc { 1225| 3770| 6095}
  VMEXIT_VINTR              388  0.00s  0.00%  1023 cyc {  819|  901| 1744}
  VMEXIT_PAUSE               33  0.00s  0.00%  7476 cyc { 4881| 6298|18941}
  VMEXIT_HLT                388  3.35s 14.84% 20701800 cyc {169589|3848166|55770601}
  VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
  VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}
Guest interrupt counts:
Emulate eip list

GOOD:
-- v3 --
  Runstates:
    running:       4 12.10s 7257234016 {18132721625|18132721625|18132721625}
       lost:      12  1.24s 248210482 {188636654|719488416|719488416}
  cpu affinity:       1 32007462122 {32007462122|32007462122|32007462122}
    [7]:       1 32007462122 {32007462122|32007462122|32007462122}
Exit reasons:
  VMEXIT_CR0_READ          4748  0.00s  0.01%  1275 cyc { 1007| 1132| 1878}
  VMEXIT_CR4_READ             6  0.00s  0.00%  1752 cyc { 1189| 1629| 2600}
  VMEXIT_CR0_WRITE         3099  0.00s  0.01%  1541 cyc { 1157| 1420| 2151}
  VMEXIT_CR4_WRITE           12  0.00s  0.00%  4105 cyc { 1885| 4380| 5515}
  VMEXIT_EXCEPTION_NM        18  0.00s  0.00%  2169 cyc { 1973| 2152| 2632}
  VMEXIT_INTR               258  0.00s  0.00%  4622 cyc { 1358| 4235| 8987}
  VMEXIT_VINTR             2552  0.00s  0.00%   971 cyc {  850|  928| 1131}
  VMEXIT_PAUSE              370  0.00s  0.00%  5758 cyc { 4381| 5688| 7933}
  VMEXIT_HLT               1505  6.14s 27.19% 9788981 cyc {268573|3768704|56331182}
  VMEXIT_IOIO             53835  1.97s  8.74% 87959 cyc { 4996|82423|144207}
  VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}
Guest interrupt counts:
Emulate eip list
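
The two traces cover different amounts of run time (4.51s running in the
BAD case versus 12.10s in the GOOD case), so the raw exit counts are not
directly comparable, but the average cycles per exit are.  Below is a
minimal Python sketch for pulling those averages out of the "Exit
reasons" rows and printing the bad/good ratio.  It assumes the row
layout shown above, with each row on a single line; the names ROW,
parse_exits and compare are made up for illustration and are not part of
xenalyze.

```python
import re

# Matches one xenalyze "Exit reasons" row, e.g.
#   VMEXIT_NPF  108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}
# Assumption: name, count, seconds, percent, average cycles, in that order.
ROW = re.compile(
    r"^\s*(VMEXIT_\w+)\s+(\d+)\s+([\d.]+)s\s+([\d.]+)%\s+(\d+)\s+cyc"
)


def parse_exits(text):
    """Return {exit_name: (count, avg_cycles)} for each row found in text."""
    exits = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            exits[m.group(1)] = (int(m.group(2)), int(m.group(5)))
    return exits


def compare(bad_text, good_text):
    """Return {exit_name: bad_avg_cycles / good_avg_cycles} for shared exits."""
    bad, good = parse_exits(bad_text), parse_exits(good_text)
    return {
        name: bad[name][1] / good[name][1]
        for name in sorted(bad.keys() & good.keys())
    }


# Two rows copied from each table above; paste the full tables to compare all.
BAD = """
  VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
  VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}
"""
GOOD = """
  VMEXIT_IOIO             53835  1.97s  8.74% 87959 cyc { 4996|82423|144207}
  VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}
"""

for name, ratio in compare(BAD, GOOD).items():
    print(f"{name}: avg cycles bad/good = {ratio:.2f}")
```

A ratio well above 1 for a given exit type means that exit takes longer
on average in the BAD trace than in the GOOD one.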

Suravee