On 4/3/2013 5:51 AM, George Dunlap wrote:
> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>> On 02.04.13 at 16:07, George Dunlap
>>>>>>>> wrote:
>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>> owner: ?
>>>>>> Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>> other day. Suravee, is that correct?
>>>> This is a separate problem. IIRC the AMD XP perf issue is caused
>>>> by the emulation of LAPIC TPR accesses slowing down with Andres's
>>>> p2m locking patches. XP doesn't have 'lazy IRQL' or support for
>>>> CR8, so it takes a _lot_ of vmexits for IRQL reads and writes.
>>> Is there any tools or good ways to count the number of VMexit in Xen?
>>>
>> Tim/Jan,
>>
>> I have used iperf benchmark to compare network performance (bandwidth)
>> between the two versions of the hypervisor:
>> 1. good: 24769:730f6ed72d70
>> 2. bad: 24770:7f79475d3de7
>>
>> In the "bad" case, I am seeing that the network bandwidth has dropped
>> about 13-15%.
>>
>> However, when I uses the xentrace utility to trace the number of VMEXIT,
>> I actually see about 25% more number of VMEXIT in the good case. This
>> is inconsistent with the statement that Tim mentioned above.
>
> I was going to say, what I remember from my little bit of
> investigation back in November, was that it had all the earmarks of
> micro-architectural "drag", which happens when the TLB or the caches
> can't be effective.
>
> Suvaree, if you look at xenalyze, a microarchitectural "drag" looks like:
> * fewer VMEXITs, but
> * time for each vmexit takes longer
>
> If you post the results of "xenalyze --svm-mode -s" for both traces, I
> can tell you what I see.
>
> -George
>

Here's another version of the outputs from xenalyze with only VMEXIT.
In this case, I pin all the VCPUs (4) and pin my application process to VCPU 3.

NOTE: This measurement is without the RTC bug.

BAD:
-- v3 --
 Runstates:
   running:       1  4.51s 10815429411 {10815429411|10815429411|10815429411}
 cpu affinity:    1 10816540697 {10816540697|10816540697|10816540697}
   [7]:           1 10816540697 {10816540697|10816540697|10816540697}
Exit reasons:
 VMEXIT_CR0_READ         633  0.00s  0.00%      1503 cyc {  1092|   1299|    2647}
 VMEXIT_CR4_READ           3  0.00s  0.00%      1831 cyc {  1309|   1659|    2526}
 VMEXIT_CR0_WRITE        305  0.00s  0.00%      1660 cyc {  1158|   1461|    2507}
 VMEXIT_CR4_WRITE          6  0.00s  0.00%     19771 cyc {  1738|   5031|   79600}
 VMEXIT_EXCEPTION_NM       1  0.00s  0.00%      2272 cyc {  2272|   2272|    2272}
 VMEXIT_INTR              28  0.00s  0.00%      3374 cyc {  1225|   3770|    6095}
 VMEXIT_VINTR            388  0.00s  0.00%      1023 cyc {   819|    901|    1744}
 VMEXIT_PAUSE             33  0.00s  0.00%      7476 cyc {  4881|   6298|   18941}
 VMEXIT_HLT              388  3.35s 14.84%  20701800 cyc {169589|3848166|55770601}
 VMEXIT_IOIO            5581  0.19s  0.85%     82514 cyc {  4250|  81909|  146439}
 VMEXIT_NPF           108072  0.71s  3.14%     15702 cyc {  6362|   6865|   37280}
Guest interrupt counts:
Emulate eip list

GOOD:
-- v3 --
 Runstates:
   running:       4 12.10s 7257234016 {18132721625|18132721625|18132721625}
      lost:      12  1.24s 248210482 {188636654|719488416|719488416}
 cpu affinity:    1 32007462122 {32007462122|32007462122|32007462122}
   [7]:           1 32007462122 {32007462122|32007462122|32007462122}
Exit reasons:
 VMEXIT_CR0_READ        4748  0.00s  0.01%      1275 cyc {  1007|   1132|    1878}
 VMEXIT_CR4_READ           6  0.00s  0.00%      1752 cyc {  1189|   1629|    2600}
 VMEXIT_CR0_WRITE       3099  0.00s  0.01%      1541 cyc {  1157|   1420|    2151}
 VMEXIT_CR4_WRITE         12  0.00s  0.00%      4105 cyc {  1885|   4380|    5515}
 VMEXIT_EXCEPTION_NM      18  0.00s  0.00%      2169 cyc {  1973|   2152|    2632}
 VMEXIT_INTR             258  0.00s  0.00%      4622 cyc {  1358|   4235|    8987}
 VMEXIT_VINTR           2552  0.00s  0.00%       971 cyc {   850|    928|    1131}
 VMEXIT_PAUSE            370  0.00s  0.00%      5758 cyc {  4381|   5688|    7933}
 VMEXIT_HLT             1505  6.14s 27.19%   9788981 cyc {268573|3768704|56331182}
 VMEXIT_IOIO           53835  1.97s  8.74%     87959 cyc {  4996|  82423|  144207}
 VMEXIT_NPF           855101  2.06s  9.13%      5787 cyc {  4903|   5328|    8572}
Guest interrupt counts:
Emulate eip list

Suravee
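
The two traces above cover different amounts of vcpu 3 running time (4.51s vs
12.10s), so raw exit counts are hard to compare directly. Below is a minimal
Python sketch that normalizes the counts and compares the average cost per
exit. The numbers are copied from the xenalyze output above. The names
(EXITS, RUNNING_SECONDS, exits_per_second, cycle_ratio) are illustrative only
and not part of xenalyze, and using the "running" runstate time as the
denominator is an assumption, not something the tool prescribes.

```python
# Figures copied from the xenalyze output for vcpu 3 in both traces.
# Each entry is (exit count, average cycles per exit).
EXITS = {
    "bad": {
        "VMEXIT_NPF": (108072, 15702),
        "VMEXIT_IOIO": (5581, 82514),
        "VMEXIT_VINTR": (388, 1023),
    },
    "good": {
        "VMEXIT_NPF": (855101, 5787),
        "VMEXIT_IOIO": (53835, 87959),
        "VMEXIT_VINTR": (2552, 971),
    },
}

# ASSUMPTION: the "running" runstate time is used as the denominator.
RUNNING_SECONDS = {"bad": 4.51, "good": 12.10}


def exits_per_second(trace, reason):
    """Exit count divided by the vcpu's running time in that trace."""
    count, _avg_cycles = EXITS[trace][reason]
    return count / RUNNING_SECONDS[trace]


def cycle_ratio(reason):
    """Average cycles per exit in the bad trace relative to the good trace."""
    return EXITS["bad"][reason][1] / EXITS["good"][reason][1]


if __name__ == "__main__":
    for reason in EXITS["bad"]:
        print(
            f"{reason:14s} "
            f"bad {exits_per_second('bad', reason):10.1f}/s  "
            f"good {exits_per_second('good', reason):10.1f}/s  "
            f"cycles bad/good {cycle_ratio(reason):.2f}"
        )
```

Under these assumptions, a VMEXIT_NPF row with a lower rate and a cycle ratio
well above 1 in the bad trace would match the "fewer VMEXITs, but each takes
longer" pattern described in the quoted text.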