xen-devel.lists.xenproject.org archive mirror
* Xen 4.3 development update
@ 2013-04-02 14:07 George Dunlap
  2013-04-02 15:42 ` Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-02 14:07 UTC (permalink / raw)
  To: xen-devel@lists.xen.org

This information will be mirrored on the Xen 4.3 Roadmap wiki page:
 http://wiki.xen.org/wiki/Xen_Roadmap/4.3

A couple of notes:

- I have moved the "Code freezing point" to 15 April, since one of the
toolstack maintainers (Ian Campbell) is away until the 8th.

- As we focus on getting a release for the 4.3 codebase, I have
removed items from the list that are either "not for 4.3" or are
purely external (e.g., Linux kernel or libvirt).

- Please start suggesting bug reports to put on this list.

= Timeline =

We are planning on a 9-month release cycle.  Based on that, below are
our estimated dates:
* Feature freeze: 25 March 2013
* Code freezing point: 15 April 2013
* First RC: 6 May 2013
* Release: 17 June 2013

The RCs and release will of course depend on stability and bugs, and
will therefore be fairly unpredictable.  Each new feature will be
considered on a case-by-case basis; but the general rule will be as
follows:

* Between feature freeze and code freeze, only features which have had
a v1 posted before the feature freeze, or are on this list, will be
considered for inclusion.

* Between the "code freezing point" and the first RC, any new code
will need to be justified, and it will become progressively more
difficult to get non-bugfix patches accepted.  Criteria will include
the size of the patch, the importance of the codepath, whether it's
new functionality being added or existing functionality being changed,
and so on.

Last updated: 2 April 2013

= Feature tracking =

Below is a list of features we're tracking for this release. Please
respond to this mail with any updates to the status.

There are a number of items whose owners are marked as "?".  If you
are working on this, or know who is working on it, please respond and
let me know.  Alternately, if you would *like* to work on it, please
let me know as well.

And if there is something you're working on you'd like tracked, please
respond, and I will add it to the list.

NB: Several of the items on this list are from external projects:
linux, qemu, and libvirt.  These are not part of the Xen tree, but are
directly related to our users' experience (e.g., work in Linux or
qemu) or to integration with other important projects (e.g., libvirt
bindings).  Since all of these are part of the Xen community's work, and
come from the same pool of labor, it makes sense to track their
progress here, even though they won't explicitly be released as part
of 4.3.

Meanings of prognoses:
- Excellent: It would be very unlikely for this not to be finished in time.
- Good: Everything is on track, and is likely to make it.
- Fair: A pretty good chance of making it, but not as certain
- Poor: Likely not to make it unless intervention is made
- Not for 4.3: Self-explanatory

== Completed ==

* Serial console improvements
  - EHCI debug port

* Default to QEMU upstream (partial)
 - pci pass-thru (external)
 - enable dirtybit tracking during migration (external)
 - xl cd-{insert,eject} (external)

* CPUID-based idle (don't rely on ACPI info from dom0)

* Persistent grants for blk (external)
 - Linux
 - qemu

* Allow XSM to override IS_PRIV checks in the hypervisor

* Scalability: 16TiB of RAM

* xl QXL Spice support

== Bugs ==

* xl, compat mode, and older kernels
  owner: ?
  Many older 32-bit PV kernels that can run on a 64-bit hypervisor with
  xend do not work when started with xl.  The following work-around seems to
  work:
    xl create -p lightning.cfg
    xenstore-write /local/domain/$(xl domid lightning)/device/vbd/51713/protocol x86_32-abi
    xl unpause lightning
  This node is normally written by the guest kernel, but for older kernels
  seems not to be.  xend must have a work-around; port this work-around to xl.
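  Until then, the work-around is easy to wrap in a small helper script.
  A sketch only (the config file, domain name and the vbd node 51713 are
  taken from the example above and will differ from guest to guest):
    #!/bin/sh
    # start a PV guest paused, force the 32-bit blkif protocol that
    # older kernels fail to write themselves, then unpause it
    cfg=$1; name=$2
    xl create -p "$cfg"
    xenstore-write /local/domain/$(xl domid "$name")/device/vbd/51713/protocol x86_32-abi
    xl unpause "$name"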

* AMD NPT performance regression after c/s 24770:7f79475d3de7
  owner: ?
  Reference: http://marc.info/?l=xen-devel&m=135075376805215

* qemu-upstream: cd-insert and cd-eject not working
  http://marc.info/?l=xen-devel&m=135850249808960

* Install into /usr/local by default
  owner: Ian Campbell

== Not yet complete ==

* PVH mode (w/ Linux)
  owner: mukesh@oracle
  status (Linux): 3rd draft patches posted.
  status (Xen): RFC submitted
  prognosis: Tech preview only

* Event channel scalability
  owner: wei@citrix or david@citrix
  status: RFC v5 submitted
  prognosis: Deciding whether to shoot for 3-level (4.3) or FIFO (4.4)
  Increase limit on event channels (currently 1024 for 32-bit guests,
  4096 for 64-bit guests)
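  (For reference: the current 2-level ABI limit is BITS_PER_LONG squared,
  i.e. 32 * 32 = 1024 and 64 * 64 = 4096.  If I recall the proposals
  correctly, the 3-level scheme would cube that, giving roughly 32k / 256k
  event channels, while the FIFO-based ABI drops the word-size dependency
  altogether.)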

* ARM v7 server port
  owner: ijc@citrix
  prognosis: Excellent
  status: SMP support missing.

* ARM v8 server port (tech preview)
  owner: ijc@citrix
  status: ?
  prognosis: Tech preview only

* NUMA scheduler affinity
  critical
  owner: dario@citrix
  status: Patches posted
  prognosis: Excellent

* NUMA Memory migration
  owner: dario@citrix
  status: in progress
  prognosis: Fair

* Default to QEMU upstream
 - Add "intel-hda" to the xmexample file, since it works with 64-bit Win7/8
 - qemu-based stubdom (Linux or BSD libc)
   owner: anthony@citrix
   status: in progress
   prognosis: ?
   qemu-upstream needs a more fully-featured libc than exists in
   mini-os.  Either work on a minimalist linux-based stubdom with
   glibc, or port one of the BSD libcs to mini-os.

* Multi-vector PCI MSI (support at least for Dom0)
  owner: jan@suse
  status: Draft hypervisor side done, linux side in progress.
  prognosis: Fair

* vTPM updates
  owner: Matthew Fioravante @ Johns Hopkins
  status: some patches submitted, more in progress
  prognosis: Good
  - Allow all vTPM components to run in stub domains for increased security
  - Update vtpm to 0.7.4
  - Remove dom0-based vtpmd

* V4V: Inter-domain communication
  owner (Xen): dominic.curran@citrix.com
  status (Xen): patches submitted
  prognosis: Fair
  owner (Linux driver):  stefano.panella@citrix
  status (Linux driver): in progress

* xl PVUSB pass-through for PV guests
* xl PVUSB pass-through for HVM guests
  owner: George
  status: ?
  prognosis: Poor
  xm/xend supports PVUSB pass-through to guests with PVUSB drivers
  (both PV and HVM guests).
  - port the xm/xend functionality to xl.
  - this PVUSB feature does not require support or emulation from Qemu.
  - upstream the Linux frontend/backend drivers.  Current
    work-in-progress versions are in Konrad's git tree.
  - James Harper's GPLPV drivers for Windows include PVUSB frontend drivers.

* xl USB pass-through for HVM guests using Qemu USB emulation
  owner: George
  status: Config file pass-through submitted.
  prognosis: Fair
  xm/xend with qemu-traditional supports USB passthrough to HVM guests
  using the Qemu emulated USB controller.
  The HVM guest does not need any special drivers for this feature.
  So basically the qemu command line needs to have:
     -usb -usbdevice host:xxxx:yyyy
  (a possible xl config equivalent is sketched after this item)
  - port the xm/xend functionality to xl.
  - make sure USB passthrough with xl works with both qemu-traditional
    and qemu-upstream.
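  As a rough sketch only (the exact xl syntax is still being settled, so
  treat the option names as assumptions rather than the final interface),
  the domain config equivalent of the qemu arguments above might be:
     usb = 1
     usbdevice = "host:xxxx:yyyy"
  which xl/libxl would then turn into "-usb -usbdevice host:xxxx:yyyy"
  on the device model command line.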

* xl: pass more configuration defaults in xl.conf
  owner: ?
  There are a number of options for which it might be useful to pass a
  default in xl.conf.  For example, if we could have a default
  "backend" parameter for vifs, then it would be easy to switch back
  and forth between a backend in a driver domain and a backend in dom0
  (a hypothetical example is sketched below).
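  Purely as an illustration (the option name here is hypothetical, not
  an agreed syntax), such a default in /etc/xen/xl.conf might look like:
     # hypothetical: default backend domain for all vifs
     vif.default.backend="netback-domain"
  with individual guest configs still able to override it in their
  vif= specification.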

* Remove hardcoded modprobes in xencommons
  owner: ?
  status: ?
  prognosis: Poor.

* openvswitch toolstack integration
  owner: ?
  prognosis: Poor
  status: Sample script posted by Bastian ("[RFC] openvswitch support script")
  - See if we can engage Bastian to do a more fully-featured script?

* Rationalized backend scripts
  owner: roger@citrix
  status: libxl hotplug submitted.  Protocol still needs to be finalized.
  prognosis: Good

* Scripts for driver domains (depends on backend scripts)
  owner: roger@citrix
  status:
  prognosis: Fair

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 14:07 Xen 4.3 development update George Dunlap
@ 2013-04-02 15:42 ` Jan Beulich
  2013-04-02 15:45   ` Suravee Suthikulanit
  2013-04-02 16:34   ` Tim Deegan
  2013-04-09  2:03 ` Xen 4.3 development update Dario Faggioli
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 53+ messages in thread
From: Jan Beulich @ 2013-04-02 15:42 UTC (permalink / raw)
  To: George Dunlap; +Cc: suravee.suthikulpanit, xen-devel@lists.xen.org

>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>   owner: ?
>   Reference: http://marc.info/?l=xen-devel&m=135075376805215 

This is supposedly fixed with the RTC changes Tim committed the
other day. Suravee, is that correct?

> * Remove hardcoded mobprobe's in xencommons
>   owner: ?
>   status: ?
>   prognosis: Poor.

So before 4.2 got released it was promised to get dealt with, and
now it's "poor" with no owner and status? Disappointing.

Jan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 15:42 ` Jan Beulich
@ 2013-04-02 15:45   ` Suravee Suthikulanit
  2013-04-02 15:51     ` George Dunlap
  2013-04-02 16:34   ` Tim Deegan
  1 sibling, 1 reply; 53+ messages in thread
From: Suravee Suthikulanit @ 2013-04-02 15:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel@lists.xen.org

On 4/2/2013 10:42 AM, Jan Beulich wrote:
>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>    owner: ?
>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
> This is supposedly fixed with the RTC changes Tim committed the
> other day. Suravee, is that correct?
Let me verify this again with the new changes.  I was looking at the 
clock drifting issue on the 64-bit XP which was running fine.  Let me 
check 32-bit and get back to you today.

>> * Remove hardcoded mobprobe's in xencommons
>>    owner: ?
>>    status: ?
>>    prognosis: Poor.
> So before 4.2 got released it was promised to get dealt with, and
> now it's "poor" with no owner and status? Disappointing.
I was not aware of this issue.  Could you give me some context of this?

Suravee
>
> Jan
>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 15:45   ` Suravee Suthikulanit
@ 2013-04-02 15:51     ` George Dunlap
  0 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-02 15:51 UTC (permalink / raw)
  To: Suravee Suthikulanit; +Cc: Jan Beulich, xen-devel@lists.xen.org

On 02/04/13 16:45, Suravee Suthikulanit wrote:
> On 4/2/2013 10:42 AM, Jan Beulich wrote:
>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>     owner: ?
>>>     Reference: http://marc.info/?l=xen-devel&m=135075376805215
>> This is supposedly fixed with the RTC changes Tim committed the
>> other day. Suravee, is that correct?
> Let me verify this again with the new changes.  I was looking at the
> clock drifting issue on the 64-bit XP which was running fine.  Let me
> check 32-bit and get back to you today.
>
>>> * Remove hardcoded mobprobe's in xencommons
>>>     owner: ?
>>>     status: ?
>>>     prognosis: Poor.
>> So before 4.2 got released it was promised to get dealt with, and
>> now it's "poor" with no owner and status? Disappointing.
> I was not aware of this issue.  Could you give me some context of this?

Just to be clear, this doesn't have anything to do with AMD -- it's a 
separate subject Jan is complaining about. :-)

Jan: "Poor" just means that it needs intervention; namely, we need 
someone to step up and volunteer to do it.  I'll ask for volunteers 
sometime this week.

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 15:42 ` Jan Beulich
  2013-04-02 15:45   ` Suravee Suthikulanit
@ 2013-04-02 16:34   ` Tim Deegan
  2013-04-02 16:47     ` Suravee Suthikulpanit
                       ` (2 more replies)
  1 sibling, 3 replies; 53+ messages in thread
From: Tim Deegan @ 2013-04-02 16:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andres Lagar-Cavilla, suravee.suthikulpanit,
	xen-devel@lists.xen.org

At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
> >>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> > * AMD NPT performance regression after c/s 24770:7f79475d3de7
> >   owner: ?
> >   Reference: http://marc.info/?l=xen-devel&m=135075376805215 
> 
> This is supposedly fixed with the RTC changes Tim committed the
> other day. Suravee, is that correct?

This is a separate problem.  IIRC the AMD XP perf issue is caused by the
emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
_lot_ of vmexits for IRQL reads and writes.

Tim.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 16:34   ` Tim Deegan
@ 2013-04-02 16:47     ` Suravee Suthikulpanit
  2013-04-04 10:57       ` Tim Deegan
  2013-04-02 17:06     ` Suravee Suthikulpanit
  2013-04-03  7:27     ` Xen 4.3 development update Jan Beulich
  2 siblings, 1 reply; 53+ messages in thread
From: Suravee Suthikulpanit @ 2013-04-02 16:47 UTC (permalink / raw)
  To: Tim Deegan
  Cc: George Dunlap, Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

On 4/2/2013 11:34 AM, Tim Deegan wrote:
> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>    owner: ?
>>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
>> This is supposedly fixed with the RTC changes Tim committed the
>> other day. Suravee, is that correct?
> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
> _lot_ of vmexits for IRQL reads and writes.
Is this only for 32-bit XP, or also 64-bit?

Suravee
> Tim.
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 16:34   ` Tim Deegan
  2013-04-02 16:47     ` Suravee Suthikulpanit
@ 2013-04-02 17:06     ` Suravee Suthikulpanit
  2013-04-02 23:48       ` Suravee Suthikulanit
  2013-04-03  8:37       ` Christoph Egger
  2013-04-03  7:27     ` Xen 4.3 development update Jan Beulich
  2 siblings, 2 replies; 53+ messages in thread
From: Suravee Suthikulpanit @ 2013-04-02 17:06 UTC (permalink / raw)
  To: Tim Deegan
  Cc: George Dunlap, Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

On 4/2/2013 11:34 AM, Tim Deegan wrote:
> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>    owner: ?
>>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
>> This is supposedly fixed with the RTC changes Tim committed the
>> other day. Suravee, is that correct?
> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
> _lot_ of vmexits for IRQL reads and writes.
Are there any tools or good ways to count the number of VMEXITs in Xen?

Thanks,
Suravee
>
> Tim.
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 17:06     ` Suravee Suthikulpanit
@ 2013-04-02 23:48       ` Suravee Suthikulanit
  2013-04-03 10:51         ` George Dunlap
  2013-04-03  8:37       ` Christoph Egger
  1 sibling, 1 reply; 53+ messages in thread
From: Suravee Suthikulanit @ 2013-04-02 23:48 UTC (permalink / raw)
  To: Tim Deegan
  Cc: George Dunlap, Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> 
>>>>>> wrote:
>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>    owner: ?
>>>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>> This is supposedly fixed with the RTC changes Tim committed the
>>> other day. Suravee, is that correct?
>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>> _lot_ of vmexits for IRQL reads and writes.
> Is there any tools or good ways to count the number of VMexit in Xen?
>
Tim/Jan,

I have used the iperf benchmark to compare network performance (bandwidth) 
between the two versions of the hypervisor:
1. good: 24769:730f6ed72d70
2. bad: 24770:7f79475d3de7

In the "bad" case, I am seeing that the network bandwidth has dropped 
about 13-15%.

However, when I use the xentrace utility to trace the number of VMEXITs, 
I actually see about 25% more VMEXITs in the good case.  This is 
inconsistent with the statement Tim made above.

Suravee

> Thanks,
> Suravee
>>
>> Tim.
>>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 16:34   ` Tim Deegan
  2013-04-02 16:47     ` Suravee Suthikulpanit
  2013-04-02 17:06     ` Suravee Suthikulpanit
@ 2013-04-03  7:27     ` Jan Beulich
  2013-04-03 10:53       ` George Dunlap
  2 siblings, 1 reply; 53+ messages in thread
From: Jan Beulich @ 2013-04-03  7:27 UTC (permalink / raw)
  To: Tim Deegan
  Cc: George Dunlap, Andres Lagar-Cavilla, suravee.suthikulpanit,
	xen-devel@lists.xen.org

>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>> >>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> > * AMD NPT performance regression after c/s 24770:7f79475d3de7
>> >   owner: ?
>> >   Reference: http://marc.info/?l=xen-devel&m=135075376805215 
>> 
>> This is supposedly fixed with the RTC changes Tim committed the
>> other day. Suravee, is that correct?
> 
> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
> _lot_ of vmexits for IRQL reads and writes.

Ah, okay, sorry for mixing this up. But how is this a regression
then?

Jan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 17:06     ` Suravee Suthikulpanit
  2013-04-02 23:48       ` Suravee Suthikulanit
@ 2013-04-03  8:37       ` Christoph Egger
  2013-04-03 10:49         ` George Dunlap
  1 sibling, 1 reply; 53+ messages in thread
From: Christoph Egger @ 2013-04-03  8:37 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: George Dunlap, Tim Deegan, Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

On 02.04.13 19:06, Suravee Suthikulpanit wrote:
> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com>
>>>>>> wrote:
>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>    owner: ?
>>>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>> This is supposedly fixed with the RTC changes Tim committed the
>>> other day. Suravee, is that correct?
>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>> _lot_ of vmexits for IRQL reads and writes.
> Is there any tools or good ways to count the number of VMexit in Xen?

xentrace -e 0x8f000 > xentrace.out
[Hit ^C to abort]
xentrace_format formats < xentrace.out > xentrace.dump

You need to manually install 'formats' from tools/xentrace/formats
to a proper place.
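
To just count the exits from that dump, something like this should do
(assuming the installed formats file labels exit records as VMEXIT):

  grep -c VMEXIT xentrace.dump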

Christoph

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-03  8:37       ` Christoph Egger
@ 2013-04-03 10:49         ` George Dunlap
  2013-04-04 12:19           ` xenalyze (was: Re: Xen 4.3 development update) Christoph Egger
  0 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-04-03 10:49 UTC (permalink / raw)
  To: Christoph Egger
  Cc: Jan Beulich, Tim (Xen.org), Andres Lagar-Cavilla,
	Suravee Suthikulpanit, xen-devel@lists.xen.org

On 03/04/13 09:37, Christoph Egger wrote:
> On 02.04.13 19:06, Suravee Suthikulpanit wrote:
>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com>
>>>>>>> wrote:
>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>     owner: ?
>>>>>     Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>> other day. Suravee, is that correct?
>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>> _lot_ of vmexits for IRQL reads and writes.
>> Is there any tools or good ways to count the number of VMexit in Xen?
> xentrace -e 0x8f000 > xentrace.out
> [Hit ^C to abort]
> xentrace_format formats < xentrace.out > xentrace.dump
>
> You need to manually install 'formats' from tools/xentrace/formats
> to a proper place.

Even better is to use xenalyze "summary" mode:

http://xenbits.xen.org/ext/xenalyze

Build, then run:

xenalyze --svm-mode -s [trace file] > summary

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 23:48       ` Suravee Suthikulanit
@ 2013-04-03 10:51         ` George Dunlap
  2013-04-04 15:29           ` Suravee Suthikulanit
  2013-04-04 17:14           ` Suravee Suthikulanit
  0 siblings, 2 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-03 10:51 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: Tim (Xen.org), Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

On 03/04/13 00:48, Suravee Suthikulanit wrote:
> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com>
>>>>>>> wrote:
>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>     owner: ?
>>>>>     Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>> other day. Suravee, is that correct?
>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>> _lot_ of vmexits for IRQL reads and writes.
>> Is there any tools or good ways to count the number of VMexit in Xen?
>>
> Tim/Jan,
>
> I have used iperf benchmark to compare network performance (bandwidth)
> between the two versions of the hypervisor:
> 1. good: 24769:730f6ed72d70
> 2. bad: 24770:7f79475d3de7
>
> In the "bad" case, I am seeing that the network bandwidth has dropped
> about 13-15%.
>
> However, when I uses the xentrace utility to trace the number of VMEXIT,
> I actually see about 25% more number of VMEXIT in the good case.  This
> is inconsistent with the statement that Tim mentioned above.

I was going to say: what I remember from my bit of investigation back 
in November was that it had all the earmarks of micro-architectural 
"drag", which happens when the TLB or the caches can't be effective.

Suravee, if you look at xenalyze, a micro-architectural "drag" looks like:
* fewer VMEXITs, but
* a longer time for each vmexit

If you post the results of "xenalyze --svm-mode -s" for both traces, I 
can tell you what I see.

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-03  7:27     ` Xen 4.3 development update Jan Beulich
@ 2013-04-03 10:53       ` George Dunlap
  2013-04-03 15:34         ` Andres Lagar-Cavilla
  0 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-04-03 10:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim (Xen.org), Andres Lagar-Cavilla,
	suravee.suthikulpanit@amd.com, xen-devel@lists.xen.org

On 03/04/13 08:27, Jan Beulich wrote:
>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>    owner: ?
>>>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>> This is supposedly fixed with the RTC changes Tim committed the
>>> other day. Suravee, is that correct?
>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>> _lot_ of vmexits for IRQL reads and writes.
> Ah, okay, sorry for mixing this up. But how is this a regression
> then?

My sense, when I looked at this back whenever it was, was that there was 
much more to this.  The XP IRQL updating is a problem, but it's made 
terribly worse by the changeset in question.  It seemed to me like the 
kind of thing that would be caused by the TLB or caches suddenly becoming 
much less effective.

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-03 10:53       ` George Dunlap
@ 2013-04-03 15:34         ` Andres Lagar-Cavilla
  2013-04-04 15:23           ` Tim Deegan
  2013-04-25 13:51           ` Xen 4.3 development update / winxp AMD performance regression Pasi Kärkkäinen
  0 siblings, 2 replies; 53+ messages in thread
From: Andres Lagar-Cavilla @ 2013-04-03 15:34 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tim (Xen.org), suravee.suthikulpanit, Jan Beulich, xen-devel

On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 03/04/13 08:27, Jan Beulich wrote:
>>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>   owner: ?
>>>>>   Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>> other day. Suravee, is that correct?
>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>> _lot_ of vmexits for IRQL reads and writes.
>> Ah, okay, sorry for mixing this up. But how is this a regression
>> then?
> 
> My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.

The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?

Andres
> 
> -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 16:47     ` Suravee Suthikulpanit
@ 2013-04-04 10:57       ` Tim Deegan
  0 siblings, 0 replies; 53+ messages in thread
From: Tim Deegan @ 2013-04-04 10:57 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: George Dunlap, Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

At 11:47 -0500 on 02 Apr (1364903259), Suravee Suthikulpanit wrote:
> On 4/2/2013 11:34 AM, Tim Deegan wrote:
> >At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
> >>>>>On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> 
> >>>>>wrote:
> >>>* AMD NPT performance regression after c/s 24770:7f79475d3de7
> >>>   owner: ?
> >>>   Reference: http://marc.info/?l=xen-devel&m=135075376805215
> >>This is supposedly fixed with the RTC changes Tim committed the
> >>other day. Suravee, is that correct?
> >This is a separate problem.  IIRC the AMD XP perf issue is caused by the
> >emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
> >patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
> >_lot_ of vmexits for IRQL reads and writes.
> Is this only for 32-bit XP? or also 64-bit ?

I don't have a 64-bit XP image handy to test, but a bit of googling
suggests that 64-bit XP has 'lazy IRQL', so this TPR problem should only
affect 32-bit XP (and earlier Windowses).

Tim.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* xenalyze (was: Re:  Xen 4.3 development update)
  2013-04-03 10:49         ` George Dunlap
@ 2013-04-04 12:19           ` Christoph Egger
  2013-04-04 12:51             ` xenalyze George Dunlap
  0 siblings, 1 reply; 53+ messages in thread
From: Christoph Egger @ 2013-04-04 12:19 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org

On 03.04.13 12:49, George Dunlap wrote:
> On 03/04/13 09:37, Christoph Egger wrote:
>> On 02.04.13 19:06, Suravee Suthikulpanit wrote:
>>> Is there any tools or good ways to count the number of VMexit in Xen?
>> xentrace -e 0x8f000 > xentrace.out
>> [Hit ^C to abort]
>> xentrace_format formats < xentrace.out > xentrace.dump
>>
>> You need to manually install 'formats' from tools/xentrace/formats
>> to a proper place.
>
> Even better is to use xenalyze "summary" mode:
>
> http://xenbits.xen.org/ext/xenalyze
>
> Build, then run:
>
> xenalyze --svm-mode -s [trace file] > summary

It does not build for me. argp.h is not portable.

Christoph

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: xenalyze
  2013-04-04 12:19           ` xenalyze (was: Re: Xen 4.3 development update) Christoph Egger
@ 2013-04-04 12:51             ` George Dunlap
  0 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-04 12:51 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel@lists.xen.org

On 04/04/2013 01:19 PM, Christoph Egger wrote:
> On 03.04.13 12:49, George Dunlap wrote:
>> On 03/04/13 09:37, Christoph Egger wrote:
>>> On 02.04.13 19:06, Suravee Suthikulpanit wrote:
>>>> Is there any tools or good ways to count the number of VMexit in Xen?
>>> xentrace -e 0x8f000 > xentrace.out
>>> [Hit ^C to abort]
>>> xentrace_format formats < xentrace.out > xentrace.dump
>>>
>>> You need to manually install 'formats' from tools/xentrace/formats
>>> to a proper place.
>>
>> Even better is to use xenalyze "summary" mode:
>>
>> http://xenbits.xen.org/ext/xenalyze
>>
>> Build, then run:
>>
>> xenalyze --svm-mode -s [trace file] > summary
>
> It does not build for me. argp.h is not portable.


It's a shame NetBSD (which I think is what you use) hasn't implemented 
it yet -- it's a lot nicer interface to use.

You could always run it in a Debian VM if you were really keen. :-)

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-03 15:34         ` Andres Lagar-Cavilla
@ 2013-04-04 15:23           ` Tim Deegan
  2013-04-04 17:05             ` Tim Deegan
  2013-04-25 15:20             ` George Dunlap
  2013-04-25 13:51           ` Xen 4.3 development update / winxp AMD performance regression Pasi Kärkkäinen
  1 sibling, 2 replies; 53+ messages in thread
From: Tim Deegan @ 2013-04-04 15:23 UTC (permalink / raw)
  To: Andres Lagar-Cavilla
  Cc: George Dunlap, Jan Beulich, suravee.suthikulpanit, xen-devel

At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> 
> > On 03/04/13 08:27, Jan Beulich wrote:
> >>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
> >>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
> >>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
> >>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
> >>> _lot_ of vmexits for IRQL reads and writes.
> >> Ah, okay, sorry for mixing this up. But how is this a regression
> >> then?
> > 
> > My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
> 
> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
> 

Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
about 12 minutes before this locking change takes more than 20 minutes
on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
to test something else).

Tim.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-03 10:51         ` George Dunlap
@ 2013-04-04 15:29           ` Suravee Suthikulanit
  2013-04-04 17:14           ` Suravee Suthikulanit
  1 sibling, 0 replies; 53+ messages in thread
From: Suravee Suthikulanit @ 2013-04-04 15:29 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tim (Xen.org), Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

[-- Attachment #1: Type: text/plain, Size: 2038 bytes --]

On 4/3/2013 5:51 AM, George Dunlap wrote:
> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com>
>>>>>>>> wrote:
>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>     owner: ?
>>>>>>     Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>> other day. Suravee, is that correct?
>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused 
>>>> by the
>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it 
>>>> takes a
>>>> _lot_ of vmexits for IRQL reads and writes.
>>> Is there any tools or good ways to count the number of VMexit in Xen?
>>>
>> Tim/Jan,
>>
>> I have used iperf benchmark to compare network performance (bandwidth)
>> between the two versions of the hypervisor:
>> 1. good: 24769:730f6ed72d70
>> 2. bad: 24770:7f79475d3de7
>>
>> In the "bad" case, I am seeing that the network bandwidth has dropped
>> about 13-15%.
>>
>> However, when I uses the xentrace utility to trace the number of VMEXIT,
>> I actually see about 25% more number of VMEXIT in the good case.  This
>> is inconsistent with the statement that Tim mentioned above.
>
> I was going to say, what I remember from my little bit of 
> investigation back in November, was that it had all the earmarks of 
> micro-architectural "drag", which happens when the TLB or the caches 
> can't be effective.
>
> Suvaree, if you look at xenalyze, a microarchitectural "drag" looks like:
> * fewer VMEXITs, but
> * time for each vmexit takes longer
>
> If you post the results of "xenalyze --svm-mode -s" for both traces, I 
> can tell you what I see.
>
>  -George
>
George,

Here are the two sets of data from xenalyze.

Suravee

[-- Attachment #2: xp32.xenalyze.bad --]
[-- Type: text/plain, Size: 30741 bytes --]

Total time: 22.59 seconds (using cpu speed 2.40 GHz)
--- Log volume summary ---
 - cpu 0 -
 gen   :       4432
 sched :   20018848
 +-verbose:   13298176
 hvm   :       8856
 +-vmentry:       1560
 +-vmexit :       2600
 +-handler:       4696
 mem   :      59920
 pv    :   53597144
 hw    :     962064
 - cpu 1 -
 gen   :        120
 sched :    1039488
 +-verbose:     720992
 hvm   :     220868
 +-vmentry:      33780
 +-vmexit :      56300
 +-handler:     130788
 mem   :       8288
 pv    :    1926432
 - cpu 2 -
 gen   :       1148
 sched :    5914952
 +-verbose:    4055272
 hvm   :   11612680
 +-vmentry:    1730052
 +-vmexit :    2883440
 +-handler:    6999188
 mem   :       1764
 - cpu 3 -
 gen   :       2188
 sched :    7445484
 +-verbose:    5117964
 hvm   :   30056844
 +-vmentry:    4953012
 +-vmexit :    8255020
 +-handler:   16848812
 mem   :      21420
 - cpu 4 -
 gen   :       2212
 sched :    6582972
 +-verbose:    4521036
 hvm   :   25864568
 +-vmentry:    4260900
 +-vmexit :    7101500
 +-handler:   14502168
 - cpu 5 -
 gen   :       1588
 sched :    7603852
 +-verbose:    5208844
 hvm   :   16474932
 +-vmentry:    2492040
 +-vmexit :    4153400
 +-handler:    9829492
 mem   :      36428
 - cpu 6 -
 gen   :       1864
 sched :    6255140
 +-verbose:    4298740
 hvm   :   23009808
 +-vmentry:    3724908
 +-vmexit :    6208180
 +-handler:   13076720
 mem   :       1736
 - cpu 7 -
 gen   :       2256
 sched :    7870304
 +-verbose:    5403952
 hvm   :   28549636
 +-vmentry:    4685724
 +-vmexit :    7809540
 +-handler:   16054372
 mem   :       6860
|-- Domain 0 --|
 Runstates:
   blocked:   44148 10.21s 555125 {108479|155486694|248356929}
  partial run:   50119  7.21s 345405 {129497|1333387|2928983}
  full run:   17525  2.54s 348236 {299563|1790764|7066546}
  partial contention:   45261  0.27s  14238 { 10773| 13006| 73012}
  concurrency_hazard:    4547  0.03s  13229 { 10970| 15165|1177644}
  full_contention:     374  0.00s  11151 {  8896| 11307| 15006}
      lost:   10847  2.33s 515324 {306157|28855150|141256451}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:   58927  9.58s 390254 {132692|1909464|2327327}
  runnable:   47234  0.28s  14234 { 10701| 11515| 51458}
        wake:   45719  0.27s  14272 { 10737| 11058|172695}
     preempt:    1515  0.01s  13092 { 12448| 12682| 73560}
   blocked:   45719 10.36s 543610 {292043|53406338|135507023}
      lost:   11040  2.37s 515094 {135256|1060714|799578557}
 cpu affinity:       1 54132589292 {54132589292|54132589292|54132589292}
   [0]:       1 54132589292 {54132589292|54132589292|54132589292}
PV events:
  hypercall  2214975
    mmu_update                   [ 1]:    256
    fpu_taskswitch               [ 5]:  17763
    multicall                    [13]: 191143
    xen_version                  [17]:  94734
    iret                         [23]: 1344442
    vcpu_op                      [24]: 200633
    set_segment_base             [25]:  89715
    mmuext_op                    [26]:    257
    sched_op                     [29]:  59952
    evtchn_op                    [32]: 179178
    physdev_op                   [33]:  26749
    hvm_op                       [34]:  10152
    sysctl                       [35]:      1
  trap  17402
    [7] 17402
  page_fault  2
  ptwr  4
-- v1 --
 Runstates:
   running:    3532  1.02s 690145 {172003|1235132|77302995}
  runnable:    2890  0.02s  12687 { 10444| 11823| 17584}
        wake:    2841  0.02s  12723 { 10647| 13968| 17584}
     preempt:      49  0.00s  10580 {  8941|  9281| 14053}
   blocked:    2842 17.47s 14749632 {21223148|29532634|935990733}
      lost:     642  4.02s 15026971 {16870260|28840163|70200355}
 cpu affinity:       1 54208745113 {54208745113|54208745113|54208745113}
   [1]:       1 54208745113 {54208745113|54208745113|54208745113}
PV events:
  hypercall  79908
    fpu_taskswitch               [ 5]:    379
    multicall                    [13]:   6665
    xen_version                  [17]:     86
    iret                         [23]:  45995
    vcpu_op                      [24]:  10266
    set_segment_base             [25]:   3359
    mmuext_op                    [26]:    521
    sched_op                     [29]:   3533
    evtchn_op                    [32]:   8358
    physdev_op                   [33]:    441
    hvm_op                       [34]:    305
  trap  325
    [7] 325
  page_fault  9
  math state restore  4
  ptwr  14
Emulate eip list
|-- Domain 1 --|
 Runstates:
   blocked:   64149  8.66s 324023 { 64673|55640460|55690341}
  partial run:  239656  8.58s  85948 { 14012|546799|1622852}
  full run:   59698  2.50s 100612 {  7113|545084|2071684}
  partial contention:   65761  0.32s  11572 { 11184| 20532| 20490}
  concurrency_hazard:   90457  0.43s  11313 { 10801| 11004| 31283}
      lost:   15616  2.06s 317146 { 66963|808433212|808433212}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:   56933  3.59s 151376 {105664|2833123|70767274}
  runnable:   43718  0.22s  11814 { 10693| 11365| 31861}
        wake:   43718  0.22s  11814 { 10693| 11365| 31861}
   blocked:    2050 13.75s 16095210 {3203771|55894186|55930072}
   offline:   41668  1.46s  84365 { 73273|149414|222234}
      lost:   10864  3.53s 780658 { 74258|26082059|1950198168}
 cpu affinity:      24 2257304640 {3990327|871380860|7740249797}
   [0]:       2 4562574811 {3168555177|5956594445|5956594445}
   [1]:       4 1485672893 {103084141|2068871789|3546389382}
   [2]:       3 2603785034 {3990327|67114980|7740249797}
   [3]:       3 13154678 {2645456|7564693|29253887}
   [4]:       2 834792898 {9560777|1660025019|1660025019}
   [5]:       6 3702083782 {17333993|1725346202|14424324512}
   [6]:       1 12452271 {12452271|12452271|12452271}
   [7]:       3 2454036762 {391579116|870861738|6099669434}
Exit reasons:
 VMEXIT_CR0_READ           304  0.00s  0.00%  2423 cyc { 1440| 2035| 5267}
 VMEXIT_CR4_READ             2  0.00s  0.00%  1534 cyc { 1531| 1538| 1538}
 VMEXIT_CR0_WRITE           32  0.00s  0.00%  2699 cyc { 1483| 2375| 4886}
 VMEXIT_CR4_WRITE            4  0.00s  0.00% 38328 cyc { 2184| 7737|136518}
 VMEXIT_INTR                74  0.00s  0.00%  6720 cyc { 1832| 6576| 9629}
 VMEXIT_VINTR              211  0.00s  0.00%  1414 cyc {  853|  994| 2772}
 VMEXIT_PAUSE             2351  0.01s  0.04%  8228 cyc { 6195| 8035|12106}
 VMEXIT_HLT               2050 13.76s 60.90% 16104823 cyc {1203038|8036871|55508055}
 VMEXIT_IOIO             50151  1.79s  7.93% 85749 cyc { 3631|87460|155814}
 VMEXIT_NPF             180241  2.26s  9.99% 30038 cyc { 7344|28231|69476}
   mmio   180241  2.26s  9.99% 30038 cyc { 7344|28231|69476}
Guest interrupt counts:
  [ 61] 45
   * intr to halt      :      20  0.14s 17221100 {8208815|128026833|128026833}
  [ 65] 151
   * intr to halt      :     134  0.35s 6260008 {23965644|113526251|132234485}
  [131] 1682
   * wake to halt alone:    1138  3.09s 6510349 {5788096|7542542|10520569}
   * wake to halt any  :    1318  3.81s 6935552 {5744625|8261682|25879018}
   * intr to halt      :      13  0.13s 24216093 {7818781|126461333|126461333}
  [163] 36
   * wake to halt alone:      36  0.01s 520242 {476357|587995|825700}
   * wake to halt any  :      36  0.01s 520242 {476357|587995|825700}
  [209] 896
   * wake to halt alone:     550  0.36s 1557585 {1575209|2008866|6163613}
   * wake to halt any  :     591  0.45s 1815026 {448471|3028564|9443213}
   * intr to halt      :     254  0.65s 6177342 {2180388|11850603|125688471}
  [225] 127
   * wake to halt alone:      12  0.00s 770241 {669847|850915|950958}
   * wake to halt any  :     103  0.35s 8157820 {3423923|96580338|132593119}
   * intr to halt      :       7  0.02s 6129073 {4714664|18240492|18240492}
IO address summary:
      60:[r]       36  0.00s  0.01% 122069 cyc {109189|114767|185590}
      64:[r]       72  0.00s  0.01% 86747 cyc {58858|84768|136730}
      70:[w]     1792  0.00s  0.02%  6271 cyc { 2214| 3180|24849}
      71:[r]     1792  0.01s  0.03%  7898 cyc { 2276| 3062|23592}
    b008:[r]     4911  0.03s  0.15% 16146 cyc { 3601| 4512|78558}
    c110:[r]     1276  0.05s  0.24% 100101 cyc {69367|88643|159204}
    c110:[w]     1276  0.07s  0.32% 134423 cyc {99950|132771|197151}
    c114:[r]     1288  0.10s  0.45% 187637 cyc {69695|95165|162070}
    c114:[w]     1288  0.07s  0.30% 126890 cyc {98307|105749|192223}
    c118:[r]     1278  0.05s  0.24% 99905 cyc {69333|88859|163042}
    c118:[w]     1278  0.07s  0.32% 133778 cyc {100204|132290|195353}
    c11c:[r]     1281  0.06s  0.24% 103142 cyc {69424|93583|161794}
    c11c:[w]     1281  0.07s  0.29% 122745 cyc {98196|105572|175157}
    c120:[w]     1276  0.04s  0.18% 76922 cyc {58702|62968|128506}
    c124:[w]     1288  0.04s  0.19% 78417 cyc {58685|62768|121319}
    c128:[w]     1278  0.04s  0.18% 76669 cyc {58558|63208|127031}
    c12c:[w]     1281  0.04s  0.18% 75433 cyc {58721|62697|127453}
    c137:[r]     3975  0.14s  0.62% 84324 cyc {59108|85614|129030}
    c138:[w]     1384  0.06s  0.26% 103005 cyc {84852|100341|146715}
    c13c:[w]     2695  0.11s  0.49% 99174 cyc {80867|89223|152552}
    c13e:[r]     4151  0.16s  0.70% 91435 cyc {58921|90367|142105}
    c13e:[w]     2801  0.10s  0.43% 83765 cyc {59637|68949|137329}
    c158:[r]        3  0.00s  0.00% 171078 cyc {142682|160531|210022}
    c160:[r]    11170  0.47s  2.10% 101989 cyc {57749|91308|164421}
-- v1 --
 Runstates:
   running:   43626  3.29s 180977 {458861|1457715|108551263}
  runnable:   33342  0.17s  12051 { 10763| 11274| 49965}
        wake:   33338  0.17s  12029 { 10859| 11322| 46346}
     preempt:       4  0.00s 196501 {215151|229615|229615}
   blocked:    1594 14.37s 21642474 {5315376|55242729|55730552}
   offline:   31746  1.11s  84016 { 56306| 94334|254297}
      lost:    8572  3.61s 1010816 {125819|55256941|2798674006}
 cpu affinity:       9 6019464142 {17535733|121593097|18685574584}
   [0]:       2 8635656704 {113902316|17157411092|17157411092}
   [1]:       2 3414990162 {91007856|6738972468|6738972468}
   [2]:       1 17535733 {17535733|17535733|17535733}
   [3]:       3 3790257744 {58547951|121593097|11190632185}
   [4]:       1 18685574584 {18685574584|18685574584|18685574584}
Exit reasons:
 VMEXIT_CR0_READ          1058  0.00s  0.00%  1984 cyc { 1377| 1782| 2844}
 VMEXIT_CR4_READ             2  0.00s  0.00%  2735 cyc { 1787| 3683| 3683}
 VMEXIT_CR0_WRITE          116  0.00s  0.00%  2633 cyc { 1516| 2259| 4879}
   cr0      116  0.00s  0.00%  2633 cyc { 1516| 2259| 4879}
 VMEXIT_CR4_WRITE            4  0.00s  0.00% 49964 cyc {11869|48768|122312}
   cr4        4  0.00s  0.00% 49964 cyc {11869|48768|122312}
 VMEXIT_EXCEPTION_NM         3  0.00s  0.00%  2818 cyc { 2056| 2863| 3537}
 VMEXIT_INTR               184  0.00s  0.01% 15623 cyc { 1710| 5984|60475}
            [  0]: 184
 VMEXIT_VINTR              529  0.00s  0.00%  1048 cyc {  843|  937| 1958}
 VMEXIT_CPUID                1  0.00s  0.00%  4985 cyc { 4985| 4985| 4985}
   cpuid        1  0.00s  0.00%  4985 cyc { 4985| 4985| 4985}
 VMEXIT_PAUSE             1708  0.01s  0.02%  7660 cyc { 5593| 7440|11526}
 VMEXIT_HLT               1592 14.37s 63.62% 21662549 cyc {202128|15856366|55603627}
 VMEXIT_IOIO             35134  1.27s  5.62% 86740 cyc {13924|86402|156122}
 VMEXIT_NPF             196987  1.96s  8.68% 23888 cyc { 7035| 8537|73812}
   npf     1723  0.02s  0.08% 24837 cyc { 7267| 7392|132037}
Guest interrupt counts:
  [ 61] 39
   * intr to halt      :      38  0.15s 9263928 {2909024|239910837|239910837}
  [ 65] 477
   * intr to halt      :     468  0.56s 2890935 {559924|43553694|242292122}
  [ 98] 1
   * wake to halt alone:       1  0.00s 3747339 {3747339|3747339|3747339}
   * wake to halt any  :       1  0.00s 3747339 {3747339|3747339|3747339}
  [131] 655
   * wake to halt alone:     370  2.08s 13477930 {7908304|14496832|17621256}
   * wake to halt any  :     509  3.06s 14437135 {13432838|15393311|21590889}
   * intr to halt      :       8  0.04s 11412831 {7907444|24575561|24575561}
  [163] 39
   * wake to halt alone:      35  0.01s 536004 {508953|587653|723791}
   * wake to halt any  :      37  0.01s 651262 {508953|543191|2678487}
   * intr to halt      :       1  0.00s 9882452 {9882452|9882452|9882452}
  [209] 895
   * wake to halt alone:     629  0.26s 980127 {710985|981324|1747404}
   * wake to halt any  :     632  0.26s 1005716 {698747|1389270|8484811}
   * intr to halt      :     225  0.76s 8088881 {1130407|16521241|237041123}
  [225] 553
   * wake to halt alone:      49  0.01s 661294 {409545|746781|1282089}
   * wake to halt any  :     413  0.64s 3743417 {11056031|43956693|242696434}
   * intr to halt      :      90  0.24s 6500158 {1664770|11531383|15619654}
IO address summary:
      60:[r]       39  0.00s  0.01% 123255 cyc {101561|121797|192328}
      64:[r]       78  0.00s  0.01% 82092 cyc {60363|83695|137178}
     1f1:[r]        1  0.00s  0.00% 102708 cyc {102708|102708|102708}
     1f2:[r]        1  0.00s  0.00% 64517 cyc {64517|64517|64517}
     1f3:[r]        1  0.00s  0.00% 76212 cyc {76212|76212|76212}
     1f4:[r]        1  0.00s  0.00% 68837 cyc {68837|68837|68837}
     1f5:[r]        1  0.00s  0.00% 165166 cyc {165166|165166|165166}
     1f6:[r]        1  0.00s  0.00% 61870 cyc {61870|61870|61870}
     1f7:[r]        3  0.00s  0.00% 80742 cyc {72597|73707|95923}
    b008:[r]     3673  0.03s  0.15% 22555 cyc { 3233| 9488|84403}
    c110:[r]     1092  0.04s  0.18% 90867 cyc {71738|86451|141421}
    c110:[w]     1092  0.06s  0.25% 122528 cyc {101108|106371|175180}
    c114:[r]     1151  0.05s  0.20% 96492 cyc {71618|88406|151274}
    c114:[w]     1151  0.06s  0.25% 119514 cyc {99875|104780|173354}
    c118:[r]     1070  0.04s  0.18% 90383 cyc {71766|86617|132004}
    c118:[w]     1070  0.06s  0.24% 123448 cyc {101107|106267|180971}
    c11c:[r]     1132  0.05s  0.20% 97050 cyc {71769|88643|155133}
    c11c:[w]     1132  0.06s  0.24% 117222 cyc {100001|104037|172917}
    c120:[w]     1092  0.03s  0.15% 72058 cyc {60733|63201|123564}
    c124:[w]     1151  0.03s  0.15% 72428 cyc {60559|63176|120269}
    c128:[w]     1070  0.03s  0.14% 72131 cyc {60627|63169|119545}
    c12c:[w]     1132  0.03s  0.15% 72963 cyc {60454|63204|120223}
    c137:[r]     3633  0.13s  0.58% 86434 cyc {61139|86246|132178}
    c138:[w]     1921  0.08s  0.34% 96845 cyc {83070|89674|138837}
    c13c:[w]     1041  0.06s  0.24% 127147 cyc {70652|111658|242928}
    c13e:[r]     2076  0.09s  0.39% 101374 cyc {61852|93269|162977}
    c13e:[w]     1555  0.06s  0.25% 88343 cyc {60981|76247|153084}
    c160:[r]     7770  0.29s  1.29% 89726 cyc {58642|84207|155716}
    c200:[w]        1  0.00s  0.00% 127476 cyc {127476|127476|127476}
    c202:[r]        2  0.00s  0.00% 119623 cyc {97682|141564|141564}
    c202:[w]        1  0.00s  0.00% 67061 cyc {67061|67061|67061}
-- v2 --
 Runstates:
   running:   28990  2.37s 196305 { 39306|1565031|21329471}
  runnable:   22834  0.12s  12220 { 11007| 11281| 36469}
        wake:   22834  0.12s  12220 { 11007| 11281| 36469}
   blocked:    1500 15.56s 24890523 {12097016|55643314|55766824}
   offline:   21336  0.74s  83739 { 48809|155589|1739725}
      lost:    5452  3.76s 1657117 {285576|799578557|2805133885}
 cpu affinity:       8 6771896397 {2445003|3837773138|28342153403}
   [0]:       1 894774006 {894774006|894774006|894774006}
   [1]:       2 1928081647 {18390157|3837773138|3837773138}
   [2]:       2 6703669176 {2445003|13404893349|13404893349}
   [4]:       1 28342153403 {28342153403|28342153403|28342153403}
   [5]:       1 7609450341 {7609450341|7609450341|7609450341}
   [6]:       1 65291779 {65291779|65291779|65291779}
Exit reasons:
 VMEXIT_CR0_READ           670  0.00s  0.00%  2260 cyc { 1409| 2071| 4036}
   cr_read      671  0.00s  0.00%  2264 cyc { 1409| 2071| 4041}
 VMEXIT_CR4_READ             1  0.00s  0.00%  5179 cyc { 5179| 5179| 5179}
 VMEXIT_CR0_WRITE           26  0.00s  0.00%  2987 cyc { 1544| 2935| 4845}
 VMEXIT_CR4_WRITE            2  0.00s  0.00% 26600 cyc { 7772|45428|45428}
 VMEXIT_EXCEPTION_NM         4  0.00s  0.00%  3011 cyc { 2370| 3184| 4067}
 VMEXIT_INTR                75  0.00s  0.00%  7248 cyc { 2198| 6012|21383}
 VMEXIT_VINTR              349  0.00s  0.00%  1102 cyc {  866|  952| 2109}
   (no handler)     1060  0.00s  0.01%  5706 cyc {  884| 7098| 9602}
 VMEXIT_PAUSE              707  0.00s  0.01%  7994 cyc { 5968| 7917|10874}
 VMEXIT_HLT               1498 15.56s 68.87% 24923502 cyc {135109|21100938|55643238}
   hlt     1498 15.56s 68.87% 24923502 cyc {135109|21100938|55643238}
 VMEXIT_IOIO             24694  0.87s  3.86% 84744 cyc { 5221|85995|153896}
 VMEXIT_NPF             122913  1.49s  6.60% 29122 cyc { 7482|17078|94148}
Guest interrupt counts:
  [ 61] 20
   * intr to halt      :       6  0.07s 29455372 {51574150|101231124|101231124}
  [ 65] 327
   * intr to halt      :     320  0.37s 2741590 {1241527|25242455|120635892}
  [131] 525
   * wake to halt alone:     320  1.59s 11913648 {11263932|11690370|19018894}
   * wake to halt any  :     403  2.07s 12333712 {10575431|16299682|17884623}
   * intr to halt      :       8  0.09s 25669083 {14370394|100508476|100508476}
  [147] 1
   * wake to halt any  :       1  0.00s 2784104 {2784104|2784104|2784104}
  [163] 39
   * wake to halt alone:      39  0.02s 1394855 {519839|1906284|2021523}
   * wake to halt any  :      39  0.02s 1394855 {519839|1906284|2021523}
  [209] 896
   * wake to halt alone:     687  0.28s 980556 {757552|931540|1365184}
   * wake to halt any  :     689  0.29s 1006140 {691214|1156595|1680665}
   * intr to halt      :     178  0.38s 5171004 {1086343|15216383|72994802}
  [225] 422
   * wake to halt alone:      92  0.03s 714445 {517483|955075|1211915}
   * wake to halt any  :     366  0.42s 2761067 {874500|17282306|122538202}
   * intr to halt      :      37  0.09s 5748382 {2202532|17256145|17873933}
IO address summary:
      60:[r]       40  0.00s  0.01% 108129 cyc {97343|106828|128857}
      64:[r]       80  0.00s  0.01% 86394 cyc {59402|86913|174739}
    b008:[r]     3411  0.04s  0.16% 24709 cyc { 3630| 8760|98863}
    c110:[r]      679  0.03s  0.11% 90975 cyc {71308|86235|142038}
    c110:[w]      679  0.04s  0.16% 124868 cyc {100549|105947|189964}
    c114:[r]      678  0.03s  0.12% 99597 cyc {71027|88577|160398}
    c114:[w]      678  0.03s  0.15% 118008 cyc {99684|103813|170294}
    c118:[r]      706  0.03s  0.12% 89676 cyc {71554|86172|132802}
    c118:[w]      706  0.04s  0.16% 122240 cyc {100310|105426|179166}
    c11c:[r]      712  0.03s  0.13% 99038 cyc {71396|88099|162811}
    c11c:[w]      712  0.03s  0.15% 117103 cyc {99301|104023|167368}
    c120:[w]      679  0.02s  0.09% 71771 cyc {60579|63090|120278}
    c124:[w]      678  0.02s  0.09% 73731 cyc {59750|63002|118495}
    c128:[w]      706  0.02s  0.09% 70842 cyc {60166|63012|117806}
    c12c:[w]      712  0.02s  0.09% 72164 cyc {60145|62879|116238}
    c137:[r]     2622  0.10s  0.42% 87654 cyc {60166|86817|134937}
    c138:[w]     1368  0.06s  0.25% 98541 cyc {83112|93535|139715}
    c13c:[w]      840  0.05s  0.20% 130214 cyc {70259|126757|215964}
    c13e:[r]     1560  0.06s  0.27% 93779 cyc {60546|89935|145220}
    c13e:[w]     1140  0.04s  0.17% 81969 cyc {60329|64046|143139}
    c160:[r]     5308  0.20s  0.90% 91498 cyc {58849|85111|153525}
-- v3 --
 Runstates:
   running:   64567  9.22s 342781 {1484508|7262935|7902677}
  runnable:   52471  0.26s  11927 { 11080| 11520| 31809}
        wake:   52469  0.26s  11927 { 11043| 11357| 32113}
     preempt:       2  0.00s  12553 { 12661| 12661| 12661}
   blocked:    1785  8.92s 11986838 {1431653|55762304|55877572}
   offline:   50684  1.70s  80692 { 53736|165673|259901}
      lost:   11621  2.45s 506113 { 87548|2801145618|2801145618}
 cpu affinity:      14 3869655081 {5481925|1239831917|13708491898}
   [2]:       1 5806981051 {5806981051|5806981051|5806981051}
   [3]:       2 262734342 {40827394|484641291|484641291}
   [4]:       3 5234128266 {608429866|6677267277|8416687655}
   [5]:       1 320790286 {320790286|320790286|320790286}
   [6]:       3 4003420874 {5481925|1239831917|10764948781}
   [7]:       4 4952320925 {11233815|5989508990|13708491898}
Exit reasons:
 VMEXIT_CR0_READ          5643  0.00s  0.02%  1727 cyc { 1377| 1518| 2387}
 VMEXIT_CR4_READ             1  0.00s  0.00%  1431 cyc { 1431| 1431| 1431}
 VMEXIT_CR0_WRITE         3601  0.00s  0.01%  1836 cyc { 1463| 1670| 2355}
 VMEXIT_CR4_WRITE            2  0.00s  0.00% 21661 cyc {10362|32960|32960}
 VMEXIT_EXCEPTION_NM       115  0.00s  0.00%  2204 cyc { 1872| 2139| 2829}
 VMEXIT_INTR               285  0.00s  0.00%  5640 cyc { 1619| 5174| 8227}
 VMEXIT_VINTR             3683  0.00s  0.01%   955 cyc {  843|  907| 1052}
 VMEXIT_PAUSE              455  0.00s  0.01%  6134 cyc { 4773| 6120| 7396}
 VMEXIT_HLT               1800  8.92s 39.50% 11896143 cyc {238144|4061739|55593253}
 VMEXIT_IOIO             54852  1.92s  8.48% 83847 cyc { 6882|81138|147937}
 VMEXIT_NPF            1092889  6.17s 27.31% 13546 cyc { 6959| 7514|33800}
Guest interrupt counts:
  [ 61] 1684
   * intr to halt      :     548  5.13s 22450831 {3778265|14133616|218628338}
  [ 65] 1991
   * intr to halt      :    1427  8.36s 14057030 {4011117|71361455|229241448}
  [131] 1770
   * wake to halt alone:       7  0.02s 5288873 {5931229|14074711|14074711}
   * wake to halt any  :     191  2.79s 35009497 {14959633|83439334|206431891}
   * intr to halt      :     270  4.33s 38487841 {14319853|39022575|225105516}
  [147] 1
   * intr to halt      :       1  0.01s 15635859 {15635859|15635859|15635859}
  [163] 38
   * wake to halt alone:      38  0.01s 896225 {451118|1890318|2031855}
   * wake to halt any  :      38  0.01s 896225 {451118|1890318|2031855}
  [209] 893
   * wake to halt alone:     397  0.16s 937934 {698539|1174542|1286623}
   * wake to halt any  :     397  0.16s 937934 {698539|1174542|1286623}
   * intr to halt      :     375  3.90s 24979495 {3137757|185889856|230353901}
  [225] 1477
   * wake to halt alone:      64  0.02s 630251 {274884|756355|1123376}
   * wake to halt any  :    1158  6.48s 13438571 {4728774|37912998|270993904}
   * intr to halt      :     191  0.44s 5512547 {4290276|9106573|58931777}
IO address summary:
      60:[r]       39  0.00s  0.01% 113874 cyc {97805|106626|177149}
      64:[r]       78  0.00s  0.01% 87313 cyc {61237|76462|167140}
     1f2:[w]        1  0.00s  0.00% 122371 cyc {122371|122371|122371}
     1f3:[w]        1  0.00s  0.00% 194170 cyc {194170|194170|194170}
     1f4:[w]        1  0.00s  0.00% 100258 cyc {100258|100258|100258}
     1f5:[w]        1  0.00s  0.00% 120200 cyc {120200|120200|120200}
     1f6:[w]        2  0.00s  0.00% 172200 cyc {166877|177523|177523}
     1f7:[r]        1  0.00s  0.00% 77447 cyc {77447|77447|77447}
     1f7:[w]        1  0.00s  0.00% 122382 cyc {122382|122382|122382}
    b008:[r]     4243  0.02s  0.11% 13427 cyc { 2870| 5204|61455}
    c110:[r]     1360  0.05s  0.22% 88450 cyc {70270|83951|138920}
    c110:[w]     1360  0.07s  0.32% 125564 cyc {102596|108502|174654}
    c114:[r]     1291  0.05s  0.21% 88115 cyc {70087|83422|131137}
    c114:[w]     1291  0.06s  0.29% 119702 cyc {100134|105541|169273}
    c118:[r]     1354  0.05s  0.22% 87680 cyc {71388|83572|138890}
    c118:[w]     1354  0.07s  0.31% 125796 cyc {102833|110831|173015}
    c11c:[r]     1283  0.05s  0.21% 86995 cyc {69391|81444|129250}
    c11c:[w]     1283  0.06s  0.28% 119668 cyc {99986|105647|168919}
    c120:[w]     1360  0.04s  0.18% 70165 cyc {61811|63330|109565}
    c124:[w]     1291  0.04s  0.16% 67755 cyc {61477|63103|95240}
    c128:[w]     1354  0.04s  0.18% 70471 cyc {61650|63315|111335}
    c12c:[w]     1283  0.04s  0.16% 68790 cyc {61440|63096|97771}
    c137:[r]     7860  0.27s  1.18% 81041 cyc {61215|83695|126899}
    c138:[w]     3853  0.15s  0.67% 94432 cyc {80112|86583|145820}
    c13c:[w]     2881  0.14s  0.61% 114011 cyc {71643|100126|192250}
    c13e:[r]     5222  0.18s  0.81% 83899 cyc {61201|76475|130288}
    c13e:[w]     3781  0.12s  0.53% 76432 cyc {61198|63450|130049}
    c160:[r]    11019  0.41s  1.81% 89092 cyc {60972|80952|147024}
    c200:[w]        2  0.00s  0.02% 5656527 cyc {86679|11226375|11226375}
    c202:[w]        1  0.00s  0.00% 194932 cyc {194932|194932|194932}
    c204:[w]        1  0.00s  0.00% 120778 cyc {120778|120778|120778}
Emulate eip list
|-- Domain 32767 --|
 Runstates:
  full run:   20158  8.07s 960540 {24508284|48303504|55612316}
  concurrency_hazard:  480836 14.51s  72437 {138395|407730|1838647}
  full_contention:     303  0.01s  49247 { 26856|109628|2204652}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:   58270 11.36s 467824 {250489|117260040|173318764}
  runnable:   58269  9.65s 397613 {134437|1788201|5532268}
     preempt:   58269  9.65s 397613 {134437|1788201|5532268}
      lost:       7  1.57s 538127830 {2797511285|2797511285|2797511285}
 cpu affinity:       1 54196364345 {54196364345|54196364345|54196364345}
   [0]:       1 54196364345 {54196364345|54196364345|54196364345}
-- v1 --
 Runstates:
   running:    3932 21.45s 13094693 {17977173|37250684|932076417}
  runnable:    3932  1.06s 648642 {188105|6674733|77304619}
     preempt:    3932  1.06s 648642 {188105|6674733|77304619}
      lost:       1  0.00s 1239161 {1239161|1239161|1239161}
 cpu affinity:       1 54208693122 {54208693122|54208693122|54208693122}
   [1]:       1 54208693122 {54208693122|54208693122|54208693122}
-- v2 --
 Runstates:
   running:   26548 18.95s 1712748 {13902109|55802074|8141459106}
  runnable:   26548  2.08s 188469 {125571|8539282|5728579}
     preempt:   26548  2.08s 188469 {125571|8539282|5728579}
      lost:       3  1.52s 1217863293 {2799301747|2799301747|2799301747}
 cpu affinity:       1 54173840372 {54173840372|54173840372|54173840372}
   [2]:       1 54173840372 {54173840372|54173840372|54173840372}
-- v3 --
 Runstates:
   running:   33856 14.59s 1034379 {599353|55608439|3607433530}
  runnable:   33856  3.84s 272094 { 14781|9932325|8455743}
     preempt:   33856  3.84s 272094 { 14781|9932325|8455743}
      lost:       5  4.12s 1979027777 {8990071379|8990071379|8990071379}
 cpu affinity:       1 54174156751 {54174156751|54174156751|54174156751}
   [3]:       1 54174156751 {54174156751|54174156751|54174156751}
-- v4 --
 Runstates:
   running:   29955  9.06s 726119 { 74110|55290934|1193159250}
  runnable:   29954  3.13s 251109 {263459|4451443|8985833}
     preempt:   29954  3.13s 251109 {263459|4451443|8985833}
      lost:       7  1.58s 540515178 {2805130741|2805130741|2805130741}
 cpu affinity:       1 47497264656 {47497264656|47497264656|47497264656}
   [4]:       1 47497264656 {47497264656|47497264656|47497264656}
-- v5 --
 Runstates:
   running:   33959 18.65s 1318293 {17195293|56077117|7408987039}
  runnable:   33959  2.72s 192490 { 63669|1010012|96588042}
     preempt:   33959  2.72s 192490 { 63669|1010012|96588042}
      lost:       1  1.18s 2821575661 {2821575661|2821575661|2821575661}
 cpu affinity:       1 54174031168 {54174031168|54174031168|54174031168}
   [5]:       1 54174031168 {54174031168|54174031168|54174031168}
-- v6 --
 Runstates:
   running:   28470 12.43s 1048033 {2122133|55724138|5423959831}
  runnable:   28469  3.05s 256745 {443128|2395786|6979486}
     preempt:   28469  3.05s 256745 {443128|2395786|6979486}
      lost:       2  1.50s 1801322649 {2803066742|2803066742|2803066742}
 cpu affinity:       1 54174064971 {54174064971|54174064971|54174064971}
   [6]:       1 54174064971 {54174064971|54174064971|54174064971}
-- v7 --
 Runstates:
   running:   35656 16.37s 1102177 {102957|55742941|55759256}
  runnable:   35656  3.63s 244535 { 29088|2769742|8102412}
     preempt:   35656  3.63s 244535 { 29088|2769742|8102412}
      lost:       1  0.00s 8154621 {8154621|8154621|8154621}
 cpu affinity:       1 48074953440 {48074953440|48074953440|48074953440}
   [7]:       1 48074953440 {48074953440|48074953440|48074953440}
Emulate eip list
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
..... LOTS OF THESE ...............
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
read_record: read returned zero, deactivating pcpu 2
update_cycles: cycles 0! Not updating...
deactivate_pcpu: setting d32767v2 to state LOST
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
read_record: read returned zero, deactivating pcpu 3
update_cycles: cycles 0! Not updating...
deactivate_pcpu: setting d32767v3 to state LOST
deactivate_pcpu: Setting max_active_pcpu to 0
read_record: read returned zero, deactivating pcpu 0
deactivate_pcpu: setting d0v0 to state LOST
deactivate_pcpu: Setting max_active_pcpu to -1

[-- Attachment #3: xp32.xenalyze.good --]
[-- Type: text/plain, Size: 31433 bytes --]

Total time: 22.59 seconds (using cpu speed 2.40 GHz)
--- Log volume summary ---
 - cpu 0 -
 gen   :       4888
 sched :   17558548
 +-verbose:   11348260
 mem   :      23772
 pv    :   60012224
 hw    :    1015656
 - cpu 1 -
 gen   :        352
 sched :    1136804
 +-verbose:     780036
 mem   :      32256
 pv    :    2757004
 - cpu 2 -
 gen   :       7540
 sched :   15839172
 +-verbose:   10916804
 hvm   :   99457748
 +-vmentry:   16743228
 +-vmexit :   27905340
 +-handler:   54809180
 mem   :       3136
 - cpu 4 -
 gen   :       2100
 sched :   13423880
 +-verbose:    9271912
 hvm   :   18204504
 +-vmentry:    3143316
 +-vmexit :    5238860
 +-handler:    9822328
 mem   :       2324
 - cpu 5 -
 gen   :       1552
 sched :    7832748
 +-verbose:    5396876
 hvm   :   16070272
 +-vmentry:    2382024
 +-vmexit :    3970040
 +-handler:    9718208
 mem   :      13356
 - cpu 6 -
 gen   :         76
 sched :      29000
 +-verbose:      20792
 hvm   :     233140
 +-vmentry:      39828
 +-vmexit :      66380
 +-handler:     126932
 - cpu 7 -
 gen   :       2052
 sched :   10999720
 +-verbose:    7565912
 hvm   :   24083932
 +-vmentry:    3530976
 +-vmexit :    5884960
 +-handler:   14667996
 mem   :      25396
|-- Domain 0 --|
 Runstates:
   blocked:   37707  8.00s 509405 { 94685|102458625|187440398}
  partial run:   47714  9.59s 482262 {126844|1270915|3483701}
  full run:    6258  1.19s 455558 {189850|255081|34097723}
  partial contention:   40511  0.20s  11576 {  7607| 12577| 51029}
  concurrency_hazard:    5382  0.03s  12866 { 10391| 13254| 50500}
  full_contention:     149  0.00s  12024 {  6603| 13367| 14010}
      lost:     582  3.59s 14792935 {109511|44724936|643897262}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:   44727 10.60s 568889 {708394|2740075|5231817}
  runnable:   42474  0.21s  11690 { 10741| 12172| 38490}
        wake:   39482  0.19s  11594 { 10669| 20202| 45589}
     preempt:    2992  0.02s  12958 { 11601| 12857| 22195}
   blocked:   39483  8.05s 489519 { 84893|203851337|203851337}
      lost:     178  3.72s 50218875 {1784246130|3711119605|3711119605}
 cpu affinity:       1 54208612701 {54208612701|54208612701|54208612701}
   [0]:       1 54208612701 {54208612701|54208612701|54208612701}
PV events:
  hypercall  2482441
    fpu_taskswitch               [ 5]:  18383
    multicall                    [13]: 210521
    xen_version                  [17]: 128264
    iret                         [23]: 1563583
    vcpu_op                      [24]: 169754
    set_segment_base             [25]:  93004
    mmuext_op                    [26]:    259
    sched_op                     [29]:  46988
    evtchn_op                    [32]: 209196
    physdev_op                   [33]:  30280
    hvm_op                       [34]:  12209
  trap  18065
    [7] 18065
  page_fault  1
  math state restore  1
  ptwr  1
-- v1 --
 Runstates:
   running:    4020  1.27s 755530 {172097|7102331|75237478}
  runnable:    3526  0.02s  12372 { 10494| 13367| 14516}
        wake:    3445  0.02s  12423 { 10397| 14704|100354}
     preempt:      81  0.00s  10205 {  9063| 11025| 12711}
   blocked:    3448 17.68s 12308978 { 76757|20792831|605815115}
      lost:     461  3.62s 18842719 {28052095|30555280|643897262}
 cpu affinity:       1 54186195323 {54186195323|54186195323|54186195323}
   [1]:       1 54186195323 {54186195323|54186195323|54186195323}
PV events:
  hypercall  113536
    fpu_taskswitch               [ 5]:    598
    multicall                    [13]:   9757
    xen_version                  [17]:     53
    iret                         [23]:  64282
    vcpu_op                      [24]:  16186
    set_segment_base             [25]:   4746
    mmuext_op                    [26]:    100
    sched_op                     [29]:   4018
    evtchn_op                    [32]:  12684
    physdev_op                   [33]:    572
    hvm_op                       [34]:    539
    sysctl                       [35]:      1
  trap  491
    [7] 491
  page_fault  6
  math state restore  1
  ptwr  12
Emulate eip list
|-- Domain 1 --|
 Runstates:
   blocked:   94046  8.92s 227649 { 52621|55704205|56195487}
  partial run:  344548  8.85s  61668 { 14486|171379|1036858}
  full run:    3281  0.11s  77947 { 83526|425008|2636720}
  partial contention:   96663  0.46s  11369 { 10599| 11117| 20357}
  concurrency_hazard:  128577  0.60s  11154 { 10626| 11095| 40381}
      lost:    1521  3.64s 5743223 {1159241792|3712118700|3712118700}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:   51500  2.13s  99294 {136795|517828|2602314}
  runnable:   49715  0.23s  11249 { 10587| 11066| 32194}
        wake:   49714  0.23s  11249 { 10540| 13004| 31718}
     preempt:       1  0.00s  14798 { 14798| 14798| 14798}
   blocked:    1698 14.93s 21106130 {5657249|56082706|56262055}
   offline:   48018  1.59s  79308 { 49546|126081|221193}
      lost:     414  3.69s 21406206 {1051018576|3744856787|3744856787}
 cpu affinity:       1 54197489301 {54197489301|54197489301|54197489301}
   [7]:       1 54197489301 {54197489301|54197489301|54197489301}
Exit reasons:
 VMEXIT_CR0_READ           803  0.00s  0.00%  1824 cyc { 1357| 1557| 2442}
   cr_read      824  0.00s  0.00%  1815 cyc { 1358| 1536| 2439}
 VMEXIT_CR4_READ            21  0.00s  0.00%  1452 cyc { 1376| 1436| 1528}
 VMEXIT_CR0_WRITE           88  0.00s  0.00%  2106 cyc { 1373| 1552| 4453}
   cr0       88  0.00s  0.00%  2106 cyc { 1373| 1552| 4453}
 VMEXIT_CR4_WRITE           42  0.00s  0.00%  2039 cyc { 1457| 1628| 4664}
   cr4       42  0.00s  0.00%  2039 cyc { 1457| 1628| 4664}
 VMEXIT_INTR               143  0.00s  0.00%  6178 cyc { 2090| 6774| 9113}
 VMEXIT_VINTR              407  0.00s  0.00%  1087 cyc {  937| 1023| 1220}
   (no handler)     1786  0.00s  0.02%  6274 cyc {  965| 7536| 8658}
 VMEXIT_CPUID                1  0.00s  0.00%  4990 cyc { 4990| 4990| 4990}
   cpuid        1  0.00s  0.00%  4990 cyc { 4990| 4990| 4990}
 VMEXIT_PAUSE             1379  0.00s  0.02%  7805 cyc { 6382| 8007| 8728}
 VMEXIT_HLT               1697 14.93s 66.11% 21119485 cyc {501986|14677973|56232632}
   hlt     1697 14.93s 66.11% 21119485 cyc {501986|14677973|56232632}
 VMEXIT_IOIO             54604  1.76s  7.81% 77554 cyc { 2997|78235|144619}
 VMEXIT_NPF             234658  0.73s  3.25%  7508 cyc { 5939| 6761| 9752}
   mmio   233751  0.73s  3.23%  7492 cyc { 5935| 6760| 9375}
Guest interrupt counts:
  [ 61] 11
   * intr to halt      :      11  0.07s 14884188 {158810550|158810550|158810550}
  [ 65] 386
   * intr to halt      :     375  0.27s 1745814 {485056|40319522|159388476}
  [ 98] 5
   * wake to halt alone:       5  0.00s 1532618 {1482054|1494620|1715913}
   * wake to halt any  :       5  0.00s 1532618 {1482054|1494620|1715913}
  [131] 783
   * wake to halt alone:     591  2.69s 10922331 {5281437|14242986|14259572}
   * wake to halt any  :     750  3.47s 11114028 {7066184|11797347|15261712}
   * intr to halt      :      25  0.09s 8574736 {5599075|8098635|18095042}
  [147] 1
   * wake to halt any  :       1  0.00s 944531 {944531|944531|944531}
  [163] 60
   * wake to halt alone:      40  0.01s 553121 {382819|640684|791927}
   * wake to halt any  :      52  0.01s 651699 {407835|926356|1268681}
   * intr to halt      :       8  0.03s 7571050 {9842648|11271680|11524653}
  [209] 809
   * wake to halt alone:     537  0.08s 367147 {143253|699853|1894333}
   * wake to halt any  :     646  0.18s 668181 {211042|1109208|14118195}
   * intr to halt      :     158  0.47s 7111509 {2067661|11764312|138584144}
  [225] 271
   * wake to halt alone:      16  0.00s 106031 { 80919|125989|131098}
   * wake to halt any  :     241  0.24s 2356611 {2318042|159518661|159518661}
   * intr to halt      :      17  0.05s 6356261 {4628798|11182151|11182151}
IO address summary:
      60:[r]       61  0.00s  0.02% 157906 cyc {96928|170193|250975}
      64:[r]      122  0.01s  0.02% 110303 cyc {58012|95173|208007}
      70:[w]     1618  0.00s  0.01%  2512 cyc { 2010| 2396| 3143}
      71:[r]     1618  0.00s  0.01%  2600 cyc { 2057| 2699| 3084}
     1f1:[r]        5  0.00s  0.00% 64030 cyc {56500|60289|83998}
     1f2:[r]        5  0.00s  0.00% 69287 cyc {56164|59761|89044}
     1f3:[r]        5  0.00s  0.00% 62313 cyc {56768|59896|75234}
     1f4:[r]        5  0.00s  0.00% 59904 cyc {56393|59006|64464}
     1f5:[r]        5  0.00s  0.00% 59017 cyc {56236|59591|60881}
     1f6:[r]        5  0.00s  0.00% 58502 cyc {56373|58452|60032}
     1f7:[r]       15  0.00s  0.00% 75320 cyc {65519|70159|101163}
    b008:[r]     3422  0.01s  0.02%  3818 cyc { 2818| 3816| 4681}
    c110:[r]     1632  0.06s  0.26% 86852 cyc {69677|83259|140984}
    c110:[w]     1632  0.08s  0.36% 118961 cyc {97938|103766|170211}
    c114:[r]     1700  0.07s  0.29% 92105 cyc {70652|86425|142737}
    c114:[w]     1700  0.08s  0.36% 114275 cyc {97297|101446|166441}
    c118:[r]     1625  0.06s  0.26% 87866 cyc {69678|83532|134399}
    c118:[w]     1625  0.08s  0.36% 120321 cyc {97819|104359|171048}
    c11c:[r]     1692  0.07s  0.29% 92485 cyc {70610|86482|146087}
    c11c:[w]     1692  0.08s  0.36% 114608 cyc {97100|101669|164394}
    c120:[w]     1632  0.05s  0.21% 68614 cyc {57789|60514|118140}
    c124:[w]     1700  0.05s  0.22% 70647 cyc {57605|60663|124841}
    c128:[w]     1625  0.05s  0.20% 67452 cyc {57753|60439|111780}
    c12c:[w]     1692  0.05s  0.21% 68687 cyc {57711|60709|114398}
    c137:[r]     5800  0.20s  0.87% 80945 cyc {57774|83406|127565}
    c138:[w]     2984  0.11s  0.51% 91895 cyc {78703|85208|138853}
    c13c:[w]     1555  0.08s  0.33% 116735 cyc {68645|103653|209848}
    c13e:[r]     3303  0.12s  0.51% 84444 cyc {58232|74837|135806}
    c13e:[w]     2523  0.08s  0.37% 78776 cyc {58039|61580|144991}
    c158:[r]        3  0.00s  0.00% 150248 cyc {148291|149854|152601}
    c160:[r]    11583  0.40s  1.75% 81949 cyc {56943|76444|136284}
    c200:[w]        5  0.00s  0.00% 64682 cyc {58660|60042|84083}
    c202:[r]       10  0.00s  0.00% 79329 cyc {60392|83486|114284}
    c202:[w]        5  0.00s  0.00% 65693 cyc {58073|59710|91231}
-- v1 --
 Runstates:
   running:   37045  1.53s  99013 { 14936|1956030|3880406}
  runnable:   35604  0.17s  11741 { 10907| 11099| 50604}
        wake:   35604  0.17s  11741 { 10907| 11099| 50604}
   blocked:    1668 16.03s 23065720 {11755968|56253424|56386607}
   offline:   33939  1.16s  82120 { 53101|114582|287620}
      lost:     508  3.68s 17391174 {1159241792|3744980564|3744980564}
 cpu affinity:       2 27098670516 {7760167239|46437173793|46437173793}
   [6]:       2 27098670516 {7760167239|46437173793|46437173793}
Exit reasons:
 VMEXIT_CR0_READ           713  0.00s  0.00%  1918 cyc { 1310| 1803| 3095}
 VMEXIT_CR4_READ             4  0.00s  0.00%  2037 cyc { 1446| 2135| 2646}
 VMEXIT_CR0_WRITE          146  0.00s  0.00%  2548 cyc { 1538| 2360| 4399}
 VMEXIT_CR4_WRITE            8  0.00s  0.00%  4300 cyc { 1967| 4372| 7849}
 VMEXIT_EXCEPTION_NM         5  0.00s  0.00%  2470 cyc { 2208| 2319| 3162}
 VMEXIT_INTR               183  0.00s  0.00%  5186 cyc { 1360| 6017| 8999}
 VMEXIT_VINTR              386  0.00s  0.00%  1069 cyc {  856|  978| 1635}
 VMEXIT_PAUSE              941  0.00s  0.01%  7444 cyc { 5660| 7645| 8551}
 VMEXIT_HLT               1665 16.03s 70.96% 23101733 cyc {324109|17708086|56277631}
 VMEXIT_IOIO             37112  1.27s  5.61% 81998 cyc { 4137|79880|146311}
 VMEXIT_NPF             160158  0.53s  2.36%  7980 cyc { 5894| 6825|10858}
   npf      477  0.00s  0.01% 15012 cyc { 6759| 7386|42828}
Guest interrupt counts:
  [ 61] 84
   * intr to halt      :      70  0.12s 4147961 {3508243|17008608|25897341}
  [ 65] 298
   * intr to halt      :     285  0.27s 2235728 {2396785|17946490|37942638}
  [ 98] 8
   * wake to halt alone:       6  0.00s 1814363 {1518830|2982541|2982541}
   * wake to halt any  :       8  0.01s 2025239 {1843934|1918858|3471801}
  [131] 683
   * wake to halt alone:     511  1.95s 9175550 {8269849|8631495|14327617}
   * wake to halt any  :     640  2.54s 9534582 {8289700|13199021|26546649}
   * intr to halt      :      17  0.06s 8032405 {5943005|15563673|15563673}
  [147] 1
   * intr to halt      :       1  0.01s 15010688 {15010688|15010688|15010688}
  [163] 61
   * wake to halt alone:      42  0.01s 524509 {407353|431518|725336}
   * wake to halt any  :      55  0.03s 1178037 {441251|3424283|8366160}
   * intr to halt      :       5  0.04s 18662534 {16886309|41796612|41796612}
  [209] 813
   * wake to halt alone:     690  0.04s 132857 {111839|137343|198118}
   * wake to halt any  :     691  0.04s 149667 { 92545|134877|11748513}
   * intr to halt      :     122  0.27s 5313065 {3050219|43209675|43209675}
  [225] 318
   * wake to halt alone:      47  0.00s 132008 {103342|132699|213651}
   * wake to halt any  :     271  0.19s 1693566 {765848|17650191|28219848}
   * intr to halt      :      36  0.10s 6596486 {6881188|13128293|19939986}
IO address summary:
      60:[r]       62  0.00s  0.02% 156883 cyc {97321|144241|250146}
      64:[r]      124  0.01s  0.03% 118597 cyc {58133|117910|172747}
     1f1:[r]        8  0.00s  0.00% 88705 cyc {58909|60501|182439}
     1f2:[r]        8  0.00s  0.00% 69586 cyc {59163|60718|126357}
     1f2:[w]        6  0.00s  0.00% 55006 cyc {51523|52984|60523}
     1f3:[r]        8  0.00s  0.00% 77799 cyc {59270|60492|137949}
     1f3:[w]        6  0.00s  0.00% 54048 cyc {50519|52135|60040}
     1f4:[r]        8  0.00s  0.00% 69504 cyc {59052|60839|134524}
     1f4:[w]        6  0.00s  0.00% 53912 cyc {50395|51417|60312}
     1f5:[r]        8  0.00s  0.00% 83627 cyc {59075|60362|178419}
     1f5:[w]        6  0.00s  0.00% 53981 cyc {51025|51376|59627}
     1f6:[r]        8  0.00s  0.00% 70378 cyc {59155|60434|135629}
     1f6:[w]       12  0.00s  0.00% 65338 cyc {51029|72707|86372}
     1f7:[r]       30  0.00s  0.01% 91485 cyc {67932|75048|210043}
     1f7:[w]        6  0.00s  0.00% 55355 cyc {52104|55826|59922}
    b008:[r]     3410  0.01s  0.03%  4387 cyc { 2953| 4009| 6679}
    c110:[r]     1075  0.04s  0.17% 83960 cyc {70153|77574|120941}
    c110:[w]     1075  0.05s  0.24% 121141 cyc {100124|104950|172641}
    c114:[r]     1080  0.04s  0.19% 95401 cyc {71017|84731|159937}
    c114:[w]     1080  0.05s  0.23% 117164 cyc {99446|102858|169735}
    c118:[r]     1084  0.04s  0.17% 85174 cyc {70471|77978|123293}
    c118:[w]     1084  0.05s  0.24% 120101 cyc {100292|104798|171974}
    c11c:[r]     1090  0.04s  0.19% 93586 cyc {71545|83333|155126}
    c11c:[w]     1090  0.05s  0.23% 115283 cyc {99247|102485|164801}
    c120:[w]     1075  0.03s  0.14% 70450 cyc {60113|61878|122579}
    c124:[w]     1080  0.03s  0.15% 77004 cyc {60167|61884|128823}
    c128:[w]     1084  0.03s  0.14% 69253 cyc {60530|61865|115408}
    c12c:[w]     1090  0.03s  0.15% 73611 cyc {60501|61919|124304}
    c137:[r]     4175  0.14s  0.64% 82760 cyc {59821|84090|129655}
    c138:[w]     2115  0.08s  0.37% 93584 cyc {79444|85971|137570}
    c13c:[w]     1344  0.06s  0.24% 98516 cyc {70083|80909|160771}
    c13e:[r]     2508  0.09s  0.42% 89784 cyc {60542|75208|138755}
    c13e:[w]     1836  0.06s  0.27% 79788 cyc {60549|62435|136483}
    c160:[r]     8375  0.30s  1.34% 86916 cyc {57614|78014|138403}
    c200:[w]       20  0.00s  0.00% 100274 cyc {60097|109359|195309}
    c202:[r]       16  0.00s  0.00% 95498 cyc {61221|86730|267969}
    c202:[w]       14  0.00s  0.00% 80048 cyc {55803|60526|190245}
    c204:[w]        6  0.00s  0.00% 54751 cyc {51298|53503|60337}
-- v2 --
 Runstates:
   running:   73801  7.00s 227773 {128320|2530950|3592159}
  runnable:   72996  0.36s  11836 { 10862| 31370| 31888}
        wake:   72988  0.36s  11836 { 10963| 31269| 31997}
     preempt:       8  0.00s  12864 { 12716| 13009| 13039}
   blocked:    2661  9.19s 8286378 {1582803|56290731|56327725}
   offline:   70327  2.33s  79442 { 49978|119372|241629}
      lost:     219  3.70s 40504564 {1038343840|3712118700|3712118700}
 cpu affinity:       1 54197339054 {54197339054|54197339054|54197339054}
   [2]:       1 54197339054 {54197339054|54197339054|54197339054}
Exit reasons:
 VMEXIT_CR0_READ          7051  0.00s  0.02%  1689 cyc { 1351| 1462| 2364}
 VMEXIT_CR4_READ             5  0.00s  0.00%  1588 cyc { 1475| 1552| 1719}
 VMEXIT_CR0_WRITE         4345  0.00s  0.02%  1883 cyc { 1502| 1709| 2386}
 VMEXIT_CR4_WRITE           10  0.00s  0.00%  3375 cyc { 1800| 2521| 6628}
 VMEXIT_EXCEPTION_NM       132  0.00s  0.00%  2210 cyc { 2005| 2197| 2462}
 VMEXIT_INTR               508  0.00s  0.00%  5335 cyc { 1528| 5088| 8666}
 VMEXIT_VINTR             4425  0.00s  0.01%   977 cyc {  890|  951| 1053}
 VMEXIT_PAUSE              594  0.00s  0.01%  6929 cyc { 5005| 6840| 8114}
 VMEXIT_HLT               2680  9.20s 40.72% 8236727 cyc {65406|3729234|56273450}
 VMEXIT_IOIO             75721  2.59s 11.48% 82218 cyc { 4166|75834|143312}
 VMEXIT_NPF            1299591  3.46s 15.32%  6392 cyc { 5723| 6106| 7498}
Guest interrupt counts:
  [ 61] 1834
   * intr to halt      :     668  5.08s 18253740 {9229735|87105990|166298068}
  [ 65] 2587
   * intr to halt      :    2119  7.73s 8757538 {2270685|66743254|159496660}
  [ 98] 8
   * wake to halt alone:       2  0.00s 2079769 {2603547|2603547|2603547}
   * wake to halt any  :       7  0.03s 11649194 {9032566|49953990|49953990}
   * intr to halt      :       1  0.00s 3227378 {3227378|3227378|3227378}
  [131] 2003
   * wake to halt alone:      26  0.06s 5824553 {4382424|4382424|11709247}
   * wake to halt any  :     455  4.77s 25135018 {11189333|123383465|187964082}
   * intr to halt      :     268  2.99s 26745810 {11229360|95095114|133680989}
  [163] 77
   * wake to halt alone:      37  0.01s 522482 {402223|842395|1099235}
   * wake to halt any  :      40  0.01s 789796 {399016|6710515|6710515}
   * intr to halt      :      26  0.23s 21435073 {19762904|46914434|71828369}
  [209] 805
   * wake to halt alone:     380  0.02s 110608 { 94828|124305|179215}
   * wake to halt any  :     381  0.02s 113648 { 96125|146473|179215}
   * intr to halt      :     390  3.06s 18801885 {7654370|87354547|146146814}
  [225] 1836
   * wake to halt alone:      49  0.00s 117724 { 95196|133523|291503}
   * wake to halt any  :    1776  4.80s 6481582 {2455290|53929920|166589631}
   * intr to halt      :      29  0.13s 10412105 {5282717|45883600|45883600}
IO address summary:
      60:[r]       77  0.01s  0.02% 157812 cyc {104284|172362|240954}
      64:[r]      154  0.01s  0.03% 114981 cyc {59911|114167|204270}
     1f1:[r]        8  0.00s  0.00% 98713 cyc {56959|60551|229538}
     1f2:[r]        8  0.00s  0.00% 86623 cyc {59066|60523|166485}
     1f2:[w]       18  0.00s  0.00% 58446 cyc {51427|59599|83686}
     1f3:[r]        8  0.00s  0.00% 80196 cyc {58232|60731|137836}
     1f3:[w]       18  0.00s  0.00% 57441 cyc {49699|58978|88510}
     1f4:[r]        8  0.00s  0.00% 77683 cyc {59224|60417|137303}
     1f4:[w]       18  0.00s  0.00% 63050 cyc {49856|59166|157962}
     1f5:[r]        8  0.00s  0.00% 79584 cyc {58081|59782|147553}
     1f5:[w]       18  0.00s  0.00% 58695 cyc {50205|59299|86135}
     1f6:[r]        8  0.00s  0.00% 82533 cyc {58946|60127|166580}
     1f6:[w]       36  0.00s  0.00% 68651 cyc {49970|72625|87754}
     1f7:[r]       42  0.00s  0.01% 87464 cyc {67561|76813|149719}
     1f7:[w]       18  0.00s  0.00% 63264 cyc {50850|60083|128515}
    b008:[r]     5416  0.01s  0.04%  3609 cyc { 2626| 3544| 4555}
    c110:[r]     1882  0.07s  0.30% 87240 cyc {71154|78468|137183}
    c110:[w]     1882  0.10s  0.45% 128814 cyc {101685|130851|187240}
    c114:[r]     1796  0.07s  0.30% 89621 cyc {71016|77234|151873}
    c114:[w]     1796  0.09s  0.40% 119998 cyc {99261|104478|170768}
    c118:[r]     1857  0.07s  0.30% 86835 cyc {70949|79316|141367}
    c118:[w]     1857  0.10s  0.44% 128635 cyc {101956|129329|189815}
    c11c:[r]     1772  0.07s  0.29% 88507 cyc {71731|76805|148197}
    c11c:[w]     1772  0.09s  0.39% 119808 cyc {99039|104332|171020}
    c120:[w]     1882  0.06s  0.25% 71475 cyc {60240|62062|119952}
    c124:[w]     1796  0.06s  0.25% 75622 cyc {60432|62251|127483}
    c128:[w]     1857  0.05s  0.24% 70895 cyc {60131|62069|120014}
    c12c:[w]     1772  0.06s  0.25% 75870 cyc {60390|62175|128139}
    c137:[r]    10994  0.37s  1.66% 81736 cyc {60096|82795|129993}
    c138:[w]     5337  0.21s  0.93% 94378 cyc {78486|85420|144662}
    c13c:[w]     3997  0.18s  0.82% 110689 cyc {71476|84898|183921}
    c13e:[r]     7231  0.24s  1.06% 79094 cyc {59993|71094|131234}
    c13e:[w]     5232  0.16s  0.72% 74790 cyc {60065|62061|129036}
    c160:[r]    15042  0.52s  2.31% 83391 cyc {57712|71889|137839}
    c200:[w]       44  0.00s  0.01% 116310 cyc {60278|110317|190072}
    c202:[r]       16  0.00s  0.00% 89424 cyc {59102|84417|137204}
    c202:[w]       26  0.00s  0.00% 74224 cyc {58105|60662|196761}
    c204:[w]       18  0.00s  0.00% 55942 cyc {50890|58831|60805}
-- v3 --
 Runstates:
   running:   64520  1.83s  68067 { 15206|534632|49787047}
  runnable:   61653  0.31s  11930 { 10999| 11290| 32895}
        wake:   61653  0.31s  11930 { 10999| 11290| 32895}
   blocked:    2904 14.76s 12201014 {2513945|56292497|56323464}
   offline:   58751  1.99s  81185 { 52293|134530|254712}
      lost:     740  3.69s 11963184 {1047897111|3716590208|3716590208}
 cpu affinity:       1 54197341125 {54197341125|54197341125|54197341125}
   [4]:       1 54197341125 {54197341125|54197341125|54197341125}
Exit reasons:
 VMEXIT_CR0_READ           800  0.00s  0.00%  1995 cyc { 1340| 1921| 3280}
 VMEXIT_CR4_READ             4  0.00s  0.00%  2157 cyc { 1650| 2366| 2942}
 VMEXIT_CR0_WRITE           74  0.00s  0.00%  2735 cyc { 1569| 2310| 4853}
 VMEXIT_CR4_WRITE            8  0.00s  0.00%  4217 cyc { 1785| 3357| 7154}
 VMEXIT_EXCEPTION_NM         2  0.00s  0.00%  2544 cyc { 2326| 2763| 2763}
 VMEXIT_INTR               154  0.00s  0.00%  4999 cyc { 1440| 5044| 8430}
            [  0]: 154
 VMEXIT_VINTR              429  0.00s  0.00%  1128 cyc {  890| 1011| 1855}
 VMEXIT_PAUSE             2137  0.01s  0.03%  6589 cyc { 4883| 6482| 8196}
 VMEXIT_HLT               2906 14.77s 65.41% 12200943 cyc {234468|4555513|56262603}
 VMEXIT_IOIO             64630  2.22s  9.84% 82512 cyc { 3517|77709|150715}
 VMEXIT_NPF             190070  0.60s  2.67%  7622 cyc { 5794| 6978|10893}
Guest interrupt counts:
  [ 61] 42
   * intr to halt      :       8  0.05s 14639021 {48008614|49044340|49044340}
  [ 65] 380
   * intr to halt      :     375  0.25s 1614727 {317981|47929604|58149479}
  [ 98] 10
   * wake to halt alone:       7  0.00s 1503945 {1501025|1521120|1577479}
   * wake to halt any  :       9  0.01s 2126554 {1567624|5291455|5291455}
   * intr to halt      :       1  0.00s 5822988 {5822988|5822988|5822988}
  [131] 1917
   * wake to halt alone:    1666  3.27s 4714789 {3789290|5667945|10997344}
   * wake to halt any  :    1866  3.79s 4870762 {2769596|8494622|9143375}
   * intr to halt      :       9  0.04s 9569321 {5309606|19159953|19159953}
  [163] 70
   * wake to halt alone:      55  0.01s 634569 {405315|1249597|1317516}
   * wake to halt any  :      57  0.02s 636408 {392008|1249597|1317516}
   * intr to halt      :      13  0.03s 6014458 {4829739|42429900|42429900}
  [209] 812
   * wake to halt alone:     608  0.03s 130851 {104321|190422|183329}
   * wake to halt any  :     611  0.04s 141949 {118978|165200|5410073}
   * intr to halt      :     198  0.28s 3359148 {531977|3187904|47889409}
  [225] 427
   * wake to halt alone:      47  0.00s 124933 {110958|125481|215481}
   * wake to halt any  :     358  0.22s 1451895 {477463|26927080|49779665}
   * intr to halt      :      61  0.11s 4456535 {3497249|20576019|20576019}
IO address summary:
      60:[r]       70  0.00s  0.02% 153245 cyc {104387|132839|255665}
      64:[r]      140  0.01s  0.03% 108093 cyc {58574|90573|210228}
     1f1:[r]       10  0.00s  0.00% 88163 cyc {56964|60881|195997}
     1f2:[r]       10  0.00s  0.00% 77903 cyc {58828|61437|142746}
     1f2:[w]        7  0.00s  0.00% 120136 cyc {60883|134824|183082}
     1f3:[r]       10  0.00s  0.00% 86617 cyc {59143|60198|200383}
     1f3:[w]        7  0.00s  0.00% 149825 cyc {69600|135061|275779}
     1f4:[r]       10  0.00s  0.00% 76166 cyc {58704|60322|153360}
     1f4:[w]        7  0.00s  0.00% 135667 cyc {61161|135197|219237}
     1f5:[r]       10  0.00s  0.00% 74153 cyc {58337|59343|135541}
     1f5:[w]        7  0.00s  0.00% 135469 cyc {61457|136123|197743}
     1f6:[r]       10  0.00s  0.00% 87503 cyc {59042|59938|178135}
     1f6:[w]       14  0.00s  0.00% 109208 cyc {59237|95281|204147}
     1f7:[r]       37  0.00s  0.01% 113425 cyc {67606|81163|231901}
     1f7:[w]        7  0.00s  0.00% 183934 cyc {63874|179526|306405}
    b008:[r]     5916  0.01s  0.04%  3650 cyc { 2710| 3384| 5710}
    c110:[r]     1777  0.06s  0.27% 82280 cyc {68605|73492|133740}
    c110:[w]     1777  0.10s  0.43% 131607 cyc {100726|129987|197400}
    c114:[r]     1792  0.07s  0.30% 90987 cyc {69030|76954|149001}
    c114:[w]     1792  0.09s  0.39% 119260 cyc {99473|104167|174218}
    c118:[r]     1801  0.06s  0.28% 84164 cyc {68916|73597|133284}
    c118:[w]     1801  0.10s  0.43% 129525 cyc {100677|129017|191171}
    c11c:[r]     1812  0.07s  0.30% 90093 cyc {69069|76960|148080}
    c11c:[w]     1812  0.09s  0.40% 118780 cyc {99526|103983|171325}
    c120:[w]     1777  0.05s  0.24% 71739 cyc {59778|61974|121014}
    c124:[w]     1792  0.06s  0.24% 73708 cyc {59848|62132|124206}
    c128:[w]     1801  0.05s  0.23% 70720 cyc {59744|61991|120292}
    c12c:[w]     1812  0.06s  0.24% 73178 cyc {59982|62114|122598}
    c137:[r]     4919  0.16s  0.70% 77636 cyc {59694|63645|123683}
    c138:[w]     1720  0.07s  0.30% 95856 cyc {80665|86772|133894}
    c13c:[w]     3782  0.14s  0.61% 87725 cyc {69082|82648|142145}
    c13e:[r]     6141  0.22s  0.99% 87742 cyc {59978|79221|137441}
    c13e:[w]     4251  0.14s  0.62% 78558 cyc {60398|62611|139191}
    c160:[r]    15931  0.61s  2.71% 92286 cyc {57115|81956|156159}
    c200:[w]       24  0.00s  0.01% 151156 cyc {60165|142811|401906}
    c202:[r]       20  0.00s  0.00% 99093 cyc {61601|88272|207015}
    c202:[w]       17  0.00s  0.00% 108122 cyc {56371|62224|201853}
    c204:[w]        7  0.00s  0.00% 136409 cyc {62065|136245|228658}
Emulate eip list
|-- Domain 32767 --|
 Runstates:
  full run:   21513  9.22s 1028857 {23028429|46915838|152209553}
  concurrency_hazard:  514745 13.00s  60628 { 74497|113845|796716}
  full_contention:     606  0.09s 366269 { 31174|15500234|28413954}
      lost:       1  0.27s 643844470 {643844470|643844470|643844470}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:   42641  8.31s 467989 {114207|186229803|194929010}
  runnable:   42640 10.62s 597632 {132708|3507461|3249602}
     preempt:   42640 10.62s 597632 {132708|3507461|3249602}
      lost:      15  3.65s 584623789 {1200100067|3712133212|3712133212}
 cpu affinity:       1 54208560737 {54208560737|54208560737|54208560737}
   [0]:       1 54208560737 {54208560737|54208560737|54208560737}
-- v1 --
 Runstates:
   running:    3974 21.05s 12713160 { 69803|28184038|674513975}
  runnable:    3972  1.26s 762906 {189098|1281204|75238760}
     preempt:    3972  1.26s 762906 {189098|1281204|75238760}
      lost:       1  0.27s 644303363 {644303363|644303363|644303363}
 cpu affinity:       1 54197540646 {54197540646|54197540646|54197540646}
   [1]:       1 54197540646 {54197540646|54197540646|54197540646}
-- v2 --
 Runstates:
   running:   73202 11.85s 388544 { 74298|56286239|56334293}
  runnable:   73202  7.04s 230693 {130693|2526316|3714668}
     preempt:   73202  7.04s 230693 {130693|2526316|3714668}
      lost:      16  3.69s 553245946 {670252581|3712203794|3712203794}
 cpu affinity:       1 54197208732 {54197208732|54197208732|54197208732}
   [2]:       1 54197208732 {54197208732|54197208732|54197208732}
-- v4 --
 Runstates:
   running:   62384 17.20s 661592 {103412|31399211|56361922}
  runnable:   62383  1.86s  71430 { 15354|547218|49788439}
     preempt:   62383  1.86s  71430 { 15354|547218|49788439}
      lost:       8  3.52s 1056573758 {1047893947|3716586607|3716586607}
 cpu affinity:       1 54197172656 {54197172656|54197172656|54197172656}
   [4]:       1 54197172656 {54197172656|54197172656|54197172656}
-- v5 --
 Runstates:
   running:   35978 15.33s 1022770 {103606|48879079|56311319}
  runnable:   35978  1.54s 102758 { 14499|1644554|3881540}
     preempt:   35978  1.54s 102758 { 14499|1644554|3881540}
      lost:       5  3.24s 1553600132 {1784246130|3744977108|3744977108}
 cpu affinity:       1 54197172836 {54197172836|54197172836|54197172836}
   [5]:       1 54197172836 {54197172836|54197172836|54197172836}
-- v6 --
 Runstates:
   running:     126  2.19s 41646453 {55542297|56354174|56339083}
  runnable:     126  0.02s 335835 {145335|5448876|5448876}
     preempt:     126  0.02s 335835 {145335|5448876|5448876}
      lost:       2  1.02s 1227237133 {1784246130|1784246130|1784246130}
 cpu affinity:       1 7759192442 {7759192442|7759192442|7759192442}
   [6]:       1 7759192442 {7759192442|7759192442|7759192442}
-- v7 --
 Runstates:
   running:   50120 16.80s 804406 {153440|56286851|56268674}
  runnable:   50120  2.15s 103064 { 15941|2128496|2789109}
     preempt:   50120  2.15s 103064 { 15941|2128496|2789109}
      lost:      11  3.62s 790826046 {1200100067|3744848624|3744848624}
 cpu affinity:       1 54196452955 {54196452955|54196452955|54196452955}
   [7]:       1 54196452955 {54196452955|54196452955|54196452955}
Emulate eip list
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
..... LOTS OF THESE ...............
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
hvm_generic_postprocess_init: Strange, h->postprocess set!
read_record: read returned zero, deactivating pcpu 7
update_cycles: cycles 0! Not updating...
deactivate_pcpu: setting d32767v7 to state LOST
deactivate_pcpu: Setting max_active_pcpu to 6
read_record: read returned zero, deactivating pcpu 6
update_cycles: cycles 0! Not updating...
deactivate_pcpu: setting d32767v6 to state LOST
deactivate_pcpu: Setting max_active_pcpu to 4
read_record: read returned zero, deactivating pcpu 2
update_cycles: cycles 0! Not updating...
deactivate_pcpu: setting d32767v2 to state LOST
read_record: read returned zero, deactivating pcpu 4
update_cycles: cycles 0! Not updating...
deactivate_pcpu: setting d32767v4 to state LOST
deactivate_pcpu: Setting max_active_pcpu to 1
read_record: read returned zero, deactivating pcpu 0
deactivate_pcpu: setting d0v0 to state LOST
read_record: read returned zero, deactivating pcpu 1
deactivate_pcpu: setting d0v1 to state LOST
deactivate_pcpu: Setting max_active_pcpu to -1


* Re: Xen 4.3 development update
  2013-04-04 15:23           ` Tim Deegan
@ 2013-04-04 17:05             ` Tim Deegan
  2013-04-29 13:21               ` Peter Maloney
  2013-04-25 15:20             ` George Dunlap
  1 sibling, 1 reply; 53+ messages in thread
From: Tim Deegan @ 2013-04-04 17:05 UTC (permalink / raw)
  To: Andres Lagar-Cavilla
  Cc: George Dunlap, suravee.suthikulpanit, Jan Beulich, xen-devel

At 16:23 +0100 on 04 Apr (1365092601), Tim Deegan wrote:
> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
> > On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
> about 12 minutes before this locking change takes more than 20 minutes
> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
> to test something else).

I did a bit of prodding at this, but messed up my measurements in a
bunch of different ways over the afternoon. :(  I'm going to be away
from my test boxes for a couple of weeks now, so all I can say is, if
you're investigating this bug, beware that:

 - the revision before this change still has the RTC bugs that were
   fixed last week, so don't measure performance based on guest
   wallclock time, or your 'before' perf will look too good.
 - the current unstable tip has test code to exercise the new
   map_domain_page(), which will badly affect all the many memory
   accesses done in HVM emulation, so make sure you use debug=n builds
   for measurement.

Also, if there is still a bad slowdown caused by the p2m lookups, this
might help a little bit:

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 38e87ce..7bd8646 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
         }
     }
 
+
+    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
+     * a fast path for LAPIC accesses, skipping the p2m lookup. */
+    if ( !nestedhvm_vcpu_in_guestmode(v)
+         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
+    {
+        if ( !handle_mmio() )
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        rc = 1;
+        goto out;
+    }
+
     p2m = p2m_get_hostp2m(v->domain);
     mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
                               P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);


but in fact, the handle_mmio() will have to do GVA->GFN lookups for its
%RIP and all its operands, and each of those will involve multiple
GFN->MFN lookups for the pagetable entries, so if the GFN->MFN lookup
has got slower, eliminating just the one at the start may not be all
that great.
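
To put a rough number on that, here is a purely illustrative sketch (not
the actual Xen code paths; the 3-level walk and operand count are
assumptions) of how many GFN->MFN lookups emulating one instruction can
imply:

/* Illustrative sketch only -- not real Xen code. */
#define PT_LEVELS 3   /* pagetable reads per GVA->GFN walk (assumed) */

/* Each guest-virtual access = one walk (PT_LEVELS p2m lookups to map the
 * pagetable pages) + 1 p2m lookup for the data page itself. */
static unsigned int p2m_lookups_per_gva_access(void)
{
    return PT_LEVELS + 1;
}

/* One emulated instruction: the fetch at %RIP plus its memory operands. */
static unsigned int p2m_lookups_per_emulated_insn(unsigned int mem_operands)
{
    return (1 + mem_operands) * p2m_lookups_per_gva_access();
}

/* e.g. an instruction with a single memory operand: (1 + 1) * 4 = 8
 * lookups, so skipping the one lookup at the top of the NPF handler
 * saves roughly 1 of ~9. */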

Cheers,

Tim.


* Re: Xen 4.3 development update
  2013-04-03 10:51         ` George Dunlap
  2013-04-04 15:29           ` Suravee Suthikulanit
@ 2013-04-04 17:14           ` Suravee Suthikulanit
  2013-04-05 13:43             ` George Dunlap
  1 sibling, 1 reply; 53+ messages in thread
From: Suravee Suthikulanit @ 2013-04-04 17:14 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tim (Xen.org), Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

[-- Attachment #1: Type: text/plain, Size: 4557 bytes --]

On 4/3/2013 5:51 AM, George Dunlap wrote:
> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com>
>>>>>>>> wrote:
>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>     owner: ?
>>>>>>     Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>> other day. Suravee, is that correct?
>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by
>>>> the emulation of LAPIC TPR accesses slowing down with Andres's p2m
>>>> locking patches.  XP doesn't have 'lazy IRQL' or support for CR8, so
>>>> it takes a _lot_ of vmexits for IRQL reads and writes.
>>> Are there any tools or good ways to count the number of VMEXITs in Xen?
>>>
>> Tim/Jan,
>>
>> I have used iperf benchmark to compare network performance (bandwidth)
>> between the two versions of the hypervisor:
>> 1. good: 24769:730f6ed72d70
>> 2. bad: 24770:7f79475d3de7
>>
>> In the "bad" case, I am seeing that the network bandwidth has dropped
>> about 13-15%.
>>
>> However, when I used the xentrace utility to trace the number of VMEXITs,
>> I actually saw about 25% more VMEXITs in the good case.  This
>> is inconsistent with the statement that Tim mentioned above.
>
> I was going to say, what I remember from my little bit of 
> investigation back in November was that it had all the earmarks of 
> micro-architectural "drag", which happens when the TLB or the caches 
> can't be effective.
>
> Suravee, if you look at xenalyze, a microarchitectural "drag" looks like:
> * fewer VMEXITs, but
> * time for each vmexit takes longer
>
> If you post the results of "xenalyze --svm-mode -s" for both traces, I 
> can tell you what I see.
>
>  -George
>

Here's another version of the outputs from xenalyze with only the VMEXIT 
records.  In this case, I pinned all four VCPUs and pinned my application 
process to VCPU 3.

NOTE: This measurement is without the RTC bug.

BAD:
-- v3 --
  Runstates:
    running:       1  4.51s 10815429411 {10815429411|10815429411|10815429411}
  cpu affinity:       1 10816540697 {10816540697|10816540697|10816540697}
    [7]:       1 10816540697 {10816540697|10816540697|10816540697}
Exit reasons:
  VMEXIT_CR0_READ           633  0.00s  0.00%  1503 cyc { 1092| 1299| 2647}
  VMEXIT_CR4_READ             3  0.00s  0.00%  1831 cyc { 1309| 1659| 2526}
  VMEXIT_CR0_WRITE          305  0.00s  0.00%  1660 cyc { 1158| 1461| 2507}
  VMEXIT_CR4_WRITE            6  0.00s  0.00% 19771 cyc { 1738| 5031|79600}
  VMEXIT_EXCEPTION_NM         1  0.00s  0.00%  2272 cyc { 2272| 2272| 2272}
  VMEXIT_INTR                28  0.00s  0.00%  3374 cyc { 1225| 3770| 6095}
  VMEXIT_VINTR              388  0.00s  0.00%  1023 cyc {  819|  901| 1744}
  VMEXIT_PAUSE               33  0.00s  0.00%  7476 cyc { 4881| 6298|18941}
  VMEXIT_HLT                388  3.35s 14.84% 20701800 cyc {169589|3848166|55770601}
  VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
  VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}
Guest interrupt counts:
Emulate eip list

GOOD:
-- v3 --
  Runstates:
    running:       4 12.10s 7257234016 {18132721625|18132721625|18132721625}
       lost:      12  1.24s 248210482 {188636654|719488416|719488416}
  cpu affinity:       1 32007462122 {32007462122|32007462122|32007462122}
    [7]:       1 32007462122 {32007462122|32007462122|32007462122}
Exit reasons:
  VMEXIT_CR0_READ          4748  0.00s  0.01%  1275 cyc { 1007| 1132| 1878}
  VMEXIT_CR4_READ             6  0.00s  0.00%  1752 cyc { 1189| 1629| 2600}
  VMEXIT_CR0_WRITE         3099  0.00s  0.01%  1541 cyc { 1157| 1420| 2151}
  VMEXIT_CR4_WRITE           12  0.00s  0.00%  4105 cyc { 1885| 4380| 5515}
  VMEXIT_EXCEPTION_NM        18  0.00s  0.00%  2169 cyc { 1973| 2152| 2632}
  VMEXIT_INTR               258  0.00s  0.00%  4622 cyc { 1358| 4235| 8987}
  VMEXIT_VINTR             2552  0.00s  0.00%   971 cyc {  850|  928| 1131}
  VMEXIT_PAUSE              370  0.00s  0.00%  5758 cyc { 4381| 5688| 7933}
  VMEXIT_HLT               1505  6.14s 27.19% 9788981 cyc {268573|3768704|56331182}
  VMEXIT_IOIO             53835  1.97s  8.74% 87959 cyc { 4996|82423|144207}
  VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}
Guest interrupt counts:
Emulate eip list

Suravee

[-- Attachment #2: xp32.vmexit.xenalyze.bad --]
[-- Type: text/plain, Size: 4998 bytes --]

Total time: 22.55 seconds (using cpu speed 2.40 GHz)
--- Log volume summary ---
 - cpu 4 -
 gen   :        632
 hvm   :    9327532
 +-vmentry:    3497832
 +-vmexit :    5829700
 - cpu 5 -
 gen   :        724
 hvm   :   11119980
 +-vmentry:    4170000
 +-vmexit :    6949980
 - cpu 6 -
 gen   :        616
 hvm   :    8962956
 +-vmentry:    3361116
 +-vmexit :    5601840
 - cpu 7 -
 gen   :       3008
 hvm   :   48541004
 +-vmentry:   18202884
 +-vmexit :   30338120
|-- Domain 1 --|
 Runstates:
  full run:       7  4.51s 1545220099 {10780557145|10780557145|10780557145}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:       1  4.51s 10815791389 {10815791389|10815791389|10815791389}
 cpu affinity:       1 10815791389 {10815791389|10815791389|10815791389}
   [4]:       1 10815791389 {10815791389|10815791389|10815791389}
Exit reasons:
 VMEXIT_CR0_READ           183  0.00s  0.00%  1812 cyc { 1122| 1632| 3498}
 VMEXIT_CR4_READ             2  0.00s  0.00%  2062 cyc { 1926| 2199| 2199}
 VMEXIT_CR0_WRITE           78  0.00s  0.00%  1996 cyc { 1149| 1658| 4217}
 VMEXIT_CR4_WRITE            4  0.00s  0.00% 56884 cyc {15001|68359|90794}
 VMEXIT_INTR                 7  0.00s  0.00%  5646 cyc { 2429| 6054| 7887}
 VMEXIT_VINTR               76  0.00s  0.00%  1509 cyc {  878| 1060| 3594}
 VMEXIT_PAUSE              157  0.00s  0.00%  7879 cyc { 5241| 6691|15013}
 VMEXIT_HLT                370  3.94s 17.49% 25583983 cyc {829018|16636281|55835489}
 VMEXIT_IOIO              5578  0.18s  0.82% 79293 cyc { 2231|88278|152838}
 VMEXIT_NPF              19651  0.30s  1.33% 36744 cyc { 6849|25715|127962}
Guest interrupt counts:
-- v1 --
 Runstates:
   running:       1  4.49s 10785203391 {10785203391|10785203391|10785203391}
 cpu affinity:       1 10786466878 {10786466878|10786466878|10786466878}
   [5]:       1 10786466878 {10786466878|10786466878|10786466878}
Exit reasons:
 VMEXIT_CR0_READ           340  0.00s  0.00%  1415 cyc { 1071| 1272| 2171}
 VMEXIT_CR4_READ             2  0.00s  0.00%  1746 cyc { 1724| 1769| 1769}
 VMEXIT_CR0_WRITE           10  0.00s  0.00%  2283 cyc { 1134| 1912| 4267}
 VMEXIT_CR4_WRITE            4  0.00s  0.00% 29555 cyc { 4872|34069|47533}
 VMEXIT_EXCEPTION_NM         1  0.00s  0.00%  2021 cyc { 2021| 2021| 2021}
 VMEXIT_INTR                13  0.00s  0.00%  5124 cyc { 1666| 5530| 6878}
 VMEXIT_VINTR              165  0.00s  0.00%   986 cyc {  832|  883| 1302}
 VMEXIT_PAUSE              113  0.00s  0.00%  7287 cyc { 4375| 5933|13541}
 VMEXIT_HLT                396  3.89s 17.26% 23592797 cyc {176514|12738957|55766839}
 VMEXIT_IOIO              4127  0.15s  0.68% 88700 cyc { 3721|87403|151659}
 VMEXIT_NPF              25689  0.34s  1.52% 32124 cyc { 6473|16878|115309}
Guest interrupt counts:
-- v2 --
 Runstates:
   running:       1  4.49s 10781631107 {10781631107|10781631107|10781631107}
 cpu affinity:       1 10781820632 {10781820632|10781820632|10781820632}
   [6]:       1 10781820632 {10781820632|10781820632|10781820632}
Exit reasons:
 VMEXIT_CR0_READ           249  0.00s  0.00%  1755 cyc { 1120| 1584| 3082}
 VMEXIT_CR4_READ             2  0.00s  0.00%  2033 cyc { 1499| 2567| 2567}
 VMEXIT_CR0_WRITE            2  0.00s  0.00%  3977 cyc { 3327| 4628| 4628}
 VMEXIT_CR4_WRITE            4  0.00s  0.00% 72590 cyc {39319|88002|91428}
 VMEXIT_INTR                15  0.00s  0.00%  4280 cyc { 1394| 4727| 7035}
 VMEXIT_VINTR              135  0.00s  0.00%  1277 cyc {  851|  974| 2976}
 VMEXIT_PAUSE               80  0.00s  0.00%  8354 cyc { 5487| 8652|13277}
 VMEXIT_HLT                364  3.93s 17.43% 25917764 cyc {113516|17552184|55783615}
 VMEXIT_IOIO              3873  0.14s  0.63% 88434 cyc { 4299|88250|148914}
 VMEXIT_NPF              29959  0.32s  1.40% 25322 cyc { 6529|14372|104649}
Guest interrupt counts:
-- v3 --
 Runstates:
   running:       1  4.51s 10815429411 {10815429411|10815429411|10815429411}
 cpu affinity:       1 10816540697 {10816540697|10816540697|10816540697}
   [7]:       1 10816540697 {10816540697|10816540697|10816540697}
Exit reasons:
 VMEXIT_CR0_READ           633  0.00s  0.00%  1503 cyc { 1092| 1299| 2647}
 VMEXIT_CR4_READ             3  0.00s  0.00%  1831 cyc { 1309| 1659| 2526}
 VMEXIT_CR0_WRITE          305  0.00s  0.00%  1660 cyc { 1158| 1461| 2507}
 VMEXIT_CR4_WRITE            6  0.00s  0.00% 19771 cyc { 1738| 5031|79600}
 VMEXIT_EXCEPTION_NM         1  0.00s  0.00%  2272 cyc { 2272| 2272| 2272}
 VMEXIT_INTR                28  0.00s  0.00%  3374 cyc { 1225| 3770| 6095}
 VMEXIT_VINTR              388  0.00s  0.00%  1023 cyc {  819|  901| 1744}
 VMEXIT_PAUSE               33  0.00s  0.00%  7476 cyc { 4881| 6298|18941}
 VMEXIT_HLT                388  3.35s 14.84% 20701800 cyc {169589|3848166|55770601}
 VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
 VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}
Guest interrupt counts:
Emulate eip list

[-- Attachment #3: xp32.vmexit.xenalyze.good --]
[-- Type: text/plain, Size: 5260 bytes --]

Total time: 22.58 seconds (using cpu speed 2.40 GHz)
--- Log volume summary ---
 - cpu 4 -
 gen   :       1424
 hvm   :   19260932
 +-vmentry:    7222872
 +-vmexit :   12038060
 - cpu 5 -
 gen   :        980
 hvm   :   12922980
 +-vmentry:    4846140
 +-vmexit :    8076840
 - cpu 6 -
 gen   :        792
 hvm   :    9098224
 +-vmentry:    3411864
 +-vmexit :    5686360
 - cpu 7 -
 gen   :       3836
 hvm   :   59030776
 +-vmentry:   22136556
 +-vmexit :   36894220
|-- Domain 1 --|
 Runstates:
  full run:      26 12.51s 1154609114 {18123228220|18123228220|18123228220}
      lost:       6  0.83s 331270856 {401481966|714242403|714242403}
 Grant table ops:
  Done by:
  Done for:
 Populate-on-demand:
  Populated:
  Reclaim order:
  Reclaim contexts:
-- v0 --
 Runstates:
   running:       4 12.31s 7386445159 {18311864874|18311864874|18311864874}
      lost:       6  0.64s 256384507 {160195345|728981821|728981821}
 cpu affinity:       1 31084149226 {31084149226|31084149226|31084149226}
   [4]:       1 31084149226 {31084149226|31084149226|31084149226}
Exit reasons:
 VMEXIT_CR0_READ           148  0.00s  0.00%  1675 cyc { 1093| 1600| 2980}
 VMEXIT_CR4_READ            11  0.00s  0.00%  1832 cyc { 1064| 1639| 2681}
 VMEXIT_CR0_WRITE           22  0.00s  0.00%  2615 cyc { 1134| 1796| 4476}
 VMEXIT_CR4_WRITE           22  0.00s  0.00%  3497 cyc { 1280| 3298| 7146}
 VMEXIT_INTR               185  0.00s  0.00%  4650 cyc { 1381| 4134| 8633}
 VMEXIT_VINTR              205  0.00s  0.00%  1398 cyc {  970| 1136| 2320}
 VMEXIT_PAUSE             1323  0.00s  0.02%  6533 cyc { 4789| 6238| 9305}
 VMEXIT_HLT               1366  8.57s 37.96% 15057220 cyc {1112579|7520597|56280627}
 VMEXIT_IOIO             46347  1.61s  7.13% 83338 cyc { 2676|83036|145300}
 VMEXIT_NPF             390000  1.05s  4.65%  6453 cyc { 5005| 5748| 9907}
Guest interrupt counts:
-- v1 --
 Runstates:
   running:       3 12.30s 9840745963 {18408860623|18408860623|18408860623}
      lost:       5  0.65s 310382375 {436938050|714242403|714242403}
 cpu affinity:       1 31074181912 {31074181912|31074181912|31074181912}
   [5]:       1 31074181912 {31074181912|31074181912|31074181912}
Exit reasons:
 VMEXIT_CR0_READ           385  0.00s  0.00%  1492 cyc { 1018| 1415| 2223}
 VMEXIT_CR4_READ             6  0.00s  0.00%  1745 cyc { 1234| 1717| 2514}
 VMEXIT_CR0_WRITE           34  0.00s  0.00%  2360 cyc { 1259| 2306| 5210}
 VMEXIT_CR4_WRITE           12  0.00s  0.00%  5338 cyc { 1703| 5089| 9430}
 VMEXIT_INTR               108  0.00s  0.00%  6004 cyc { 2082| 5121|12020}
 VMEXIT_VINTR              197  0.00s  0.00%  1138 cyc {  861|  994| 2170}
 VMEXIT_PAUSE              833  0.00s  0.01%  6431 cyc { 4328| 5904| 9108}
 VMEXIT_HLT               1158  9.86s 43.67% 20434762 cyc {358508|13214097|56352969}
 VMEXIT_IOIO             36732  1.35s  5.96% 87903 cyc { 4019|85234|146724}
 VMEXIT_NPF             130435  0.39s  1.72%  7144 cyc { 5106| 6142|11174}
Guest interrupt counts:
-- v2 --
 Runstates:
   running:       4 12.08s 7249658168 {18566248465|18566248465|18566248465}
      lost:       4  0.86s 517210853 {717049906|858313777|858313777}
 cpu affinity:       1 31067476087 {31067476087|31067476087|31067476087}
   [6]:       1 31067476087 {31067476087|31067476087|31067476087}
Exit reasons:
 VMEXIT_CR0_READ           689  0.00s  0.00%  1503 cyc { 1024| 1410| 2428}
 VMEXIT_CR4_READ             6  0.00s  0.00%  1504 cyc { 1346| 1488| 1757}
 VMEXIT_CR0_WRITE           19  0.00s  0.00%  2624 cyc { 1234| 1923| 4311}
 VMEXIT_CR4_WRITE           12  0.00s  0.00%  4285 cyc { 1975| 4454| 6885}
 VMEXIT_INTR                56  0.00s  0.00%  5723 cyc { 1282| 5202| 9900}
 VMEXIT_VINTR              358  0.00s  0.00%  1181 cyc {  905| 1010| 2202}
 VMEXIT_PAUSE              416  0.00s  0.01%  7554 cyc { 4645| 8157| 9454}
 VMEXIT_HLT               1087 10.47s 46.37% 23113320 cyc {172100|16138284|56338801}
 VMEXIT_IOIO             21645  0.78s  3.47% 86779 cyc { 3764|85054|146473}
 VMEXIT_NPF             111974  0.36s  1.57%  7609 cyc { 5131| 7397|11731}
Guest interrupt counts:
-- v3 --
 Runstates:
   running:       4 12.10s 7257234016 {18132721625|18132721625|18132721625}
      lost:      12  1.24s 248210482 {188636654|719488416|719488416}
 cpu affinity:       1 32007462122 {32007462122|32007462122|32007462122}
   [7]:       1 32007462122 {32007462122|32007462122|32007462122}
Exit reasons:
 VMEXIT_CR0_READ          4748  0.00s  0.01%  1275 cyc { 1007| 1132| 1878}
 VMEXIT_CR4_READ             6  0.00s  0.00%  1752 cyc { 1189| 1629| 2600}
 VMEXIT_CR0_WRITE         3099  0.00s  0.01%  1541 cyc { 1157| 1420| 2151}
 VMEXIT_CR4_WRITE           12  0.00s  0.00%  4105 cyc { 1885| 4380| 5515}
 VMEXIT_EXCEPTION_NM        18  0.00s  0.00%  2169 cyc { 1973| 2152| 2632}
 VMEXIT_INTR               258  0.00s  0.00%  4622 cyc { 1358| 4235| 8987}
 VMEXIT_VINTR             2552  0.00s  0.00%   971 cyc {  850|  928| 1131}
 VMEXIT_PAUSE              370  0.00s  0.00%  5758 cyc { 4381| 5688| 7933}
 VMEXIT_HLT               1505  6.14s 27.19% 9788981 cyc {268573|3768704|56331182}
 VMEXIT_IOIO             53835  1.97s  8.74% 87959 cyc { 4996|82423|144207}
 VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}
Guest interrupt counts:
Emulate eip list

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-04 17:14           ` Suravee Suthikulanit
@ 2013-04-05 13:43             ` George Dunlap
  0 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-05 13:43 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: Tim (Xen.org), Andres Lagar-Cavilla, Jan Beulich,
	xen-devel@lists.xen.org

On 04/04/13 18:14, Suravee Suthikulanit wrote:
> On 4/3/2013 5:51 AM, George Dunlap wrote:
>> On 03/04/13 00:48, Suravee Suthikulanit wrote:
>>> On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
>>>> On 4/2/2013 11:34 AM, Tim Deegan wrote:
>>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com>
>>>>>>>>> wrote:
>>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>>      owner: ?
>>>>>>>      Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>>> other day. Suravee, is that correct?
>>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused
>>>>> by the
>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it
>>>>> takes a
>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>> Is there any tools or good ways to count the number of VMexit in Xen?
>>>>
>>> Tim/Jan,
>>>
>>> I have used iperf benchmark to compare network performance (bandwidth)
>>> between the two versions of the hypervisor:
>>> 1. good: 24769:730f6ed72d70
>>> 2. bad: 24770:7f79475d3de7
>>>
>>> In the "bad" case, I am seeing that the network bandwidth has dropped
>>> about 13-15%.
>>>
>>> However, when I uses the xentrace utility to trace the number of VMEXIT,
>>> I actually see about 25% more number of VMEXIT in the good case.  This
>>> is inconsistent with the statement that Tim mentioned above.
>> I was going to say, what I remember from my little bit of
>> investigation back in November, was that it had all the earmarks of
>> micro-architectural "drag", which happens when the TLB or the caches
>> can't be effective.
>>
>> Suravee, if you look at xenalyze, a microarchitectural "drag" looks like:
>> * fewer VMEXITs, but
>> * time for each vmexit takes longer
>>
>> If you post the results of "xenalyze --svm-mode -s" for both traces, I
>> can tell you what I see.
>>
>>   -George
>>
> Here's another version of the outputs from xenalyze with only VMEXIT.
> In this case, I pin all the VCPUs (4) and pin my application process to
> VCPU 3.
>
> NOTE: This measurement is without the RTC bug.
>
> BAD:
> -- v3 --
>    VMEXIT_CR0_WRITE          305  0.00s  0.00%  1660 cyc { 1158| 1461| 2507}
>    VMEXIT_CR4_WRITE            6  0.00s  0.00% 19771 cyc { 1738| 5031|79600}
[snip]
>    VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
>    VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}

> GOOD:
> -- v3 --
>    VMEXIT_CR0_WRITE         3099  0.00s  0.01%  1541 cyc { 1157| 1420| 2151}
>    VMEXIT_CR4_WRITE           12  0.00s  0.00%  4105 cyc { 1885| 4380| 5515}
[snip]
>    VMEXIT_IOIO             53835  1.97s  8.74% 87959 cyc { 4996|82423|144207}
>    VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}
[snip]

So in the good run, we have 855k NPF exits, each of which takes about 
5.7k cycles.  In the bad run, we have only 108k NPF exits, each of which 
takes an average of 15k cycles.  (Although the 50th percentile is still 
only 6.8k cycles -- so most are about the same, but a few take a lot 
longer.)

It's a bit strange -- the reduced number of NPF exits is consistent with 
the idea of some micro-architectural thing slowing down the processing 
of the guest.  However, in my experience usually this also has an effect 
on other processing as well -- i.e., the time to process an IOIO would 
also go up, because dom0 would be slowed down as well; and time to 
process any random VMEXIT (say, the CR0 writes) would also go up.

But maybe it only has an effect inside the guest, because of the tagged 
TLBs or something?

Suravee, could you run this one again, but:
* Trace everything, not just vmexits
* Send me the trace files somehow (FTP or Dropbox), and/or add 
"--with-interrupt-eip-enumeration=249 --with-mmio-enumeration" when you 
run the summary?

That will give us an idea where the guest is spending its time 
statistically, and what kinds of MMIO it is doing, which may give us a 
clearer picture of what's going on.
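
Roughly, I mean something along these lines (a sketch only: the trace
file names are placeholders, and the xentrace event-mask syntax is from
memory, so please double-check it against the xentrace man page):

    # capture everything, not just VMEXITs; stop with Ctrl-C after the test
    xentrace -e all /tmp/xp32-bad.trace

    # then produce the summary with the extra enumerations
    xenalyze --svm-mode -s \
        --with-interrupt-eip-enumeration=249 --with-mmio-enumeration \
        /tmp/xp32-bad.trace > xp32-bad.summary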

Thanks,
  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 14:07 Xen 4.3 development update George Dunlap
  2013-04-02 15:42 ` Jan Beulich
@ 2013-04-09  2:03 ` Dario Faggioli
  2013-04-10 12:12 ` Ian Campbell
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2013-04-09  2:03 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org


[-- Attachment #1.1: Type: text/plain, Size: 1348 bytes --]

On mar, 2013-04-02 at 15:07 +0100, George Dunlap wrote:
> == Not yet complete ==
> 
> [..]
>
> * NUMA scheduler affinity
>   critical
>   owner: dario@citrix
>   status: Patches posted
>   prognosis: Excellent
> 
I have all the Acks I need on the last version I posted... I will take
care of the minor issues that people pointed out there and repost the,
hopefully, final version of this ASAP.

> * NUMA Memory migration
>   owner: dario@citrix
>   status: in progress
>   prognosis: Fair
> 
Well, I'm afraid I have to back out of this. I already said I wasn't
sure about making it in one of the latest "development update" mails, and
I'm afraid I have to confirm that.

I'm sending something out right now, but it's in RFC status, so not
really suitable for being considered for 4.3. Unfortunately, I got
distracted by many other things while working on this, and it also was
more challenging than I originally thought... I of course will continue
working on it (despite the release), but I guess we should queue it for
4.4. :-/

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 14:07 Xen 4.3 development update George Dunlap
  2013-04-02 15:42 ` Jan Beulich
  2013-04-09  2:03 ` Xen 4.3 development update Dario Faggioli
@ 2013-04-10 12:12 ` Ian Campbell
  2013-04-10 12:15 ` Ian Campbell
  2013-04-10 16:41 ` Konrad Rzeszutek Wilk
  4 siblings, 0 replies; 53+ messages in thread
From: Ian Campbell @ 2013-04-10 12:12 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org

On Tue, 2013-04-02 at 15:07 +0100, George Dunlap wrote:
> 
> * Install into /usr/local by default
>   owner: Ian Campbell 

This needs someone to review and ack/nack
<1360081193-17948-1-git-send-email-ian.campbell@citrix.com>

The same person could also look at 
<1360081193-17948-2-git-send-email-ian.campbell@citrix.com> but that in
theory was acked the first time around, before it was reverted (it's not
clear whether Acks should stand in that case!)

Ian.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 14:07 Xen 4.3 development update George Dunlap
                   ` (2 preceding siblings ...)
  2013-04-10 12:12 ` Ian Campbell
@ 2013-04-10 12:15 ` Ian Campbell
  2013-04-10 16:41 ` Konrad Rzeszutek Wilk
  4 siblings, 0 replies; 53+ messages in thread
From: Ian Campbell @ 2013-04-10 12:15 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org

On Tue, 2013-04-02 at 15:07 +0100, George Dunlap wrote:
> 
> * ARM v7 server port
>   owner: ijc@citrix
>   prognosis: Excellent
>   status: SMP support missing.

Patches have been posted for both host and guest SMP.

> * ARM v8 server port (tech preview)
>   owner: ijc@citrix
>   status: ?
>   prognosis: Tech preview only

Not sure how well the statuses map to tech preview stuff, but "good" I
think.

One thing which I think is going to be missing for both v7 and v8 is
migration support.

My big remaining concern is declaring the hypercall ABI stable/frozen.
Really that's a must before we remove the tech preview label from v7 or
v8 IMHO. Given that we've run 32- and 64-bit guests on a 64-bit h/v,
perhaps that's enough to make the call.

Ian.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-02 14:07 Xen 4.3 development update George Dunlap
                   ` (3 preceding siblings ...)
  2013-04-10 12:15 ` Ian Campbell
@ 2013-04-10 16:41 ` Konrad Rzeszutek Wilk
  2013-04-11  9:28   ` George Dunlap
  4 siblings, 1 reply; 53+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-10 16:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org

> = Timeline =
> 
> We are planning on a 9-month release cycle.  Based on that, below are
> our estimated dates:
> * Feature freeze: 25 March 2013
> * Code freezing point: 15 April 2013

Is it possible to extend this? One of the reviewers (Ian) is just
now able to look at code and review. That means the developers have
only 3 business days to repost the changes.

> * First RC: 6 May 2013
> * Release: 17 June 2013
> 
> The RCs and release will of course depend on stability and bugs, and
> will therefore be fairly unpredictable.  Each new feature will be
> considered on a case-by-case basis; but the general rule will be as
> follows:
> 
> * Between feature freeze and code freeze, only features which have had
> a v1 posted before the feature freeze, or are on this list, will be
> considered for inclusion.
> 
> * Between the "code freezing point" and the first RC, any new code
> will need to be justified, and it will become progressively more
> difficult to get non-bugfix patches accepted.  Critera will include

I am hoping you can explain what is meant by 'new code'. Say a patchset is
being reposted where only one of the patches has been modified in response
to the reviewer's comments (case a). Or the reviewers would like new code
to be written to handle a different case (so new code - case b).

Case b) would fall under 'new code will need to be justified'. But if
the new code does not meet the criteria, is the full patchset going to
languish until the next window? Or can the parts of it that pass the
reviewer's muster be committed?

Case a), I would think, would not be a problem.

> the size of the patch, the importance of the codepath, whether it's
> new functionality being added or existing functionality being changed,
> and so on.

Sorry about asking these questions - but I am being increasingly pressed
to fix upstream bugs, upstream new material for v3.10, upstream the claim
toolstack changes, and also go on vacation. Hence I am trying to figure out
what I need to focus on to meet these deadlines.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-10 16:41 ` Konrad Rzeszutek Wilk
@ 2013-04-11  9:28   ` George Dunlap
  2013-04-11  9:33     ` Ian Campbell
  0 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-04-11  9:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xen.org

On 10/04/13 17:41, Konrad Rzeszutek Wilk wrote:
>> = Timeline =
>>
>> We are planning on a 9-month release cycle.  Based on that, below are
>> our estimated dates:
>> * Feature freeze: 25 March 2013
>> * Code freezing point: 15 April 2013
> Is it possible to extend this? One of the reviewers (Ian) is just
> now able to look at code and review. That means the developers have
> only 3 business days to repost the changes.

So when I said "freezing point", I meant, "we will start rejecting 
features".  Each feature will need to be considered individually.  I 
think, for example, that PVH is not going to make it -- it touches too 
much code, and is just not in good enough shape to get in as it is.  But 
AFAICT the TMEM stuff should be fine next week.

IanC knows that he's on the hot path, and so will be working double-time 
over the next few days to review / commit patches.

>> * First RC: 6 May 2013
>> * Release: 17 June 2013
>>
>> The RCs and release will of course depend on stability and bugs, and
>> will therefore be fairly unpredictable.  Each new feature will be
>> considered on a case-by-case basis; but the general rule will be as
>> follows:
>>
>> * Between feature freeze and code freeze, only features which have had
>> a v1 posted before the feature freeze, or are on this list, will be
>> considered for inclusion.
>>
>> * Between the "code freezing point" and the first RC, any new code
>> will need to be justified, and it will become progressively more
>> difficult to get non-bugfix patches accepted.  Critera will include
> I am hoping you can explain what is meant by 'new code'. Say a patchset is
> being reposted where only one of the patches has been modified in response
> to the reviewer's comments (case a). Or the reviewers would like new code
> to be written to handle a different case (so new code - case b).
>
> Case b) would fall under 'new code will need to be justified'. But if
> the new code does not meet the criteria, is the full patchset going to
> languish until the next window? Or can the parts of it that pass the
> reviewer's muster be committed?
>
> Case a), I would think, would not be a problem.

I mean "code that does new things", as opposed to code that fixes bugs.  
Any code that is not already in xen.git and is not fixing a bug is "new 
code", whether it was posted yesterday or 6 months ago.

The point is that every new bit of functionality introduces the risk of 
bugs; and each additional bug at this point will risk slipping the 
release date.  So for each new bit of code, we will have to do a risk 
analysis.  Criteria will include:
* Risk of the code having bugs in itself
* Risk of the code introducing bugs in other key functionality
* Cost of bugs
* Value of the new code

The tmem toolstack stuff, for instance, if I understand correctly, is 
mostly about paths that only tmem users use.  If you're not using tmem, 
the risk of having a bug should be low; and the cost of fixing toolstack 
bugs in a point release should also be low.  So I would think that 
sometime next week would be fine.

PVH stuff, however, touches a lot of core hypervisor code in really 
quirky ways.  It has a very high risk of introducing bugs in other bits 
of functionality, which would have a very high cost.  Also, since it has 
a fairly high risk of still being immature itself, we couldn't really 
recommend that people use it in 4.3; and thus it has low value as a 
release feature at this point.  (Hope that makes sense -- as a mature 
feature, it's really important; but as a "tech preview only" feature 
it's not of much overall value to customers.)  So I doubt that PVH will get in.

We had a talk yesterday about the Linux stubdomain stuff -- that's a bit 
less clear.  Changes to libxl should be in "linux-stubdom-only" paths, 
so little risk of breaking libxl functionality.  On the other hand, it 
makes a lot of changes to the build system, adding moderate risk to an 
important component; and it hasn't had wide public testing yet, so 
there's no telling how reliable it will be.  On the other other hand, 
it's a blocker for being able to switch to qemu-upstream by default, 
which was one of our key goals for 4.3; that may or may not be worth 
risking slipping the release a bit for.

Does that give you an idea of what the criteria are and how I'm going to 
be applying them?

Basically, my job at this point is to make sure that the release only slips:
1. On purpose, because we've considered the benefit worth the cost.
2. Because of completely unforeseeable circumstances

If I ACK a patch, thinking that it won't slip, and then it does, and I 
might reasonably have known that that was a risk, then I have... well, 
"failed" is kind of a strong word, but yeah, I haven't done the job as 
well as I should have done. :-)

Does that make sense?

>> the size of the patch, the importance of the codepath, whether it's
>> new functionality being added or existing functionality being changed,
>> and so on.
> Sorry about asking these questions - but I am being increasingly pressed
> to fix upstream bugs, upstream new material for v3.10, upstream the claim
> toolstack changes, and also go on vacation. Hence I am trying to figure out
> what I need to focus on to meet these deadlines.

Sure.  What patches do you have outstanding, BTW?  There's the tmem 
stuff; the PVH stuff as I said seems pretty unlikely to make it (unless 
there's an amazing patch series posted in the next day or two).  Is 
there anything else you'd like my opinion on?

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-11  9:28   ` George Dunlap
@ 2013-04-11  9:33     ` Ian Campbell
  2013-04-11  9:43       ` George Dunlap
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Campbell @ 2013-04-11  9:33 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org, Konrad Rzeszutek Wilk

On Thu, 2013-04-11 at 10:28 +0100, George Dunlap wrote:
> On 10/04/13 17:41, Konrad Rzeszutek Wilk wrote:
> >> = Timeline =
> >>
> >> We are planning on a 9-month release cycle.  Based on that, below are
> >> our estimated dates:
> >> * Feature freeze: 25 March 2013
> >> * Code freezing point: 15 April 2013
> > Is it possible to extend this? One of the reviewers (Ian) is just
> > now able to look at code and review. That means the developers have
> > only 3 business days to repost the changes.
> 
> So when I said "freezing point", I meant, "we will start rejecting 
> features".  Each feature will need to be considered individually.  I 
> think, for example, that PVH is not going to make it -- it touches too 
> much code, and is just not in good enough shape to get in as it is.  But 
> AFAICT the TMEM stuff should be fine next week.
> 
> IanC knows that he's on the hot path, and so will be working double-time 
> over the next few days to review / commit patches.

FWIW for the tmem stuff specifically IanJ seems to have a handle on the
review so it's not high in my queue.

> We had a talk yesterday about the Linux stubdomain stuff -- that's a bit 
> less clear.  Changes to libxl should be in "linux-stubdom-only" paths, 
> so little risk of breaking libxl functionality.  On the other hand, it 
> makes a lot of changes to the build system, adding moderate risk to an 
> important component; and it hasn't had wide public testing yet, so 
> there's no telling how reliable it will be.  On the other other hand, 
> it's a blocker for being able to switch to qemu-upstream by default, 
> which was one of our key goals for 4.3; that may or may not be worth 
> risking slipping the release a bit for.

I think we switched the default for the non-stubdom case already (or
were planning to!). I think this is sufficient for 4.3.

Ian.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-11  9:33     ` Ian Campbell
@ 2013-04-11  9:43       ` George Dunlap
  2013-04-11  9:49         ` Ian Campbell
  0 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-04-11  9:43 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel@lists.xen.org, Konrad Rzeszutek Wilk

On 11/04/13 10:33, Ian Campbell wrote:
> On Thu, 2013-04-11 at 10:28 +0100, George Dunlap wrote:
>> On 10/04/13 17:41, Konrad Rzeszutek Wilk wrote:
>>>> = Timeline =
>>>>
>>>> We are planning on a 9-month release cycle.  Based on that, below are
>>>> our estimated dates:
>>>> * Feature freeze: 25 March 2013
>>>> * Code freezing point: 15 April 2013
>>> Is it possible to extend this? One of the reviewers (Ian) is just
>>> now able to look at code and review. That means the developers have
>>> only 3 business days to repost the changes.
>> So when I said "freezing point", I meant, "we will start rejecting
>> features".  Each feature will need to be considered individually.  I
>> think, for example, that PVH is not going to make it -- it touches too
>> much code, and is just not in good enough shape to get in as it is.  But
>> AFAICT the TMEM stuff should be fine next week.
>>
>> IanC knows that he's on the hot path, and so will be working double-time
>> over the next few days to review / commit patches.
> FWIW for the tmem stuff specifically IanJ seems to have a handle on the
> review so it's not high in my queue.
>
>> We had a talk yesterday about the Linux stubdomain stuff -- that's a bit
>> less clear.  Changes to libxl should be in "linux-stubdom-only" paths,
>> so little risk of breaking libxl functionality.  On the other hand, it
>> makes a lot of changes to the build system, adding moderate risk to an
>> important component; and it hasn't had wide public testing yet, so
>> there's no telling how reliable it will be.  On the other other hand,
>> it's a blocker for being able to switch to qemu-upstream by default,
>> which was one of our key goals for 4.3; that may or may not be worth
>> risking slipping the release a bit for.
> I think we switched the default for the non-stubdom case already (or
> were planning to!). I think this is sufficient for 4.3.

I've been thinking about this -- wouldn't this mean that if you do the 
"default" thing and just install a Windows guest in an HVM domain (not 
thinking about qemu or whatever), that now I'm stuck and I *can't* use 
stubdoms (because Windows doesn't like the hardware changing that much 
under its feet)?  That doesn't sound like a very good user experience to me.

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-11  9:43       ` George Dunlap
@ 2013-04-11  9:49         ` Ian Campbell
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Campbell @ 2013-04-11  9:49 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org, Konrad Rzeszutek Wilk

On Thu, 2013-04-11 at 10:43 +0100, George Dunlap wrote:
> On 11/04/13 10:33, Ian Campbell wrote:
> > On Thu, 2013-04-11 at 10:28 +0100, George Dunlap wrote:
> >> On 10/04/13 17:41, Konrad Rzeszutek Wilk wrote:
> >>>> = Timeline =
> >>>>
> >>>> We are planning on a 9-month release cycle.  Based on that, below are
> >>>> our estimated dates:
> >>>> * Feature freeze: 25 March 2013
> >>>> * Code freezing point: 15 April 2013
> >>> Is it possible to extend this? One of the reviewers (Ian) is just
> >>> now able to look at code and review. That means the developers have
> >>> only 3 business days to repost the changes.
> >> So when I said "freezing point", I meant, "we will start rejecting
> >> features".  Each feature will need to be considered individually.  I
> >> think, for example, that PVH is not going to make it -- it touches too
> >> much code, and is just not in good enough shape to get in as it is.  But
> >> AFAICT the TMEM stuff should be fine next week.
> >>
> >> IanC knows that he's on the hot path, and so will be working double-time
> >> over the next few days to review / commit patches.
> > FWIW for the tmem stuff specifically IanJ seems to have a handle on the
> > review so it's not high in my queue.
> >
> >> We had a talk yesterday about the Linux stubdomain stuff -- that's a bit
> >> less clear.  Changes to libxl should be in "linux-stubdom-only" paths,
> >> so little risk of breaking libxl functionality.  On the other hand, it
> >> makes a lot of changes to the build system, adding moderate risk to an
> >> important component; and it hasn't had wide public testing yet, so
> >> there's no telling how reliable it will be.  On the other other hand,
> >> it's a blocker for being able to switch to qemu-upstream by default,
> >> which was one of our key goals for 4.3; that may or may not be worth
> >> risking slipping the release a bit for.
> > I think we switched the default for the non-stubdom case already (or
> > were planning to!). I think this is sufficient for 4.3.
> 
> I've been thinking about this -- wouldn't this mean that if you do the 
> "default" thing and just install a Windows guest in an HVM domain (not 
> thinking about qemu or whatever), that now I'm stuck and I *can't* use 
> stubdoms (because Windows doesn't like the hardware changing that much 
> under its feet)?  That doesn't sound like a very good user experience to me.

For that one domain, yes. You could reinstall that VM (or install a new one).
However I think using stubdoms is not yet, sadly, the common case and
those who are using it are likely to use it from the point of
installation.

On balance I think making the switch for the non-stubdom case is the
least bad option for most use cases.

Aside: The Windows not liking the change of hardware thing is mostly
supposition which no one has proved or disproved one way or the other
AFAICT.

Ian.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update / winxp AMD performance regression
  2013-04-03 15:34         ` Andres Lagar-Cavilla
  2013-04-04 15:23           ` Tim Deegan
@ 2013-04-25 13:51           ` Pasi Kärkkäinen
  2013-04-25 14:00             ` George Dunlap
  1 sibling, 1 reply; 53+ messages in thread
From: Pasi Kärkkäinen @ 2013-04-25 13:51 UTC (permalink / raw)
  To: Andres Lagar-Cavilla
  Cc: suravee.suthikulpanit, George Dunlap, Tim (Xen.org), xen-devel,
	Jan Beulich, Peter Maloney

On Wed, Apr 03, 2013 at 11:34:13AM -0400, Andres Lagar-Cavilla wrote:
> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> 
> > On 03/04/13 08:27, Jan Beulich wrote:
> >>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
> >>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
> >>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
> >>>>>   owner: ?
> >>>>>   Reference: http://marc.info/?l=xen-devel&m=135075376805215
> >>>> This is supposedly fixed with the RTC changes Tim committed the
> >>>> other day. Suravee, is that correct?
> >>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
> >>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
> >>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
> >>> _lot_ of vmexits for IRQL reads and writes.
> >> Ah, okay, sorry for mixing this up. But how is this a regression
> >> then?
> > 
> > My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
> 
> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
> 

Adding Peter to CC who reported the original winxp performance problem/regression on AMD.

Peter: Can you try Xen 4.2.2 please and report if it has the performance problem or not? 

Thanks,

-- Pasi

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update / winxp AMD performance regression
  2013-04-25 13:51           ` Xen 4.3 development update / winxp AMD performance regression Pasi Kärkkäinen
@ 2013-04-25 14:00             ` George Dunlap
  2013-04-25 14:24               ` Andres Lagar-Cavilla
  0 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-04-25 14:00 UTC (permalink / raw)
  To: Pasi Kärkkäinen
  Cc: suravee.suthikulpanit@amd.com, Tim (Xen.org),
	xen-devel@lists.xen.org, Jan Beulich, Andres Lagar-Cavilla,
	Peter Maloney

On 04/25/2013 02:51 PM, Pasi Kärkkäinen wrote:
> On Wed, Apr 03, 2013 at 11:34:13AM -0400, Andres Lagar-Cavilla wrote:
>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>
>>> On 03/04/13 08:27, Jan Beulich wrote:
>>>>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>>    owner: ?
>>>>>>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>>> other day. Suravee, is that correct?
>>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>> Ah, okay, sorry for mixing this up. But how is this a regression
>>>> then?
>>>
>>> My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
>>
>> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
>>
>
> Adding Peter to CC who reported the original winxp performance problem/regression on AMD.
>
> Peter: Can you try Xen 4.2.2 please and report if it has the performance problem or not?

Do you want to compare 4.2.2 to 4.2.1, or 4.3?

The changeset in question was included in the initial release of 4.2, so 
unless you think it's been fixed since, I would expect 4.2 to have this 
regression.

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update / winxp AMD performance regression
  2013-04-25 14:00             ` George Dunlap
@ 2013-04-25 14:24               ` Andres Lagar-Cavilla
  2013-04-28 10:18                 ` Peter Maloney
  0 siblings, 1 reply; 53+ messages in thread
From: Andres Lagar-Cavilla @ 2013-04-25 14:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: suravee.suthikulpanit@amd.com, Tim (Xen.org),
	xen-devel@lists.xen.org, Jan Beulich, Peter Maloney

On Apr 25, 2013, at 10:00 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 04/25/2013 02:51 PM, Pasi Kärkkäinen wrote:
>> On Wed, Apr 03, 2013 at 11:34:13AM -0400, Andres Lagar-Cavilla wrote:
>>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> 
>>>> On 03/04/13 08:27, Jan Beulich wrote:
>>>>>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>>>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>>>   owner: ?
>>>>>>>>   Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>>>> other day. Suravee, is that correct?
>>>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>>> Ah, okay, sorry for mixing this up. But how is this a regression
>>>>> then?
>>>> 
>>>> My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
>>> 
>>> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
>>> 
>> 
>> Adding Peter to CC who reported the original winxp performance problem/regression on AMD.
>> 
>> Peter: Can you try Xen 4.2.2 please and report if it has the performance problem or not?
> 
> Do you want to compare 4.2.2 to 4.2.1, or 4.3?
> 
> The changeset in question was included in the initial release of 4.2, so unless you think it's been fixed since, I would expect 4.2 to have this regression.

I believe you will see this from 4.2 onwards. 4.2 includes the rwlock optimization. Nothing has been added to the tree in that regard recently.

Andres
> 
> -George
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-04 15:23           ` Tim Deegan
  2013-04-04 17:05             ` Tim Deegan
@ 2013-04-25 15:20             ` George Dunlap
  2013-04-25 15:26               ` George Dunlap
                                 ` (2 more replies)
  1 sibling, 3 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-25 15:20 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Andres Lagar-Cavilla, Suravee Suthikulpanit, Jan Beulich,
	xen-devel@lists.xen.org

On Thu, Apr 4, 2013 at 4:23 PM, Tim Deegan <tim@xen.org> wrote:
> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>
>> > On 03/04/13 08:27, Jan Beulich wrote:
>> >>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>> >>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>> >>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>> >>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>> >>> _lot_ of vmexits for IRQL reads and writes.
>> >> Ah, okay, sorry for mixing this up. But how is this a regression
>> >> then?
>> >
>> > My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
>>
>> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
>>
>
> Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
> about 12 minutes before this locking change takes more than 20 minutes
> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
> to test something else).

Tim,

Can you go into a bit more detail about what you compiled, on what kind of OS?

I just managed to actually find a c/s from which I could build the
tools (git 914e61c), and then compared that with just rebuilding xen
on the accused changeset (6b719c3).

The VM was a Debian Wheezy VM, stock kernel (3.2), PVHVM mode, 1G of
RAM, 4 vcpus, LVM-backed 8G disk.

Host is an AMD Barcelona (I think), 8 cores, 4G RAM.

The test was "make -C xen clean && make -j 6 XEN_TARGET_ARCH=x86_64 xen".

Time was measured on the "test controller" machine -- i.e., my dev
box, which is not running Xen.  (This means there's some potential for
timing variance with ssh and the network, but no potential for timing
variance due to virtual time issues.)
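
I.e., each timing was taken with something along the lines of the
following (an illustrative sketch only; "build-vm" stands in for the
guest's address):

    time ssh root@build-vm \
        'make -C xen clean && make -j 6 XEN_TARGET_ARCH=x86_64 xen'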

"Good" (c/s 914e61c):
334.92
312.22
311.21
311.71
315.87

"Bad" (c/s 6b719c3)
326.50
295.77
288.50
296.43
276.66
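
(Averaging those, just for reference:

    "good" mean: (334.92+312.22+311.21+311.71+315.87)/5 = ~317.2
    "bad"  mean: (326.50+295.77+288.50+296.43+276.66)/5 = ~296.8

so the "bad" changeset actually comes out about 6% faster here.)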

In the "Good" run I had a vnc display going, whereas in the "bad" run
I didn't; that could account for the speed-up.  But so far it
contradicts the idea of a systematic problem in c/s 6b719c3.

I'm going to try some other combinations as well...

 -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-25 15:20             ` George Dunlap
@ 2013-04-25 15:26               ` George Dunlap
  2013-04-25 15:46               ` Tim Deegan
  2013-05-03  9:35               ` George Dunlap
  2 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-25 15:26 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Andres Lagar-Cavilla, Suravee Suthikulpanit, Jan Beulich,
	xen-devel@lists.xen.org

[And remembering to cc everyone this time]

On Thu, Apr 25, 2013 at 4:20 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> "Good" (c/s 914e61c):
> 334.92
> 312.22
> 311.21
> 311.71
> 315.87
>
> "Bad" (c/s 6b719c3)
> 326.50
> 295.77
> 288.50
> 296.43
> 276.66

Sorry, this is "seconds to complete", lower is better.

 -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-25 15:20             ` George Dunlap
  2013-04-25 15:26               ` George Dunlap
@ 2013-04-25 15:46               ` Tim Deegan
  2013-04-25 15:50                 ` George Dunlap
  2013-05-03  9:35               ` George Dunlap
  2 siblings, 1 reply; 53+ messages in thread
From: Tim Deegan @ 2013-04-25 15:46 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andres Lagar-Cavilla, Suravee Suthikulpanit, Jan Beulich,
	xen-devel@lists.xen.org

At 16:20 +0100 on 25 Apr (1366906804), George Dunlap wrote:
> On Thu, Apr 4, 2013 at 4:23 PM, Tim Deegan <tim@xen.org> wrote:
> > At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
> >> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> >>
> >> > On 03/04/13 08:27, Jan Beulich wrote:
> >> >>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
> >> >>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
> >> >>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
> >> >>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
> >> >>> _lot_ of vmexits for IRQL reads and writes.
> >> >> Ah, okay, sorry for mixing this up. But how is this a regression
> >> >> then?
> >> >
> >> > My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
> >>
> >> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
> >>
> >
> > Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
> > about 12 minutes before this locking change takes more than 20 minutes
> > on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
> > to test something else).
> 
> Tim,
> 
> Can you go into a bit more detail about what you compiled, on what kind of OS?

I was compiling on Win XP sp3, 32-bit, 1vcpu, 4G ram.  The compile was
the Windows DDK sample code. 

As I think I mentioned later, all my measurements are extremely suspect
as I was relying on guest wallclock time, and the 'before' case was
before the XP wallclock time was fixed. :(

> The VM was a Debian Wheezy VM, stock kernel (3.2), PVHVM mode, 1G of
> RAM, 4 vcpus, LVM-backed 8G disk.

I suspect the TPR access patterns of XP are not seen on linux; it's been
known for long enough now that it's super-slow on emulated platforms and
AFAIK it was only ever Windows that used the TPR so aggressively anyway.

Tim.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-25 15:46               ` Tim Deegan
@ 2013-04-25 15:50                 ` George Dunlap
  0 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-25 15:50 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Andres Lagar-Cavilla, Suravee Suthikulpanit, Jan Beulich,
	xen-devel@lists.xen.org

On 04/25/2013 04:46 PM, Tim Deegan wrote:
> At 16:20 +0100 on 25 Apr (1366906804), George Dunlap wrote:
>> On Thu, Apr 4, 2013 at 4:23 PM, Tim Deegan <tim@xen.org> wrote:
>>> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
>>>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>>
>>>>> On 03/04/13 08:27, Jan Beulich wrote:
>>>>>>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>>>>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>>>> Ah, okay, sorry for mixing this up. But how is this a regression
>>>>>> then?
>>>>>
>>>>> My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
>>>>
>>>> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
>>>>
>>>
>>> Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
>>> about 12 minutes before this locking change takes more than 20 minutes
>>> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
>>> to test something else).
>>
>> Tim,
>>
>> Can you go into a bit more detail about what you compiled, on what kind of OS?
>
> I was compiling on Win XP sp3, 32-bit, 1vcpu, 4G ram.  The compile was
> the Windows DDK sample code.
>
> As I think I mentioned later, all my measurements are extremely suspect
> as I was relying on guest wallclock time, and the 'before' case was
> before the XP wallclock time was fixed. :(
>
>> The VM was a Debian Wheezy VM, stock kernel (3.2), PVHVM mode, 1G of
>> RAM, 4 vcpus, LVM-backed 8G disk.
>
> I suspect the TPR access patterns of XP are not seen on linux; it's been
> known for long enough now that it's super-slow on emulated platforms and
> AFAIK it was only ever Windows that used the TPR so aggressively anyway.

Right.  IIRC w2k3 sp2 has the "lazy tpr" feature, so if I can get 
consistent results with that one then we can say... well, we can at 
least say it's not easy to reproduce. :-)

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update / winxp AMD performance regression
  2013-04-25 14:24               ` Andres Lagar-Cavilla
@ 2013-04-28 10:18                 ` Peter Maloney
  2013-04-29  9:01                   ` George Dunlap
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Maloney @ 2013-04-28 10:18 UTC (permalink / raw)
  To: Andres Lagar-Cavilla
  Cc: suravee.suthikulpanit@amd.com, George Dunlap, Tim (Xen.org),
	xen-devel@lists.xen.org, Jan Beulich

On 04/25/2013 04:24 PM, Andres Lagar-Cavilla wrote:
> On Apr 25, 2013, at 10:00 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>
>> On 04/25/2013 02:51 PM, Pasi Kärkkäinen wrote:
>>> On Wed, Apr 03, 2013 at 11:34:13AM -0400, Andres Lagar-Cavilla wrote:
>>>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>>
>>>>> On 03/04/13 08:27, Jan Beulich wrote:
>>>>>>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>>>>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>>>>   owner: ?
>>>>>>>>>   Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>>>>> other day. Suravee, is that correct?
>>>>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>>>> Ah, okay, sorry for mixing this up. But how is this a regression
>>>>>> then?
>>>>> My sense, when I looked at this back whenever that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
>>>> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
>>>>
>>> Adding Peter to CC who reported the original winxp performance problem/regression on AMD.
>>>
>>> Peter: Can you try Xen 4.2.2 please and report if it has the performance problem or not?
>> Do you want to compare 4.2.2 to 4.2.1, or 4.3?
>>
>> The changeset in question was included in the initial release of 4.2, so unless you think it's been fixed since, I would expect 4.2 to have this regression.
> I believe you will see this 4.2 onwards. 4.2 includes the rwlock optimization. Nothing has been added to the tree in that regard recently.
>
> Andres
Bad news... It is still very slow. With 7 vcpus, it took very long to
get to the login screen; then I hit the login button at 10:30:30 and at
10:32:10 I can watch my icons starting to appear one by one, very slowly.
When the icons are all there, I see a blue bar instead of the taskbar. At
10:32:47 the taskbar finally looks normal, but the systray is still empty. I
clicked the start menu at 10:33:40 (still empty systray). At 10:33:54,
the start menu opened. At 10:34:20, the first systray icon appeared. At
10:36 I managed to get Task Manager loaded, and it shows 88-95% CPU
usage across 7 cpus, but doesn't show any processes using much (xming using
16, System using 11, taskmgr.exe using 9, CCC.exe using 5, explorer and
services using 4%, etc.). xm top shows the domain at 646.9% CPU.


xentop - 10:39:37   Xen 4.2.2
2 domains: 2 running, 0 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 16757960k total, 12768800k used, 3989160k free    CPUs: 8 @ 4499MHz
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR  VBD_RSECT  VBD_WSECT SSID
  Domain-0 -----r       1184   25.4    8344320   49.8    8388608      50.1     8    0        0        0    0        0        0        0          0          0    0
windowsxp2 -----r       3853  587.3    4197220   25.0    4198400      25.1     7    1      392       20    1        0    17657     4661     806510      58398    0

(about 8% of the dom0 stuff is the qemu-dm process, and the rest is
unrelated things)

And this was expected, since I already tested 4.2.1, and you said that
this fix should be in 4.2 onwards, so I would have already tested it in
4.2.1.


Here's xm vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                             0     0     2   -b-     461.7 any cpu
Domain-0                             0     1     4   -b-     340.1 any cpu
Domain-0                             0     2     5   -b-     182.8 any cpu
Domain-0                             0     3     3   -b-      84.9 any cpu
Domain-0                             0     4     2   -b-      67.5 any cpu
Domain-0                             0     5     2   r--      62.5 any cpu
Domain-0                             0     6     3   -b-      44.3 any cpu
Domain-0                             0     7     2   -b-      46.5 any cpu
windowsxp2                           3     0     5   r--     755.4 any cpu
windowsxp2                           3     1     1   r--     688.1 any cpu
windowsxp2                           3     2     3   r--     702.6 any cpu
windowsxp2                           3     3     7   r--     723.4 any cpu
windowsxp2                           3     4     6   r--     724.7 any cpu
windowsxp2                           3     5     0   r--     725.0 any cpu
windowsxp2                           3     6     4   r--     821.3 any cpu


Here's dmesg just to see the version:

 __  __            _  _    ____    ____ 
 \ \/ /___ _ __   | || |  |___ \  |___ \
  \  // _ \ '_ \  | || |_   __) |   __) |
  /  \  __/ | | | |__   _| / __/ _ / __/
 /_/\_\___|_| |_|    |_|(_)_____(_)_____|
                                        
(XEN) Xen version 4.2.2 (root@site) (gcc (SUSE Linux) 4.7.1 20120723
[gcc-4_7-branch revision 189773]) Sun Apr 28 00:16:04 CEST 2013
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GRUB2 2.00
(XEN) Command line: dom0_mem=8192M,max:8192M iommu=1 loglvl=all
guest_loglvl=all


Here's the dmesg after domu boots:



(XEN) HVM2: Press F12 for boot menu.
(XEN) HVM2:
(XEN) HVM2: Booting from Hard Disk...
(XEN) HVM2: Booting from 0000:7c00
(XEN) HVM3: HVM Loader
(XEN) HVM3: Detected Xen v4.2.2
(XEN) HVM3: Xenbus rings @0xfeffc000, event channel 9
(XEN) HVM3: System requested ROMBIOS
(XEN) HVM3: CPU speed is 4500 MHz
(XEN) irq.c:270: Dom3 PCI link 0 changed 0 -> 5
(XEN) HVM3: PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:270: Dom3 PCI link 1 changed 0 -> 10
(XEN) HVM3: PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:270: Dom3 PCI link 2 changed 0 -> 11
(XEN) HVM3: PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:270: Dom3 PCI link 3 changed 0 -> 5
(XEN) HVM3: PCI-ISA link 3 routed to IRQ5
(XEN) HVM3: pci dev 01:2 INTD->IRQ5
(XEN) HVM3: pci dev 01:3 INTA->IRQ10
(XEN) HVM3: pci dev 03:0 INTA->IRQ5
(XEN) HVM3: pci dev 04:0 INTA->IRQ5
(XEN) HVM3: pci dev 05:0 INTA->IRQ10
(XEN) HVM3: pci dev 02:0 bar 10 size 02000000: f0000008
(XEN) HVM3: pci dev 03:0 bar 14 size 01000000: f2000008
(XEN) HVM3: pci dev 02:0 bar 14 size 00001000: f3000000
(XEN) HVM3: pci dev 03:0 bar 10 size 00000100: 0000c001
(XEN) HVM3: pci dev 04:0 bar 10 size 00000100: 0000c101
(XEN) HVM3: pci dev 04:0 bar 14 size 00000100: f3001000
(XEN) HVM3: pci dev 05:0 bar 10 size 00000100: 0000c201
(XEN) HVM3: pci dev 01:2 bar 20 size 00000020: 0000c301
(XEN) HVM3: pci dev 01:1 bar 20 size 00000010: 0000c321
(XEN) HVM3: Multiprocessor initialisation:
(XEN) HVM3:  - CPU0 ... 48-bit phys ... fixed MTRRs ... var MTRRs [2/8]
... done.
(XEN) HVM3:  - CPU1 ... 48-bit phys ... fixed MTRRs ... var MTRRs [2/8]
... done.
(XEN) HVM3:  - CPU2 ... 48-bit phys ... fixed MTRRs ... var MTRRs [2/8]
... done.
(XEN) HVM3:  - CPU3 ... 48-bit phys ... fixed MTRRs ... var MTRRs [2/8]
... done.
(XEN) HVM3:  - CPU4 ... 48-bit phys ... fixed MTRRs ... var MTRRs [2/8]
... done.
(XEN) HVM3:  - CPU5 ... 48-bit phys ... fixed MTRRs ... var MTRRs [2/8]
... done.
(XEN) HVM3:  - CPU6 ... 48-bit phys ... fixed MTRRs ... var MTRRs [2/8]
... done.
(XEN) HVM3: Testing HVM environment:
(XEN) HVM3:  - REP INSB across page boundaries ... passed
(XEN) HVM3:  - GS base MSRs and SWAPGS ... passed
(XEN) HVM3: Passed 2 of 2 tests
(XEN) HVM3: Writing SMBIOS tables ...
(XEN) HVM3: Loading ROMBIOS ...
(XEN) HVM3: 12604 bytes of ROMBIOS high-memory extensions:
(XEN) HVM3:   Relocating to 0xfc001000-0xfc00413c ... done
(XEN) HVM3: Creating MP tables ...
(XEN) HVM3: Loading Cirrus VGABIOS ...
(XEN) HVM3: Loading PCI Option ROM ...
(XEN) HVM3:  - Manufacturer: http://ipxe.org
(XEN) HVM3:  - Product name: iPXE
(XEN) HVM3: Option ROMs:
(XEN) HVM3:  c0000-c8fff: VGA BIOS
(XEN) HVM3:  c9000-d8fff: Etherboot ROM
(XEN) HVM3: Loading ACPI ...
(XEN) HVM3: vm86 TSS at fc010200
(XEN) HVM3: BIOS map:
(XEN) HVM3:  f0000-fffff: Main BIOS
(XEN) HVM3: E820 table:
(XEN) HVM3:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
(XEN) HVM3:  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
(XEN) HVM3:  HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM3:  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM3:  [03]: 00000000:00100000 - 00000000:f0000000: RAM
(XEN) HVM3:  HOLE: 00000000:f0000000 - 00000000:fc000000
(XEN) HVM3:  [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM3:  [05]: 00000001:00000000 - 00000001:10000000: RAM
(XEN) HVM3: Invoking ROMBIOS ...
(XEN) HVM3: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) stdvga.c:147:d3 entering stdvga and caching modes
(XEN) HVM3: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
(XEN) HVM3: Bochs BIOS - build: 06/23/99
(XEN) HVM3: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) HVM3: Options: apmbios pcibios eltorito PMM
(XEN) HVM3:
(XEN) HVM3: ata0-0: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
(XEN) HVM3: ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (40960 MBytes)
(XEN) HVM3: IDE time out
(XEN) HVM3:
(XEN) HVM3:
(XEN) HVM3:
(XEN) HVM3: Press F12 for boot menu.
(XEN) HVM3:
(XEN) HVM3: Booting from Hard Disk...
(XEN) HVM3: Booting from 0000:7c00
(XEN) HVM3: int13_harddisk: function 15, unmapped device for ELDL=81
(XEN) HVM3: *** int 15h function AX=e980, BX=007e not yet supported!
(XEN) irq.c:270: Dom3 PCI link 0 changed 5 -> 0
(XEN) irq.c:270: Dom3 PCI link 1 changed 10 -> 0
(XEN) irq.c:270: Dom3 PCI link 2 changed 11 -> 0
(XEN) irq.c:270: Dom3 PCI link 3 changed 5 -> 0
(XEN) grant_table.c:1237:d3 Expanding dom (3) grant table from (4) to
(32) frames.
(XEN) irq.c:375: Dom3 callback via changed to GSI 28
(XEN) stdvga.c:151:d3 leaving stdvga

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update / winxp AMD performance regression
  2013-04-28 10:18                 ` Peter Maloney
@ 2013-04-29  9:01                   ` George Dunlap
  0 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-04-29  9:01 UTC (permalink / raw)
  To: Peter Maloney
  Cc: suravee.suthikulpanit@amd.com, Andres Lagar-Cavilla,
	Tim (Xen.org), xen-devel@lists.xen.org, Jan Beulich

On 28/04/13 11:18, Peter Maloney wrote:
> On 04/25/2013 04:24 PM, Andres Lagar-Cavilla wrote:
>> On Apr 25, 2013, at 10:00 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>
>>> On 04/25/2013 02:51 PM, Pasi Kärkkäinen wrote:
>>>> On Wed, Apr 03, 2013 at 11:34:13AM -0400, Andres Lagar-Cavilla wrote:
>>>>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>>>
>>>>>> On 03/04/13 08:27, Jan Beulich wrote:
>>>>>>>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>>>>>>>> At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
>>>>>>>>>>>> On 02.04.13 at 16:07, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>>>>>>> * AMD NPT performance regression after c/s 24770:7f79475d3de7
>>>>>>>>>>    owner: ?
>>>>>>>>>>    Reference: http://marc.info/?l=xen-devel&m=135075376805215
>>>>>>>>> This is supposedly fixed with the RTC changes Tim committed the
>>>>>>>>> other day. Suravee, is that correct?
>>>>>>>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>>>>>>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>>>>>>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>>>>>>> _lot_ of vmexits for IRQL reads and writes.
>>>>>>> Ah, okay, sorry for mixing this up. But how is this a regression
>>>>>>> then?
>>>>>> My sense, when I looked at this back whenever, was that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changeset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
>>>>> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
>>>>>
>>>> Adding Peter to CC who reported the original winxp performance problem/regression on AMD.
>>>>
>>>> Peter: Can you try Xen 4.2.2 please and report if it has the performance problem or not?
>>> Do you want to compare 4.2.2 to 4.2.1, or 4.3?
>>>
>>> The changeset in question was included in the initial release of 4.2, so unless you think it's been fixed since, I would expect 4.2 to have this regression.
>> I believe you will see this 4.2 onwards. 4.2 includes the rwlock optimization. Nothing has been added to the tree in that regard recently.
>>
>> Andres
> Bad news... It is still very slow. With 7 vcpus, it took very long to
> get to the login screen, then I hit the login button at 10:30:30 and at
> 10:32:10 I could watch my icons starting to appear one by one, very slowly.
> When the icons were all there, I saw a blue bar instead of the taskbar.
> At 10:32:47 the taskbar finally looked normal, but the systray was still
> empty. I clicked the start menu at 10:33:40 (still empty systray). At
> 10:33:54, the start menu opened. At 10:34:20, the first systray icon
> appeared. At 10:36 I managed to get Task Manager loaded, and it shows
> 88-95% CPU usage on 7 cpus, but doesn't show any processes using much
> (xming using 16%, System using 11%, taskmgr.exe using 9%, CCC.exe using
> 5%, explorer and services using 4%, etc.). xm top shows the domain at
> 646.9% CPU.

What guest OS is this again?  Windows XP?  Do you see the same behavior 
with other Windows OSes?  (e.g., Win7, Win8, w2k3sp2, w2k8?)

If you're really keen, you could do a quick xentrace for me after the VM 
has mostly booted:
1. Run "xentrace -D -e all -S 32 -T 30 /tmp/[name].trace" on your Xen host
2. Clone and build the following hg repo: 
http://xenbits.xen.org/ext/xenalyze
3. Run "xenalyze --svm-mode -s [name].trace > [name].summary" and send 
me the results

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-04 17:05             ` Tim Deegan
@ 2013-04-29 13:21               ` Peter Maloney
  2013-05-02 15:48                 ` Tim Deegan
  2013-05-07 13:56                 ` Pasi Kärkkäinen
  0 siblings, 2 replies; 53+ messages in thread
From: Peter Maloney @ 2013-04-29 13:21 UTC (permalink / raw)
  To: Tim Deegan
  Cc: George Dunlap, Andres Lagar-Cavilla, Jan Beulich,
	suravee.suthikulpanit, xen-devel

On 04/04/2013 07:05 PM, Tim Deegan wrote:
> At 16:23 +0100 on 04 Apr (1365092601), Tim Deegan wrote:
>> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
>>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
>> about 12 minutes before this locking change takes more than 20 minutes
>> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
>> to test something else).
> I did a bit of prodding at this, but messed up my measurements in a
> bunch of different ways over the afternoon. :(  I'm going to be away
> from my test boxes for a couple of weeks now, so all I can say is, if
> you're investigating this bug, beware that:
>
>  - the revision before this change still has the RTC bugs that were
>    fixed last week, so don't measure performance based on guest
>    wallclock time, or your 'before' perf will look too good.
>  - the current unstable tip has test code to exercise the new
>    map_domain_page(), which will badly affect all the many memory
>    accesses done in HVM emulation, so make sure you use debug=n builds
>    for measurement.
>
> Also, if there is still a bad slowdown, caused by the p2m lookups, this
> might help a little bit:
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 38e87ce..7bd8646 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>          }
>      }
>  
> +
> +    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
> +     * a fast path for LAPIC accesses, skipping the p2m lookup. */
> +    if ( !nestedhvm_vcpu_in_guestmode(v)
> +         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
> +    {
> +        if ( !handle_mmio() )
> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +        rc = 1;
> +        goto out;
> +    }
> +
>      p2m = p2m_get_hostp2m(v->domain);
>      mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
>                                P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
This patch (applied to 4.2.2) has a very large improvement on my box
(AMD FX-8150) and WinXP 32 bit.

It only took about 2.5 minutes to log in and see task manager. It takes
about 6 minutes without the patch. And 2.5 minutes is still terrible,
but obviously better.
>
> but in fact, the handle_mmio() will have to do GVA->GFN lookups for its
> %RIP and all its operands, and each of those will involve multiple
> GFN->MFN lookups for the pagetable entries, so if the GFN->MFN lookup
> has got slower, eliminating just the one at the start may not be all
> that great.
>
> Cheers,
>
> Tim.
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-29 13:21               ` Peter Maloney
@ 2013-05-02 15:48                 ` Tim Deegan
  2013-05-03 16:41                   ` George Dunlap
  2013-05-07 13:56                 ` Pasi Kärkkäinen
  1 sibling, 1 reply; 53+ messages in thread
From: Tim Deegan @ 2013-05-02 15:48 UTC (permalink / raw)
  To: Peter Maloney
  Cc: George Dunlap, Andres Lagar-Cavilla, suravee.suthikulpanit,
	Jan Beulich, xen-devel

At 15:21 +0200 on 29 Apr (1367248894), Peter Maloney wrote:
> On 04/04/2013 07:05 PM, Tim Deegan wrote:
> > Also, if there is still a bad slowdown, caused by the p2m lookups, this
> > might help a little bit:
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 38e87ce..7bd8646 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >          }
> >      }
> >  
> > +
> > +    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
> > +     * a fast path for LAPIC accesses, skipping the p2m lookup. */
> > +    if ( !nestedhvm_vcpu_in_guestmode(v)
> > +         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
> > +    {
> > +        if ( !handle_mmio() )
> > +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> > +        rc = 1;
> > +        goto out;
> > +    }
> > +
> >      p2m = p2m_get_hostp2m(v->domain);
> >      mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
> >                                P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
> This patch (applied to 4.2.2) has a very large improvement on my box
> (AMD FX-8150) and WinXP 32 bit.

Hmm - I expected it to be only a mild improvement.  How about this one, 
which puts in the same shortcut in another place as well?  I don't think
it will be much better than the last one, but it's worth a try.

Tim.

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c8487b8..10b6f6b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1361,6 +1361,17 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
         }
     }
 
+    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
+     * a fast path for LAPIC accesses, skipping the p2m lookup. */
+    if ( !nestedhvm_vcpu_in_guestmode(v)
+         && gfn == vlapic_base_address(vcpu_vlapic(v)) >> PAGE_SHIFT )
+    {
+        if ( !handle_mmio() )
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        rc = 1;
+        goto out;
+    }
+
     p2m = p2m_get_hostp2m(v->domain);
     mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
                               P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
@@ -2471,6 +2482,12 @@ static enum hvm_copy_result __hvm_copy(
             gfn = addr >> PAGE_SHIFT;
         }
 
+        /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
+         * a fast path for LAPIC accesses, skipping the p2m lookup. */
+        if ( !nestedhvm_vcpu_in_guestmode(curr)
+             && gfn == vlapic_base_address(vcpu_vlapic(curr)) >> PAGE_SHIFT )
+            return HVMCOPY_bad_gfn_to_mfn;
+
         page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE);
 
         if ( p2m_is_paging(p2mt) )

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-25 15:20             ` George Dunlap
  2013-04-25 15:26               ` George Dunlap
  2013-04-25 15:46               ` Tim Deegan
@ 2013-05-03  9:35               ` George Dunlap
  2 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-05-03  9:35 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Andres Lagar-Cavilla, Peter Maloney, Suravee Suthikulpanit,
	Jan Beulich, xen-devel@lists.xen.org

On Thu, Apr 25, 2013 at 4:20 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Thu, Apr 4, 2013 at 4:23 PM, Tim Deegan <tim@xen.org> wrote:
>> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
>>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>
>>> > On 03/04/13 08:27, Jan Beulich wrote:
>>> >>>>> On 02.04.13 at 18:34, Tim Deegan <tim@xen.org> wrote:
>>> >>> This is a separate problem.  IIRC the AMD XP perf issue is caused by the
>>> >>> emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
>>> >>> patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
>>> >>> _lot_ of vmexits for IRQL reads and writes.
>>> >> Ah, okay, sorry for mixing this up. But how is this a regression
>>> >> then?
>>> >
>>> > My sense, when I looked at this back whenever, was that there was much more to this.  The XP IRQL updating is a problem, but it's made terribly worse by the changeset in question.  It seemed to me like the kind of thing that would be caused by TLB or caches suddenly becoming much less effective.
>>>
>>> The commit in question does not add p2m mutations, so it doesn't nuke the NPT/EPT TLBs. It introduces a spin lock in the hot path and that is the problem. Later in the 4.2 cycle we changed the common case to use an rwlock. Does the same perf degradation occur with tip of 4.2?
>>>
>>
>> Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
>> about 12 minutes before this locking change takes more than 20 minutes
>> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
>> to test something else).
>
> Tim,
>
> Can you go into a bit more detail about what you compiled, and on what kind of OS?
>
> I just managed to actually find a c/s from which I could build the
> tools (git 914e61c), and then compared that with just rebuilding xen
> on the accused changeset (6b719c3).
>
> The VM was a Debian Wheezy VM, stock kernel (3.2), PVHVM mode, 1G of
> RAM, 4 vcpus, LVM-backed 8G disk.
>
> Host is an AMD Barcelona (I think), 8 cores, 4G RAM.
>
> The test was "make -C xen clean && make -j 6 XEN_TARGET_ARCH=x86_64 xen".
>
> Time was measured on the "test controller" machine -- i.e., my dev
> box, which is not running Xen.  (This means there's some potential for
> timing variance with ssh and the network, but no potential for timing
> variance due to virtual time issues.)
>
> "Good" (c/s 914e61c):
> 334.92
> 312.22
> 311.21
> 311.71
> 315.87
>
> "Bad" (c/s 6b719c3)
> 326.50
> 295.77
> 288.50
> 296.43
> 276.66
>
> In the "Good" run I had a vnc display going, whereas in the "bad" run
> I didn't; that could account for the speed-up.  But so far it
> contradicts the idea of a systematic problem in c/s 6b719c3.
>
> I'm going to try some other combinations as well...

BTW, I did the same test with 4.1, 4.2.2-RC2, and a recent
xen-unstable tip.  Here are all the results, presented in order of the
Xen version tested:

v4.1:
292.35
267.31
270.91
285.81
278.30

"Good" git c/s 914e61c:
334.92
312.22
311.21
311.71
315.87

"Bad" git c/s 6b719c3:
326.50
295.77
288.50
296.43
276.66

4.2.2-rc2:
261.49
250.75
246.82
246.23
247.64

Xen-unstable "recent" master:
267.31
258.49
256.83
250.77
252.36

So overall I think we can say that c/s 6b719c3 didn't cause a general
performance regression on AMD HVM guests.

I'm in the process of reproducing the exact case Peter reported,
namely the time it takes to boot Windows XP.

 -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-02 15:48                 ` Tim Deegan
@ 2013-05-03 16:41                   ` George Dunlap
  2013-05-03 16:59                     ` Tim Deegan
                                       ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: George Dunlap @ 2013-05-03 16:41 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Andres Lagar-Cavilla, Peter Maloney,
	suravee.suthikulpanit@amd.com, Jan Beulich,
	xen-devel@lists.xen.org

On 02/05/13 16:48, Tim Deegan wrote:
> At 15:21 +0200 on 29 Apr (1367248894), Peter Maloney wrote:
>> On 04/04/2013 07:05 PM, Tim Deegan wrote:
>>> Also, if there is still a bad slowdown, caused by the p2m lookups, this
>>> might help a little bit:
>>>
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index 38e87ce..7bd8646 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>>           }
>>>       }
>>>   
>>> +
>>> +    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>>> +     * a fast path for LAPIC accesses, skipping the p2m lookup. */
>>> +    if ( !nestedhvm_vcpu_in_guestmode(v)
>>> +         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
>>> +    {
>>> +        if ( !handle_mmio() )
>>> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>> +        rc = 1;
>>> +        goto out;
>>> +    }
>>> +
>>>       p2m = p2m_get_hostp2m(v->domain);
>>>       mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
>>>                                 P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
>> This patch (applied to 4.2.2) has a very large improvement on my box
>> (AMD FX-8150) and WinXP 32 bit.
> Hmm - I expected it to be only a mild improvement.  How about this one,
> which puts in the same shortcut in another place as well?  I don't think
> it will be much better than the last one, but it's worth a try.

So I dusted off my old perf testing scripts and added in one to measure 
boot performance.

Below are boot times, from after "xl create" returns, until a specific 
python daemon running in the VM starts responding to requests.  So lower 
is better.

There are a number of places where there can be a few seconds of noise 
either way, but on the whole the tests seem fairly repeatable.

I ran this with w2k3eesp2 and with winxpsp3, using some of the 
auto-install test images made for the XenServer regression testing. All 
of them are using a flat file disk backend with qemu-traditional.

Results are in order of commits:

Xen 4.1:

w2k3: 43 34 34 33 34
winxp: 110 111 111 110 112

Xen 4.2:

w2k3: 34 44 45 45 45
winxp: 203 221 210 211 200

Xen-unstable w/ RTC fix:

w2k3: 43 44 44 45 44
winxp: 268 275 265 276 265

Xen-unstable with rtc fix + this "fast lapic" patch:

w2k3: 43 45 44 45 45
winxp: 224 232 232 232 232


So w2k3 boots fairly quickly anyway; it shows roughly a 30% slow-down
(34s to 45s) when moving from 4.1 to 4.2, and no discernible change after that.

winxp boots fairly slowly; its boot time nearly doubles for 4.2, and gets
even worse for xen-unstable.  The patch is a measurable improvement, but
still nowhere near 4.1, or even 4.2.

On the whole however -- I'm not sure that boot time by itself is a 
blocker.  If the problem really is primarily the "eager TPR" issue for 
Windows XP, then I'm not terribly motivated either: the Citrix PV 
drivers patch Windows XP to modify the routine to be lazy (like w2k3); 
there is hardware available which allows the TPR to be virtualized; and 
there are plenty of Windows-based OSes available which do not have this 
problem.

I'll be doing some more workload-based benchmarks (probably starting 
with the Windows ddk example build) to see if there are other issues I 
turn up.

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-03 16:41                   ` George Dunlap
@ 2013-05-03 16:59                     ` Tim Deegan
  2013-05-04 10:47                     ` Pasi Kärkkäinen
  2013-05-07 13:15                     ` George Dunlap
  2 siblings, 0 replies; 53+ messages in thread
From: Tim Deegan @ 2013-05-03 16:59 UTC (permalink / raw)
  To: George Dunlap
  Cc: Andres Lagar-Cavilla, Peter Maloney,
	suravee.suthikulpanit@amd.com, Jan Beulich,
	xen-devel@lists.xen.org

At 17:41 +0100 on 03 May (1367602895), George Dunlap wrote:
> winxp boots fairly slowly; its boot time nearly doubles for 4.2, and gets
> even worse for xen-unstable.  The patch is a measurable improvement, but
> still nowhere near 4.1, or even 4.2.

Ergh. :(

> On the whole however -- I'm not sure that boot time by itself is a 
> blocker.  If the problem really is primarily the "eager TPR" issue for 
> Windows XP, then I'm not terribly motivated either: the Citrix PV 
> drivers patch Windows XP to modify the routine to be lazy (like w2k3); 
> there is hardware available which allows the TPR to be virtualized;

One reason I was chasing this is that the AMD hardware acceleration for
TPR (via the CR8 register) needs software changes in the OS to make
use of it (which XP doesn't have).  The Intel acceleration works fine
for XP, AFAICT.
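
To illustrate what that software change amounts to: instead of writing the
LAPIC TPR through its MMIO page, a CR8-aware 64-bit OS does something like
the sketch below (the function name is invented; this isn't real Windows or
Xen code):

static inline void set_tpr_via_cr8(unsigned long tpr)
{
    /* CR8 holds the task priority; with the AMD TPR acceleration this
     * write is handled in hardware and does not cause a vmexit. */
    asm volatile ( "mov %0, %%cr8" : : "r" (tpr) : "memory" );
}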

Tim.

> and 
> there are plenty of Windows-based OSes available which do not have this 
> problem.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-03 16:41                   ` George Dunlap
  2013-05-03 16:59                     ` Tim Deegan
@ 2013-05-04 10:47                     ` Pasi Kärkkäinen
  2013-05-07 14:55                       ` George Dunlap
  2013-05-07 13:15                     ` George Dunlap
  2 siblings, 1 reply; 53+ messages in thread
From: Pasi Kärkkäinen @ 2013-05-04 10:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: suravee.suthikulpanit@amd.com, Tim Deegan,
	xen-devel@lists.xen.org, Jan Beulich, Andres Lagar-Cavilla,
	Peter Maloney

On Fri, May 03, 2013 at 05:41:35PM +0100, George Dunlap wrote:
> On 02/05/13 16:48, Tim Deegan wrote:
> >At 15:21 +0200 on 29 Apr (1367248894), Peter Maloney wrote:
> >>On 04/04/2013 07:05 PM, Tim Deegan wrote:
> >>>Also, if there is still a bad slowdown, caused by the p2m lookups, this
> >>>might help a little bit:
> >>>
> >>>diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >>>index 38e87ce..7bd8646 100644
> >>>--- a/xen/arch/x86/hvm/hvm.c
> >>>+++ b/xen/arch/x86/hvm/hvm.c
> >>>@@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >>>          }
> >>>      }
> >>>+
> >>>+    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
> >>>+     * a fast path for LAPIC accesses, skipping the p2m lookup. */
> >>>+    if ( !nestedhvm_vcpu_in_guestmode(v)
> >>>+         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
> >>>+    {
> >>>+        if ( !handle_mmio() )
> >>>+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> >>>+        rc = 1;
> >>>+        goto out;
> >>>+    }
> >>>+
> >>>      p2m = p2m_get_hostp2m(v->domain);
> >>>      mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
> >>>                                P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
> >>This patch (applied to 4.2.2) has a very large improvement on my box
> >>(AMD FX-8150) and WinXP 32 bit.
> >Hmm - I expected it to be only a mild improvement.  How about this one,
> >which puts in the same shortcut in another place as well?  I don't think
> >it will be much better than the last one, but it's worth a try.
> 
> So I dusted off my old perf testing scripts and added in one to
> measure boot performance.
> 
> Below are boot times, from after "xl create" returns, until a
> specific python daemon running in the VM starts responding to
> requests.  So lower is better.
> 
> There are a number of places where there can be a few seconds of
> noise either way, but on the whole the tests seem fairly repeatable.
> 
> I ran this with w2k3eesp2 and with winxpsp3, using some of the
> auto-install test images made for the XenServer regression testing.
> All of them are using a flat file disk backend with
> qemu-traditional.
> 
> Results are in order of commits:
> 
> Xen 4.1:
> 
> w2k3: 43 34 34 33 34
> winxp: 110 111 111 110 112
> 
> Xen 4.2:
> 
> w2k3: 34 44 45 45 45
> winxp: 203 221 210 211 200
> 
> Xen-unstable w/ RTC fix:
> 
> w2k3: 43 44 44 45 44
> winxp: 268 275 265 276 265
> 
> Xen-unstable with rtc fix + this "fast lapic" patch:
> 
> w2k3: 43 45 44 45 45
> winxp: 224 232 232 232 232
> 
> 
> So w2k3 boots fairly quickly anyway; it shows roughly a 30% slow-down
> (34s to 45s) when moving from 4.1 to 4.2, and no discernible change after that.
> 
> winxp boots fairly slowly; its boot time nearly doubles for 4.2, and gets
> even worse for xen-unstable.  The patch is a measurable improvement,
> but still nowhere near 4.1, or even 4.2.
> 
> On the whole however -- I'm not sure that boot time by itself is a
> blocker.  If the problem really is primarily the "eager TPR" issue
> for Windows XP, then I'm not terribly motivated either: the Citrix
> PV drivers patch Windows XP to modify the routine to be lazy (like
> w2k3); there is hardware available which allows the TPR to be
> virtualized; and there are plenty of Windows-based OSes available
> which do not have this problem.
> 

A couple of questions:

- Does Citrix XenServer Windows PV driver work with vanilla Xen 4.2.x? I remember someone complaining on the list that it doesn't work.. (but I'm not sure about that).

- Does GPLPV do the lazy patching for WinXP on AMD?


-- Pasi

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-03 16:41                   ` George Dunlap
  2013-05-03 16:59                     ` Tim Deegan
  2013-05-04 10:47                     ` Pasi Kärkkäinen
@ 2013-05-07 13:15                     ` George Dunlap
  2013-05-07 15:35                       ` George Dunlap
  2 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-05-07 13:15 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Andres Lagar-Cavilla, Peter Maloney, Jan Beulich,
	suravee.suthikulpanit@amd.com, xen-devel@lists.xen.org

On Fri, May 3, 2013 at 5:41 PM, George Dunlap
<george.dunlap@eu.citrix.com> wrote:

> I'll be doing some more workload-based benchmarks (probably starting with
> the Windows ddk example build) to see if there are other issues I turn up.

So here are my results with ddk-build for Windows 2003 (which again
has the "lazy IRQL" feature, and so isn't impacted as hard by the
extra processing time).  It's a "time to complete" test, so lower is
better.  (I recommend ignoring the first run, as it will be warming up
the disk cache.)

Xen 4.1: 223 167 167 170 165

Xen 4.2: 216 140 145 145 150

Xen-unstable: 227 200 190 200 210

Xen-unstable+lapic: 246 175 175 180 175

So it appears that there has been a regression from 4.1, but since 4.2
is actually significantly *better* than 4.1, it's probably not related
to the c/s we've been discussing.

In any case, the lapic patch seems to give a measurable advantage, so
it's probably worth putting in.

I'm going to try doing some tests of the same builds on an Intel box
and see what we get as well.  Not sure XP is worth doing, as each
build is going to take forever...

 -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-04-29 13:21               ` Peter Maloney
  2013-05-02 15:48                 ` Tim Deegan
@ 2013-05-07 13:56                 ` Pasi Kärkkäinen
  2013-05-07 14:57                   ` George Dunlap
  1 sibling, 1 reply; 53+ messages in thread
From: Pasi Kärkkäinen @ 2013-05-07 13:56 UTC (permalink / raw)
  To: Peter Maloney
  Cc: Jan Beulich, George Dunlap, Tim Deegan, xen-devel,
	suravee.suthikulpanit, Andres Lagar-Cavilla

On Mon, Apr 29, 2013 at 03:21:34PM +0200, Peter Maloney wrote:
> On 04/04/2013 07:05 PM, Tim Deegan wrote:
> > At 16:23 +0100 on 04 Apr (1365092601), Tim Deegan wrote:
> >> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
> >>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> >> Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
> >> about 12 minutes before this locking change takes more than 20 minutes
> >> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
> >> to test something else).
> > I did a bit of prodding at this, but messed up my measurements in a
> > bunch of different ways over the afternoon. :(  I'm going to be away
> > from my test boxes for a couple of weeks now, so all I can say is, if
> > you're investigating this bug, beware that:
> >
> >  - the revision before this change still has the RTC bugs that were
> >    fixed last week, so don't measure performance based on guest
> >    wallclock time, or your 'before' perf will look too good.
> >  - the current unstable tip has test code to exercise the new
> >    map_domain_page(), which will badly affect all the many memory
> >    accesses done in HVM emulation, so make sure you use debug=n builds
> >    for measurement.
> >
> > Also, if there is still a bad slowdown, caused by the p2m lookups, this
> > might help a little bit:
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 38e87ce..7bd8646 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >          }
> >      }
> >  
> > +
> > +    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
> > +     * a fast path for LAPIC accesses, skipping the p2m lookup. */
> > +    if ( !nestedhvm_vcpu_in_guestmode(v)
> > +         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
> > +    {
> > +        if ( !handle_mmio() )
> > +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> > +        rc = 1;
> > +        goto out;
> > +    }
> > +
> >      p2m = p2m_get_hostp2m(v->domain);
> >      mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
> >                                P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
> This patch (applied to 4.2.2) has a very large improvement on my box
> (AMD FX-8150) and WinXP 32 bit.
> 
> It only took about 2.5 minutes to log in and see task manager. It takes
> about 6 minutes without the patch. And 2.5 minutes is still terrible,
> but obviously better.
>

So is the problem only on WinXP with "booting up / logging in to windows",
or do you see performance regressions in some actual benchmark tools as well
(after Windows has started up)?

-- Pasi

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-04 10:47                     ` Pasi Kärkkäinen
@ 2013-05-07 14:55                       ` George Dunlap
  2013-05-07 22:23                         ` James Harper
  0 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-05-07 14:55 UTC (permalink / raw)
  To: Pasi Kärkkäinen
  Cc: James Harper, suravee.suthikulpanit@amd.com, Tim (Xen.org),
	xen-devel@lists.xen.org, Jan Beulich, Andres Lagar-Cavilla,
	Peter Maloney

On 04/05/13 11:47, Pasi Kärkkäinen wrote:
> On Fri, May 03, 2013 at 05:41:35PM +0100, George Dunlap wrote:
>> On 02/05/13 16:48, Tim Deegan wrote:
>>> At 15:21 +0200 on 29 Apr (1367248894), Peter Maloney wrote:
>>>> On 04/04/2013 07:05 PM, Tim Deegan wrote:
>>>>> Also, if there is still a bad slowdown, caused by the p2m lookups, this
>>>>> might help a little bit:
>>>>>
>>>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>>>> index 38e87ce..7bd8646 100644
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>>>>           }
>>>>>       }
>>>>> +
>>>>> +    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>>>>> +     * a fast path for LAPIC accesses, skipping the p2m lookup. */
>>>>> +    if ( !nestedhvm_vcpu_in_guestmode(v)
>>>>> +         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
>>>>> +    {
>>>>> +        if ( !handle_mmio() )
>>>>> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>>>> +        rc = 1;
>>>>> +        goto out;
>>>>> +    }
>>>>> +
>>>>>       p2m = p2m_get_hostp2m(v->domain);
>>>>>       mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
>>>>>                                 P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
>>>> This patch (applied to 4.2.2) has a very large improvement on my box
>>>> (AMD FX-8150) and WinXP 32 bit.
>>> Hmm - I expected it to be only a mild improvement.  How about this one,
>>> which puts in the same shortcut in another place as well?  I don't think
>>> it will be much better than the last one, but it's worth a try.
>> So I dusted off my old perf testing scripts and added in one to
>> measure boot performance.
>>
>> Below are boot times, from after "xl create" returns, until a
>> specific python daemon running in the VM starts responding to
>> requests.  So lower is better.
>>
>> There are a number of places where there can be a few seconds of
>> noise either way, but on the whole the tests seem fairly repeatable.
>>
>> I ran this with w2k3eesp2 and with winxpsp3, using some of the
>> auto-install test images made for the XenServer regression testing.
>> All of them are using a flat file disk backend with
>> qemu-traditional.
>>
>> Results are in order of commits:
>>
>> Xen 4.1:
>>
>> w2k3: 43 34 34 33 34
>> winxp: 110 111 111 110 112
>>
>> Xen 4.2:
>>
>> w2k3: 34 44 45 45 45
>> winxp: 203 221 210 211 200
>>
>> Xen-unstable w/ RTC fix:
>>
>> w2k3: 43 44 44 45 44
>> winxp: 268 275 265 276 265
>>
>> Xen-unstable with rtc fix + this "fast lapic" patch:
>>
>> w2k3: 43 45 44 45 45
>> winxp: 224 232 232 232 232
>>
>>
>> So w2k3 boots fairly quickly anyway; it shows roughly a 30% slow-down
>> (34s to 45s) when moving from 4.1 to 4.2, and no discernible change after that.
>>
>> winxp boots fairly slowly; its boot time nearly doubles for 4.2, and gets
>> even worse for xen-unstable.  The patch is a measurable improvement,
>> but still nowhere near 4.1, or even 4.2.
>>
>> On the whole however -- I'm not sure that boot time by itself is a
>> blocker.  If the problem really is primarily the "eager TPR" issue
>> for Windows XP, then I'm not terribly motivated either: the Citrix
>> PV drivers patch Windows XP to modify the routine to be lazy (like
>> w2k3); there is hardware available which allows the TPR to be
>> virtualized; and there are plenty of Windows-based OSes available
>> which do not have this problem.
>>
> A couple of questions:
>
> - Does Citrix XenServer Windows PV driver work with vanilla Xen 4.2.x? I remember someone complaining on the list that it doesn't work.. (but I'm not sure about that).

I did a quick test of the XS 6.0.2 drivers on unstable and they didn't 
work.  Didn't do any debugging, however.

> - Does GPLPV do the lazy patching for WinXP on AMD?

I highly doubt it, but you'd have to ask James Harper.

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-07 13:56                 ` Pasi Kärkkäinen
@ 2013-05-07 14:57                   ` George Dunlap
  0 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-05-07 14:57 UTC (permalink / raw)
  To: Pasi Kärkkäinen
  Cc: Jan Beulich, Tim (Xen.org), xen-devel@lists.xen.org,
	suravee.suthikulpanit@amd.com, Andres Lagar-Cavilla,
	Peter Maloney

On 07/05/13 14:56, Pasi Kärkkäinen wrote:
> On Mon, Apr 29, 2013 at 03:21:34PM +0200, Peter Maloney wrote:
>> On 04/04/2013 07:05 PM, Tim Deegan wrote:
>>> At 16:23 +0100 on 04 Apr (1365092601), Tim Deegan wrote:
>>>> At 11:34 -0400 on 03 Apr (1364988853), Andres Lagar-Cavilla wrote:
>>>>> On Apr 3, 2013, at 6:53 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>> Yes, 4.2 is definitely slower.  A compile test on a 4-vcpu VM that takes
>>>> about 12 minutes before this locking change takes more than 20 minutes
>>>> on the current tip of xen-unstable (I gave up at 22 minutes and rebooted
>>>> to test something else).
>>> I did a bit of prodding at this, but messed up my measurements in a
>>> bunch of different ways over the afternoon. :(  I'm going to be away
>>> from my test boxes for a couple of weeks now, so all I can say is, if
>>> you're investigating this bug, beware that:
>>>
>>>   - the revision before this change still has the RTC bugs that were
>>>     fixed last week, so don't measure performance based on guest
>>>     wallclock time, or your 'before' perf will look too good.
>>>   - the current unstable tip has test code to exercise the new
>>>     map_domain_page(), which will badly affect all the many memory
>>>     accesses done in HVM emulation, so make sure you use debug=n builds
>>>     for measurement.
>>>
>>> Also, if there is still a bad slowdown, caused by the p2m lookups, this
>>> might help a little bit:
>>>
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index 38e87ce..7bd8646 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -1361,6 +1361,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>>           }
>>>       }
>>>   
>>> +
>>> +    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>>> +     * a fast path for LAPIC accesses, skipping the p2m lookup. */
>>> +    if ( !nestedhvm_vcpu_in_guestmode(v)
>>> +         && gfn == vlapic_base_address(vcpu_vlapic(current)) >> PAGE_SHIFT )
>>> +    {
>>> +        if ( !handle_mmio() )
>>> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>> +        rc = 1;
>>> +        goto out;
>>> +    }
>>> +
>>>       p2m = p2m_get_hostp2m(v->domain);
>>>       mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
>>>                                 P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL);
>> This patch (applied to 4.2.2) has a very large improvement on my box
>> (AMD FX-8150) and WinXP 32 bit.
>>
>> It only took about 2.5 minutes to log in and see task manager. It takes
>> about 6 minutes without the patch. And 2.5 minutes is still terrible,
>> but obviously better.
>>
> So is the problem only on WinXP with "booting up / logging in to windows",
> or do you see performance regressions in some actual benchmark tools as well
> (after Windows has started up)?

For the sake of people watching this thread: The last 4-5 mails I've 
sent to Peter Maloney have bounced with "Mailbox Full" messages; so it's 
possible he's not actually hearing this part of the discussion...

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-07 13:15                     ` George Dunlap
@ 2013-05-07 15:35                       ` George Dunlap
  0 siblings, 0 replies; 53+ messages in thread
From: George Dunlap @ 2013-05-07 15:35 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Andres Lagar-Cavilla, Peter Maloney, Jan Beulich,
	suravee.suthikulpanit@amd.com, xen-devel@lists.xen.org

On Tue, May 7, 2013 at 2:15 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Fri, May 3, 2013 at 5:41 PM, George Dunlap
> <george.dunlap@eu.citrix.com> wrote:
>
>> I'll be doing some more workload-based benchmarks (probably starting with
>> the Windows ddk example build) to see if there are other issues I turn up.
>
> So here are my results with ddk-build for Windows 2003 (which again
> has the "lazy IRQL" feature, and so isn't impacted as hard by the
> extra processing time).  It's a "time to complete" test, so lower is
> better.  (I recommend ignoring the first run, as it will be warming up
> the disk cache.)
>
> Xen 4.1: 223 167 167 170 165
>
> Xen 4.2: 216 140 145 145 150
>
> Xen-unstable: 227 200 190 200 210
>
> Xen-unstable+lapic: 246 175 175 180 175

If anyone's interested, the numbers on my Intel box (which I believe
does have the vlapic stuff) are:

Xen 4.1: 110 70 65 70 70
Xen 4.2: 110 70 65 65 65
unstable: 115 70 70 70 71
unstable+lapic: 75 65 65 65 65

There seems to be a bit of a quantization effect, so I'm not sure I
would read much into the differences in the results here, except to
conclude that the fast lapic patch doesn't seem to hurt Intel.  It
should, however, reduce suspicion of other things which may have
changed (e.g. regressions in qemu, &c).

 -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-07 14:55                       ` George Dunlap
@ 2013-05-07 22:23                         ` James Harper
  2013-05-08  9:00                           ` George Dunlap
  0 siblings, 1 reply; 53+ messages in thread
From: James Harper @ 2013-05-07 22:23 UTC (permalink / raw)
  To: George Dunlap, Pasi Kärkkäinen
  Cc: suravee.suthikulpanit@amd.com, Tim (Xen.org),
	xen-devel@lists.xen.org, Jan Beulich, Andres Lagar-Cavilla,
	Peter Maloney

> > A couple of questions:
> >
> > - Does Citrix XenServer Windows PV driver work with vanilla Xen 4.2.x? I
> remember someone complaining on the list that it doesn't work.. (but I'm
> not sure about that).
> 
> I did a quick test of the XS 6.0.2 drivers on unstable and they didn't
> work.  Didn't do any debugging, however.
> 
> > - Does GPLPV do the lazy patching for WinXP on AMD?
> 
> I highly doubt it, but you'd have to ask James Harper.
> 

GPLPV does do some TPR patching. You need to add the /PATCHTPR option to your boot.ini. It works for 2000 as well (and 2003 before MS stopped using TPR at all in sp2), if anyone cares :)

For AMD, TPR access is changed to a LOCK MOV CR8 instruction, which enables setting the TPR without a VMEXIT. For Intel, TPR writes are only done if they would change the value of the TPR, and reads are always served from a cached value. I guess this is what you mean by 'lazy'.
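
A purely illustrative C sketch of that Intel-side "lazy" path (the names
are invented for the example; this is not the actual GPLPV code):

/* Keep a software copy of the TPR; reads never touch the LAPIC page,
 * and writes only touch it (and so only vmexit) when the value changes. */
#define LAPIC_TPR ((volatile unsigned char *)0xFEE00080u)

static unsigned char tpr_cache;

static unsigned char read_tpr_lazy(void)
{
    return tpr_cache;                 /* served from the cache, no vmexit */
}

static void write_tpr_lazy(unsigned char val)
{
    if ( val == tpr_cache )
        return;                       /* skip the expensive emulated access */
    tpr_cache = val;
    *LAPIC_TPR = val;                 /* only this access traps to Xen */
}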

I think xen itself does TPR optimisation for Intel these days so this may be unnecessary.

It certainly makes a big difference for XP.

James

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Xen 4.3 development update
  2013-05-07 22:23                         ` James Harper
@ 2013-05-08  9:00                           ` George Dunlap
       [not found]                             ` <6035A0D088A63A46850C3988ED045A4B57B45CBE@BITCOM1.int.sbss.com.au>
  0 siblings, 1 reply; 53+ messages in thread
From: George Dunlap @ 2013-05-08  9:00 UTC (permalink / raw)
  To: James Harper
  Cc: suravee.suthikulpanit@amd.com, Tim (Xen.org),
	xen-devel@lists.xen.org, Jan Beulich, Andres Lagar-Cavilla,
	Peter Maloney

On 07/05/13 23:23, James Harper wrote:
>>> A couple of questions:
>>>
>>> - Does Citrix XenServer Windows PV driver work with vanilla Xen 4.2.x? I
>> remember someone complaining on the list that it doesn't work.. (but I'm
>> not sure about that).
>>
>> I did a quick test of the XS 6.0.2 drivers on unstable and they didn't
>> work.  Didn't do any debugging, however.
>>
>>> - Does GPLPV do the lazy patching for WinXP on AMD?
>> I highly doubt it, but you'd have to ask James Harper.
>>
> GPLPV does do some TPR patching. You need to add the /PATCHTPR option to your boot.ini. It works for 2000 as well (and 2003 before MS stopped using TPR at all in sp2), if anyone cares :)
>
> For AMD, TPR access is changed to a LOCK MOV CR8 instruction, which enables setting the TPR without a VMEXIT. For Intel, TPR writes are only done if they would change the value of the TPR, and reads are always served from a cached value. I guess this is what you mean by 'lazy'.
>
> I think xen itself does TPR optimisation for Intel these days so this may be unnecessary.
>
> It certainly makes a big difference for XP.

Well, the context of this thread is a set of changes that makes the
non-lazy TPR exits *much much* more expensive on AMD hardware.  The
existence of a widely-available set of drivers as a workaround would be
a pretty important factor in how we decide to proceed.

So if I just download your latest drivers and add /PATCHTPR on the 
boot.ini, the AMD TPR patching should work?

  -George

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Suspicious URL:Re:  Xen 4.3 development update
       [not found]                                 ` <518B6B36.3050404@eu.citrix.com>
@ 2013-05-12  7:22                                   ` James Harper
  0 siblings, 0 replies; 53+ messages in thread
From: James Harper @ 2013-05-12  7:22 UTC (permalink / raw)
  To: George Dunlap
  Cc: suravee.suthikulpanit@amd.com, Tim (Xen.org),
	xen-devel@lists.xen.org, Jan Beulich, Andres Lagar-Cavilla,
	Peter Maloney


> > The gplpv-no-patchtpr times are comparable to the times booting without
> > gplpv drivers.  So the patchtpr seems to work pretty well. This is WinXP
> > SP3, with whatever version we use for testing in XenServer.  Possible,
> > as you say, that newer patches break things.
> 
> Hmm, but booting the same image on an Intel box (with /patchtpr) causes
> the VM to crash at boot. :-(  Seems to work fine w/o the switch though.
> 

What sort of crash are you getting? I managed to round up an Intel box and tested it, and XP blows up before it even gets a chance to log anything to /var/log/xen/qemu-dm-<domu>.log, even with no /patchtpr. But my Intel box is running 4.2.0 (from Debian experimental) while my AMD box is running 4.1.2 (from Debian wheezy), so maybe there is something in that. Did I see a thread about 2003 failing recently? Win 2012 seems to work just fine.

James

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2013-05-12  7:22 UTC | newest]

Thread overview: 53+ messages
2013-04-02 14:07 Xen 4.3 development update George Dunlap
2013-04-02 15:42 ` Jan Beulich
2013-04-02 15:45   ` Suravee Suthikulanit
2013-04-02 15:51     ` George Dunlap
2013-04-02 16:34   ` Tim Deegan
2013-04-02 16:47     ` Suravee Suthikulpanit
2013-04-04 10:57       ` Tim Deegan
2013-04-02 17:06     ` Suravee Suthikulpanit
2013-04-02 23:48       ` Suravee Suthikulanit
2013-04-03 10:51         ` George Dunlap
2013-04-04 15:29           ` Suravee Suthikulanit
2013-04-04 17:14           ` Suravee Suthikulanit
2013-04-05 13:43             ` George Dunlap
2013-04-03  8:37       ` Christoph Egger
2013-04-03 10:49         ` George Dunlap
2013-04-04 12:19           ` xenalyze (was: Re: Xen 4.3 development update) Christoph Egger
2013-04-04 12:51             ` xenalyze George Dunlap
2013-04-03  7:27     ` Xen 4.3 development update Jan Beulich
2013-04-03 10:53       ` George Dunlap
2013-04-03 15:34         ` Andres Lagar-Cavilla
2013-04-04 15:23           ` Tim Deegan
2013-04-04 17:05             ` Tim Deegan
2013-04-29 13:21               ` Peter Maloney
2013-05-02 15:48                 ` Tim Deegan
2013-05-03 16:41                   ` George Dunlap
2013-05-03 16:59                     ` Tim Deegan
2013-05-04 10:47                     ` Pasi Kärkkäinen
2013-05-07 14:55                       ` George Dunlap
2013-05-07 22:23                         ` James Harper
2013-05-08  9:00                           ` George Dunlap
     [not found]                             ` <6035A0D088A63A46850C3988ED045A4B57B45CBE@BITCOM1.int.sbss.com.au>
     [not found]                               ` <518A5143.5090308@eu.citrix.com>
     [not found]                                 ` <518B6B36.3050404@eu.citrix.com>
2013-05-12  7:22                                   ` Suspicious URL:Re: " James Harper
2013-05-07 13:15                     ` George Dunlap
2013-05-07 15:35                       ` George Dunlap
2013-05-07 13:56                 ` Pasi Kärkkäinen
2013-05-07 14:57                   ` George Dunlap
2013-04-25 15:20             ` George Dunlap
2013-04-25 15:26               ` George Dunlap
2013-04-25 15:46               ` Tim Deegan
2013-04-25 15:50                 ` George Dunlap
2013-05-03  9:35               ` George Dunlap
2013-04-25 13:51           ` Xen 4.3 development update / winxp AMD performance regression Pasi Kärkkäinen
2013-04-25 14:00             ` George Dunlap
2013-04-25 14:24               ` Andres Lagar-Cavilla
2013-04-28 10:18                 ` Peter Maloney
2013-04-29  9:01                   ` George Dunlap
2013-04-09  2:03 ` Xen 4.3 development update Dario Faggioli
2013-04-10 12:12 ` Ian Campbell
2013-04-10 12:15 ` Ian Campbell
2013-04-10 16:41 ` Konrad Rzeszutek Wilk
2013-04-11  9:28   ` George Dunlap
2013-04-11  9:33     ` Ian Campbell
2013-04-11  9:43       ` George Dunlap
2013-04-11  9:49         ` Ian Campbell
