* xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Peter Maloney @ 2012-10-20 17:21 UTC (permalink / raw)
To: xen-devel; +Cc: Andres Lagar-Cavilla
I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
And I found the changeset that caused it.
==========
The problem:
==========
Windows 8 64 bit and 32 bit run fast and fine in the newest xen versions.
Windows XP 32 bit runs unusably slow in anything new that I built from
xen-unstable, but runs fast in 4.1.2 and 4.1.3 stable. While it is
running slow, "xm top" or "xl top" show cpu usage around 650% for the domu.
The bug might be AMD specific. I'm running an AMD FX-8150.
==========
The result:
==========
good: 24769:730f6ed72d70
bad: 24770:7f79475d3de7
The change was 8 months ago
changeset: 24770:7f79475d3de7
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Fri Feb 10 16:07:07 2012 +0000
summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
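For anyone who wants to repeat the bisection, a session along these lines should work. This is only a minimal sketch, not the exact commands I typed: the helper names bisect_start and bisect_mark are made up for illustration, and the good/bad starting revisions are placeholders you have to fill in yourself.

```shell
#!/bin/sh
# Sketch of a Mercurial bisect session (helper names are hypothetical).

# bisect_start GOOD_REV BAD_REV
# Resets any previous bisect state and marks the two end points.
bisect_start() {
    hg bisect --reset &&
    hg bisect --good "$1" &&
    hg bisect --bad "$2"
}

# bisect_mark good|bad|skip
# Records the verdict for the revision currently checked out; hg then
# updates the working copy to the next revision to test.
bisect_mark() {
    case "$1" in
        good|bad|skip) hg bisect "--$1" ;;
        *) echo "usage: bisect_mark good|bad|skip" >&2; return 2 ;;
    esac
}

# Example (GOOD_REV and BAD_REV are placeholders):
#   bisect_start GOOD_REV BAD_REV
#   ... build, install, boot the XP guest, judge the speed ...
#   bisect_mark bad      # or: bisect_mark good
```

Each round means a full clean build and install of whatever revision hg picks, so expect it to take a while.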
==========
My hardware:
==========
AMD FX-8150
990 FX chipset
Here's a dmidecode: http://pastebin.com/XUZjmiVz
==========
My kernel:
==========
I compiled the for-linus branch of cmason's linux-btrfs git repo, around
August 11th (
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
for-linus )
peter:~/xen # uname -a
Linux peter 3.5.0-1-default+ #3 SMP Sat Aug 11 21:30:44 CEST 2012 x86_64
x86_64 x86_64 GNU/Linux
Here's the kernel config: http://pastebin.com/1GQbiFZE (only weird thing
I set was CONFIG_NR_CPUS=16 for no particular reason; default was 512 or
256)
==========
My Windows XP VM config:
==========
# grep -vE "^#|^$" windowsxp2
name="windowsxp2"
description="None"
uuid="292b0651-9913-2459-5cfa-fb828f9c4314"
memory=4096
maxmem=4096
vcpus=7
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=1
keymap="en-us"
builder="hvm"
device_model="/usr/lib/xen/bin/qemu-dm"
kernel="/usr/lib/xen/boot/hvmloader"
boot="c"
disk=[ 'phy:/dev/data/winxp1_disk1,hda,w',
'file:/var/lib/xen/winxp1_disk2.raw,hdb,w', ]
vif=[ 'mac=00:16:3e:4e:c5:0c,bridge=br0,model=e1000', ]
sdl=0
vnc=1
vncunused=1
audio=0
soundhw='es1370'
viridian=1
usb=1
acpi=1
apic=0
pae=1
usbdevice='tablet'
serial="pty"
stdvga=1
gfx_passthru=0
# this is an AMD Radeon HD 6770 and its HDMI audio, and 2 USB ports
pci = [ '04:00.0' , '04:00.1' , '00:12.0' , '00:12.2' ]
xen_platform_pci=1
pci_msitranslate=1
The Windows 8 32 and 64 bit configs I used are the same except changed
mac address, and different disk.
Whether or not I use sound or PCI passthrough doesn't (significantly)
affect performance.
==========
my build process, including how to hack the build so it actually compiles:
==========
# Install older libyajl-devel
# On openSUSE, this would be:
zypper install libyajl1-devel
# Delete everything (except .hg)... prevents unclean builds from
# breaking things. make distclean is not enough for very many builds.
cd xen-unstable.hg
rm -rf *
# If you have permission denied errors (caused by running make install
# as root earlier), make sure to use chown and run rm again, or builds
# will fail.
# Check out the revision
hg update --clean "${build}"
# hack up a troublesome Makefile that prevents builds
vim tools/libxl/Makefile
#   add "-lyajl":
#     at the end of all 4 "$(CC) ..." lines
#     to LIBXL_LIBS
#     to LIBXLU_LIBS
#     to LIBUUID_LIBS
#   (don't know which ones are important... but it works with all of it)
make distclean >/tmp/xen.distclean.log 2>&1 ; status=$? ; echo $status
if [ -e configure ]; then
    ./configure
else
    touch .config
fi
make dist >/tmp/xen.dist.log 2>&1 ; status=$? ; echo $status
==========
my install process
==========
To install the build, it's important to clean out old lib files...
uninstall doesn't get them all. If you miss these, xm, xl, etc. may fail
due to shared library issues.
Also, "make uninstall" deletes important system files it should not
(kernel, kernel modules, vm disks).
As it says in the "make help":
uninstall - attempt to remove installed Xen tools
(use with extreme care!)
Here is my process to solve the uninstall issues:
http://pastebin.com/nXCavFTp
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Peter Maloney @ 2012-10-20 18:40 UTC (permalink / raw)
To: xen-devel
Pasi suggested on IRC that I try with 2 vcpus. With 2 vcpus it still
runs slow, but it's usably slow.
When it first boots, "xl top" shows pretty high cpu usage, which
eventually goes down, and then it's harder to notice that it is slower
than usual.
By comparison, with 4-8 vcpus it is unbearably slow and would probably
take 10 minutes to even log in, although the cpu usage also goes down
over time.
Also, with 2 vcpus, task manager shows processes using much more CPU
than they should. Later on, when the cpu usage is lower and the system
is idle, a bunch of processes still use 2 or 3% each, adding up to
20-50% (fluctuating). And with 2 vcpus, the mouse seems faster.
Then I tested minecraft, which runs at only 5-20 fps. So it's
definitely still slow, but usable for non-3d stuff.
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Andres Lagar-Cavilla @ 2012-10-22 13:56 UTC (permalink / raw)
To: Peter Maloney; +Cc: Andres Lagar-Cavilla, xen-devel
On Oct 20, 2012, at 1:21 PM, Peter Maloney <peter.maloney@brockmann-consult.de> wrote:
> I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
> And I found the changeset that caused it.
>
> ==========
> The problem:
> ==========
>
> Windows 8 64 bit and 32 bit run fast and fine in the newest xen versions.
>
> Windows XP 32 bit runs unusably slow in anything new that I built from
> xen-unstable, but runs fast in 4.1.2 and 4.1.3 stable. While it is
> running slow, "xm top" or "xl top" show cpu usage around 650% for the domu.
>
> The bug might be AMD specific. I'm running an AMD FX-8150.
>
> ==========
> The result:
> ==========
>
> good: 24769:730f6ed72d70
> bad: 24770:7f79475d3de7
>
> The change was 8 months ago
>
> changeset: 24770:7f79475d3de7
> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> date: Fri Feb 10 16:07:07 2012 +0000
> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
Peter,
thanks for the bug report and the bisection.
vcpus (and guest processes) look like they're chewing CPU because they're spending cycles within the hypervisor contending for spin locks. In the 4.2 time frame we had a similar report and we partially reverted the change set you mention to use read/write locks, ameliorating contention. It's obviously critical to figure out which code path Windows XP is exercising wrt the p2m lock.
There are a number of profiling tools out there, so please go ahead with your favorite one to figure out what the vcpu's are doing in hypervisor context.
If unsure, my advice, in terms of quick initial turnaround, would be to
xl dmesg -c
for i in a_number_of_times; do xl debug-keys d; xl dmesg -c; done;
This is gonna dump stack traces for all scheduled vcpus. We should be able to see the stack traces for your domU vcpus, and through sampling quickly infer where they are spending most of their time.
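The sampling idea can be wrapped up roughly as follows. Treat it as a sketch under assumptions: the function names, the sample count and the output directory are invented for illustration, and the sed pattern assumes call-trace lines of the form "(XEN)    [<address>] symbol+0xoff/0xsize", which may need adjusting for your build. Only "xl debug-keys d" and "xl dmesg -c" are taken from the commands above.

```shell
#!/bin/sh
# Sketch: sample hypervisor stack dumps and tally the symbols seen.

# sample_traces COUNT OUTDIR [INTERVAL_SECONDS]
sample_traces() {
    count=$1
    outdir=$2
    interval=${3:-1}
    mkdir -p "$outdir" || return 1
    xl dmesg -c >/dev/null            # discard what is already buffered
    i=1
    while [ "$i" -le "$count" ]; do
        xl debug-keys d               # ask Xen to dump registers/stacks
        xl dmesg -c >"$outdir/sample.$i.log"
        sleep "$interval"
        i=$((i + 1))
    done
}

# top_symbols OUTDIR
# Counts how often each symbol shows up in the saved dumps (assumed
# line format: "(XEN)    [<ffff82c4...>] symbol+0x1a/0x30").
top_symbols() {
    cat "$1"/sample.*.log |
        sed -n 's/.*\[<[0-9a-fA-F]*>] *\([A-Za-z_][A-Za-z0-9_.]*\)+0x.*/\1/p' |
        sort | uniq -c | sort -rn | head -n 20
}

# Example (as root in dom0, while the guest is busy):
#   sample_traces 20 /tmp/xen-samples && top_symbols /tmp/xen-samples
```

Symbols that dominate the tally across samples are the likely place where the vcpus are spinning.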
Let us know what you find.
Thanks
Andres
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Tim Deegan @ 2012-10-22 13:59 UTC (permalink / raw)
To: Peter Maloney; +Cc: Andres Lagar-Cavilla, xen-devel
At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
> I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
> And I found the changeset that caused it.
>
> ==========
> The problem:
> ==========
>
> Windows 8 64 bit and 32 bit run fast and fine in the newest xen versions.
>
> Windows XP 32 bit runs unusably slow in anything new that I built from
> xen-unstable, but runs fast in 4.1.2 and 4.1.3 stable. While it is
> running slow, "xm top" or "xl top" show cpu usage around 650% for the domu.
>
> The bug might be AMD specific. I'm running an AMD FX-8150.
The bug does seem to be AMD-specific, and NPT-specific; with
'hap=0' it goes much faster.
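To try the comparison yourself, one option is to boot the guest from a copy of its config with HAP switched off. A minimal sketch, assuming the config file is the "windowsxp2" one posted above; the helper name make_shadow_cfg and the "-shadow" file name are made up for illustration.

```shell
#!/bin/sh
# Sketch: derive a guest config that forces shadow pagetables (hap=0).

# make_shadow_cfg SRC DST
make_shadow_cfg() {
    src=$1
    dst=$2
    # Drop any existing hap= setting, then append the override.
    sed '/^[[:space:]]*hap[[:space:]]*=/d' "$src" >"$dst" || return 1
    echo 'hap=0' >>"$dst"
}

# Example (shut the original guest down first, the name= is unchanged):
#   make_shadow_cfg windowsxp2 windowsxp2-shadow
#   xl create windowsxp2-shadow
```

Keeping the original file untouched makes it easy to flip back and forth between the two modes.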
> ==========
> The result:
> ==========
>
> good: 24769:730f6ed72d70
> bad: 24770:7f79475d3de7
>
> The change was 8 months ago
>
> changeset: 24770:7f79475d3de7
> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> date: Fri Feb 10 16:07:07 2012 +0000
> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
This change was bad for performance across the board and most of it has
since been either reverted or amended, but clearly we missed something
here.
It's interesting that Win8 isn't slowed down. I wonder whether that's to
do with the way it drives the VGA card -- IIRC it uses a generic VESA
driver rather than a Cirrus one.
Tim.
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Peter Maloney @ 2012-10-23 22:17 UTC (permalink / raw)
To: Tim Deegan; +Cc: Andres Lagar-Cavilla, xen-devel
On 10/22/2012 03:59 PM, Tim Deegan wrote:
> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>> I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
>> And I found the changeset that caused it.
>>
>> ==========
>> The problem:
>> ==========
>>
>> Windows 8 64 bit and 32 bit run fast and fine in the newest xen versions.
>>
>> Windows XP 32 bit runs unusably slow in anything new that I built from
>> xen-unstable, but runs fast in 4.1.2 and 4.1.3 stable. While it is
>> running slow, "xm top" or "xl top" show cpu usage around 650% for the domu.
>>
>> The bug might be AMD specific. I'm running an AMD FX-8150.
> The bug does seem to be AMD-specific, and NPT-specific; with
> 'hap=0' it goes much faster.
K. glad to hear it ;) I just guessed it was AMD specific since not so
many of us seem to run AMDs and it seemed to be only me with the problem.
>> ==========
>> The result:
>> ==========
>>
>> good: 24769:730f6ed72d70
>> bad: 24770:7f79475d3de7
>>
>> The change was 8 months ago
>>
>> changeset: 24770:7f79475d3de7
>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>> date: Fri Feb 10 16:07:07 2012 +0000
>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
> This change was bad for performance across the board and most of it has
> since been either reverted or amended, but clearly we missed something
> here.
>
> It's interesting that Win8 isn't slowed down. I wonder whether that's to
> do with the way it drives the VGA card -- IIRC it uses a generic VESA
> driver rather than a Cirrus one.
My tests included
- passthrough and "stdvga=1" which replaces the cirrus one with some
other card.
- no passthrough, and did not set stdvga (used vnc and cirrus)
Both were slow.
>
> Tim.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Tim Deegan @ 2012-11-01 17:00 UTC (permalink / raw)
To: Peter Maloney; +Cc: Andres Lagar-Cavilla, xen-devel
Hi,
At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
> > The change was 8 months ago
> >
> > changeset: 24770:7f79475d3de7
> > user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> > date: Fri Feb 10 16:07:07 2012 +0000
> > summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>
> This change was bad for performance across the board and most of it has
> since been either reverted or amended, but clearly we missed something
> here.
>
> It's interesting that Win8 isn't slowed down. I wonder whether that's to
> do with the way it drives the VGA card -- IIRC it uses a generic VESA
> driver rather than a Cirrus one.
In fact this is to do with the APIC. On my test system, a busy 2-vcpu
VM is making about 300k/s accesses to the APIC TPR. These accesses are
all trapped and emulated by Xen, and that emulation has got more
expensive as part of this change.
Later Windows OSes have a feature called 'lazy IRQL' which makes those
accesses go away, but sadly that's not been done for WinXP. On modern
Intel CPUs, the hardware acceleration for TPR accesses works for XP; on
AMD it requires the OS to use 'MOV reg32, CR8' to access the TPR instead
of MMIO, which XP is clearly not doing. :(
Peter: if you have the option, you might find that installing the PV
drivers that ship with Citrix XenServer 6.0 makes things work better.
Andres: even though this load of APIC emulations is pretty extreme, it's
surprising that the VM runs faster on shadow pagetables! Any ideas for
where this slowdown is coming from?
Cheers,
Tim.
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Andres Lagar-Cavilla @ 2012-11-01 17:28 UTC (permalink / raw)
To: Tim Deegan; +Cc: Peter Maloney, Andres Lagar-Cavilla, xen-devel
On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
> Hi,
>
> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>> The change was 8 months ago
>>>
>>> changeset: 24770:7f79475d3de7
>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>> date: Fri Feb 10 16:07:07 2012 +0000
>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>
>> This change was bad for performance across the board and most of it has
>> since been either reverted or amended, but clearly we missed something
>> here.
>>
>> It's interesting that Win8 isn't slowed down. I wonder whether that's to
>> do with the way it drives the VGA card -- IIRC it uses a generic VESA
>> driver rather than a Cirrus one.
>
> In fact this is to do with the APIC. On my test system, a busy 2-vcpu
> VM is making about 300k/s accesses to the APIC TPR. These accesses are
> all trapped and emulated by Xen, and that emulation has got more
> expensive as part of this change.
>
> Later Windows OSes have a feature called 'lazy IRQL' which makes those
> accesses go away, but sadly that's not been done for WinXP. On modern
> Intel CPUs, the hardware acceleration for TPR accesses works for XP; on
> AMD it requires the OS to use 'MOV reg32, CR8' to access the TPR instead
> of MMIO, which XP is clearly not doing. :(
>
> Peter: if you have the option, you might find that installing the PV
> drivers that ship with Citrix XenServer 6.0 makes things work better.
>
> Andres: even though this load of APIC emulations is pretty extreme, it's
> surprising that the VM runs faster on shadow pagetables! Any ideas for
> where this slowdown is coming from?
Not any immediate ideas without profiling.
However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
How about the following patch, Peter, Tim?
diff -r 5171750d133e xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -60,24 +60,28 @@ static int hvmemul_do_io(
     ioreq_t *p = get_ioreq(curr);
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
-    struct page_info *ram_page;
+    struct page_info *ram_page = NULL;
     int rc;

     /* Check for paged out page */
-    ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
-    if ( p2m_is_paging(p2mt) )
+    if ( ram_gpa != INVALID_MFN )
     {
-        if ( ram_page )
-            put_page(ram_page);
-        p2m_mem_paging_populate(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-    if ( p2m_is_shared(p2mt) )
-    {
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_RETRY;
-    }
+        ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
+        if ( p2m_is_paging(p2mt) )
+        {
+            if ( ram_page )
+                put_page(ram_page);
+            p2m_mem_paging_populate(curr->domain, ram_gfn);
+            return X86EMUL_RETRY;
+        }
+        if ( p2m_is_shared(p2mt) )
+        {
+            if ( ram_page )
+                put_page(ram_page);
+            return X86EMUL_RETRY;
+        }
+    } else
+        value = 0; /* for pvalue */

     /*
      * Weird-sized accesses have undefined behaviour: we discard writes
@@ -455,7 +459,7 @@ static int __hvmemul_read(
             return X86EMUL_UNHANDLEABLE;
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         if ( (off + bytes) <= PAGE_SIZE )
-            return hvmemul_do_mmio(gpa, &reps, bytes, 0,
+            return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
                                    IOREQ_READ, 0, p_data);
     }

@@ -480,7 +484,8 @@ static int __hvmemul_read(
             addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
         if ( rc != X86EMUL_OKAY )
             return rc;
-        return hvmemul_do_mmio(gpa, &reps, bytes, 0, IOREQ_READ, 0, p_data);
+        return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
+                               IOREQ_READ, 0, p_data);
     case HVMCOPY_gfn_paged_out:
         return X86EMUL_RETRY;
     case HVMCOPY_gfn_shared:
@@ -552,7 +557,7 @@ static int hvmemul_write(
         unsigned int off = addr & (PAGE_SIZE - 1);
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         if ( (off + bytes) <= PAGE_SIZE )
-            return hvmemul_do_mmio(gpa, &reps, bytes, 0,
+            return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
                                    IOREQ_WRITE, 0, p_data);
     }

@@ -573,7 +578,7 @@ static int hvmemul_write(
             addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
         if ( rc != X86EMUL_OKAY )
             return rc;
-        return hvmemul_do_mmio(gpa, &reps, bytes, 0,
+        return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
                                IOREQ_WRITE, 0, p_data);
     case HVMCOPY_gfn_paged_out:
         return X86EMUL_RETRY;
@@ -804,7 +809,7 @@ static int hvmemul_read_io(
 {
     unsigned long reps = 1;
     *val = 0;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
+    return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_READ, 0, val);
 }

 static int hvmemul_write_io(
@@ -814,7 +819,7 @@ static int hvmemul_write_io(
     struct x86_emulate_ctxt *ctxt)
 {
     unsigned long reps = 1;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
+    return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_WRITE, 0, &val);
 }

 static int hvmemul_read_cr(
diff -r 5171750d133e xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -231,7 +231,7 @@ int handle_pio(uint16_t port, int size,
     if ( dir == IOREQ_WRITE )
         data = guest_cpu_user_regs()->eax;

-    rc = hvmemul_do_pio(port, &reps, size, 0, dir, 0, &data);
+    rc = hvmemul_do_pio(port, &reps, size, INVALID_MFN, dir, 0, &data);

     switch ( rc )
     {
>
> Cheers,
>
> Tim.
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
From: Peter Maloney @ 2012-11-13 13:17 UTC (permalink / raw)
To: Andres Lagar-Cavilla; +Cc: Tim Deegan, Andres Lagar-Cavilla, xen-devel
On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>
>> Hi,
>>
>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>>> The change was 8 months ago
>>>>
>>>> changeset: 24770:7f79475d3de7
>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>> date: Fri Feb 10 16:07:07 2012 +0000
>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>> This change was bad for performance across the board and most of it has
>>> since been either reverted or amended, but clearly we missed something
>>> here.
>>>
>>> It's interesting that Win8 isn't slowed down. I wonder whether that's to
>>> do with the way it drives the VGA card -- IIRC it uses a generic VESA
>>> driver rather than a Cirrus one.
>> In fact this is to do with the APIC. On my test system, a busy 2-vcpu
>> VM is making about 300k/s accesses to the APIC TPR. These accesses are
>> all trapped and emulated by Xen, and that emulation has got more
>> expensive as part of this change.
>>
>> Later Windows OSes have a feature called 'lazy IRQL' which makes those
>> accesses go away, but sadly that's not been done for WinXP. On modern
>> Intel CPUs, the hardware acceleration for TPR accesses works for XP; on
>> AMD it requires the OS to use 'MOV reg32, CR8' to access the TPR instead
>> of MMIO, which XP is clearly not doing. :(
>>
>> Peter: if you have the option, you might find that installing the PV
>> drivers that ship with Citrix XenServer 6.0 makes things work better.
>>
>> Andres: even though this load of APIC emulations is pretty extreme, it's
>> surprising that the VM runs faster on shadow pagetables! Any ideas for
>> where this slowdown is coming from?
> Not any immediate ideas without profiling.
>
> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>
> How about the following patch, Peter, Tim?
Thanks,
I'll give it a try sometime this week I guess.
> diff -r 5171750d133e xen/arch/x86/hvm/emulate.c
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -60,24 +60,28 @@ static int hvmemul_do_io(
> ioreq_t *p = get_ioreq(curr);
> unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
> p2m_type_t p2mt;
> - struct page_info *ram_page;
> + struct page_info *ram_page = NULL;
> int rc;
>
> /* Check for paged out page */
> - ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
> - if ( p2m_is_paging(p2mt) )
> + if ( ram_gpa != INVALID_MFN )
> {
> - if ( ram_page )
> - put_page(ram_page);
> - p2m_mem_paging_populate(curr->domain, ram_gfn);
> - return X86EMUL_RETRY;
> - }
> - if ( p2m_is_shared(p2mt) )
> - {
> - if ( ram_page )
> - put_page(ram_page);
> - return X86EMUL_RETRY;
> - }
> + ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
> + if ( p2m_is_paging(p2mt) )
> + {
> + if ( ram_page )
> + put_page(ram_page);
> + p2m_mem_paging_populate(curr->domain, ram_gfn);
> + return X86EMUL_RETRY;
> + }
> + if ( p2m_is_shared(p2mt) )
> + {
> + if ( ram_page )
> + put_page(ram_page);
> + return X86EMUL_RETRY;
> + }
> + } else
> + value = 0; /* for pvalue */
>
> /*
> * Weird-sized accesses have undefined behaviour: we discard writes
> @@ -455,7 +459,7 @@ static int __hvmemul_read(
> return X86EMUL_UNHANDLEABLE;
> gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
> if ( (off + bytes) <= PAGE_SIZE )
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0,
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> IOREQ_READ, 0, p_data);
> }
>
> @@ -480,7 +484,8 @@ static int __hvmemul_read(
> addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
> if ( rc != X86EMUL_OKAY )
> return rc;
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0, IOREQ_READ, 0, p_data);
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> + IOREQ_READ, 0, p_data);
> case HVMCOPY_gfn_paged_out:
> return X86EMUL_RETRY;
> case HVMCOPY_gfn_shared:
> @@ -552,7 +557,7 @@ static int hvmemul_write(
> unsigned int off = addr & (PAGE_SIZE - 1);
> gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
> if ( (off + bytes) <= PAGE_SIZE )
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0,
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> IOREQ_WRITE, 0, p_data);
> }
>
> @@ -573,7 +578,7 @@ static int hvmemul_write(
> addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
> if ( rc != X86EMUL_OKAY )
> return rc;
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0,
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> IOREQ_WRITE, 0, p_data);
> case HVMCOPY_gfn_paged_out:
> return X86EMUL_RETRY;
> @@ -804,7 +809,7 @@ static int hvmemul_read_io(
> {
> unsigned long reps = 1;
> *val = 0;
> - return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
> + return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_READ, 0, val);
> }
>
> static int hvmemul_write_io(
> @@ -814,7 +819,7 @@ static int hvmemul_write_io(
> struct x86_emulate_ctxt *ctxt)
> {
> unsigned long reps = 1;
> - return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
> + return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_WRITE, 0, &val);
> }
>
> static int hvmemul_read_cr(
> diff -r 5171750d133e xen/arch/x86/hvm/io.c
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -231,7 +231,7 @@ int handle_pio(uint16_t port, int size,
> if ( dir == IOREQ_WRITE )
> data = guest_cpu_user_regs()->eax;
>
> - rc = hvmemul_do_pio(port, &reps, size, 0, dir, 0, &data);
> + rc = hvmemul_do_pio(port, &reps, size, INVALID_MFN, dir, 0, &data);
>
> switch ( rc )
> {
>
>
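For anyone following the thread, the idea behind the quoted patch can be shown in isolation. The block below is a minimal stand-alone sketch, not Xen code: `NO_RAM_GPA`, `model_locked_lookup`, `model_do_io` and the `lookups_taken` counter are hypothetical names made up for illustration, and the "lock" is modelled only as a counter. It shows a caller-supplied sentinel letting the I/O path skip the locked page lookup, whereas a stub address of 0 still triggers a lookup of frame 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these are Xen APIs. */
#define NO_RAM_GPA       UINT64_MAX  /* sentinel: no guest RAM page involved */
#define MODEL_PAGE_SHIFT 12

static unsigned long lookups_taken;  /* counts simulated locked lookups */

/* Models a p2m lookup that would have to take a per-domain lock. */
static int model_locked_lookup(uint64_t gfn)
{
    (void)gfn;
    lookups_taken++;
    return 1;  /* pretend the page was found and a reference was taken */
}

/*
 * Models the patched entry point: the page is only looked up when the
 * caller really supplied a RAM address. Returns 1 if a page reference
 * is held afterwards, 0 otherwise.
 */
static int model_do_io(uint64_t ram_gpa)
{
    int have_page = 0;

    if ( ram_gpa != NO_RAM_GPA )
        have_page = model_locked_lookup(ram_gpa >> MODEL_PAGE_SHIFT);

    /* ... the I/O request itself would be issued here ... */

    /* Invariant: a reference is never held for the sentinel address. */
    assert(have_page == 0 || ram_gpa != NO_RAM_GPA);
    return have_page;
}
```

In this model, `model_do_io(0)` still increments `lookups_taken`, while `model_do_io(NO_RAM_GPA)` does not. That is the extra work the patch tries to avoid for callers that have no RAM page to pass.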
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2012-11-13 13:17 ` Peter Maloney
@ 2012-11-22 18:54 ` Peter Maloney
2013-01-12 15:25 ` Peter Maloney
0 siblings, 1 reply; 16+ messages in thread
From: Peter Maloney @ 2012-11-22 18:54 UTC (permalink / raw)
To: xen-devel, Andres Lagar-Cavilla
[-- Attachment #1: Type: text/plain, Size: 1544 bytes --]
On 11/13/2012 02:17 PM, Peter Maloney wrote:
> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>>
>>> Hi,
>>>
>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>>>> The change was 8 months ago
>>>>>
>>>>> changeset: 24770:7f79475d3de7
>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>> date: Fri Feb 10 16:07:07 2012 +0000
>>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>> [...]
>> Not any immediate ideas without profiling.
>>
>> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>>
>> How about the following patch, Peter, Tim?
>
I tried the patch on top of the xen-unstable 4.2.0-branched tree
(528f0708b6db+ 4.2.0-branched).
The behaviour seemed unchanged. With 7 vcpus the guest was extremely slow,
and with 2 vcpus it was still slow, but fast enough that I could log in
and out during the test.
Attached are logs generated with this command (using xm instead of xl):
for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
xenxp_xm_dmesg_-c_7cpus_idle.log
xenxp_xm_dmesg_-c_7cpus_logintooslow.log
xenxp_xm_dmesg_-c_7cpus_shutdown.log
xenxp_xm_dmesg_-c_duringlogin.log
xenxp_xm_dmesg_-c_idling_login_screen.log
Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
and p in case it's relevant.
BTW this time I am testing with kernel 3.6.7
[-- Attachment #2: xenxp_logs.tgz --]
[-- Type: application/x-compressed-tar, Size: 328293 bytes --]
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2012-11-22 18:54 ` Peter Maloney
@ 2013-01-12 15:25 ` Peter Maloney
2013-01-17 20:57 ` Pasi Kärkkäinen
2013-01-18 14:30 ` George Dunlap
0 siblings, 2 replies; 16+ messages in thread
From: Peter Maloney @ 2013-01-12 15:25 UTC (permalink / raw)
To: xen-devel
On 11/22/2012 07:54 PM, Peter Maloney wrote:
> On 11/13/2012 02:17 PM, Peter Maloney wrote:
>> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
>>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>>>>> The change was 8 months ago
>>>>>>
>>>>>> changeset: 24770:7f79475d3de7
>>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>>> date: Fri Feb 10 16:07:07 2012 +0000
>>>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>>> [...]
>>> Not any immediate ideas without profiling.
>>>
>>> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>>>
>>> How about the following patch, Peter, Tim?
> I tried the patch applied to xen-unstable 4.2.0-branched
> 528f0708b6db+ 4.2.0-branched
>
> It seemed the same. It was extremely slow with 7 vcpus, and with 2 vcpus
> it was slow, but fast enough that I could bother to log in and out
> during the test.
>
> Attached are logs generated with this command (using xm instead of xl):
> for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
>
> xenxp_xm_dmesg_-c_7cpus_idle.log
> xenxp_xm_dmesg_-c_7cpus_logintooslow.log
> xenxp_xm_dmesg_-c_7cpus_shutdown.log
> xenxp_xm_dmesg_-c_duringlogin.log
> xenxp_xm_dmesg_-c_idling_login_screen.log
>
> Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
> and p in case it's relevant.
>
> BTW this time I am testing with kernel 3.6.7
>
I have now also tested 4.2.1, and it has the same problem. After using it
for a while with Windows 8 (playing games), my general impression is that
it is laggier than 4.1.3. I'm now running 4.1.4, which is as fast as
4.1.3.
Any ideas on how to fix this, or on how to gather more useful information?
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2013-01-12 15:25 ` Peter Maloney
@ 2013-01-17 20:57 ` Pasi Kärkkäinen
2013-01-18 14:22 ` George Dunlap
2013-01-18 14:30 ` George Dunlap
1 sibling, 1 reply; 16+ messages in thread
From: Pasi Kärkkäinen @ 2013-01-17 20:57 UTC (permalink / raw)
To: George Dunlap; +Cc: Andres Lagar-Cavilla, Tim Deegan, Peter Maloney, xen-devel
Hello,
George: what do you think, should we add this bug to the Xen 4.3 status email so that it gets tracked?
It's a serious HVM/WinXP performance regression on AMD.
-- Pasi
On Sat, Jan 12, 2013 at 04:25:47PM +0100, Peter Maloney wrote:
> On 11/22/2012 07:54 PM, Peter Maloney wrote:
> > On 11/13/2012 02:17 PM, Peter Maloney wrote:
> >> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
> >>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
> >>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
> >>>>>> The change was 8 months ago
> >>>>>>
> >>>>>> changeset: 24770:7f79475d3de7
> >>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> >>>>>> date: Fri Feb 10 16:07:07 2012 +0000
> >>>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
> >>>> [...]
> >>> Not any immediate ideas without profiling.
> >>>
> >>> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
> >>>
> >>> How about the following patch, Peter, Tim?
> > I tried the patch applied to xen-unstable 4.2.0-branched
> > 528f0708b6db+ 4.2.0-branched
> >
> > It seemed the same. It was extremely slow with 7 vcpus, and with 2 vcpus
> > it was slow, but fast enough that I could bother to log in and out
> > during the test.
> >
> > Attached are logs generated with this command (using xm instead of xl):
> > for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
> >
> > xenxp_xm_dmesg_-c_7cpus_idle.log
> > xenxp_xm_dmesg_-c_7cpus_logintooslow.log
> > xenxp_xm_dmesg_-c_7cpus_shutdown.log
> > xenxp_xm_dmesg_-c_duringlogin.log
> > xenxp_xm_dmesg_-c_idling_login_screen.log
> >
> > Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
> > and p in case it's relevant.
> >
> > BTW this time I am testing with kernel 3.6.7
> >
>
> I also tested 4.2.1 now, and it has the same problem. And after using it
> for a while with windows 8 (playing games), I get the general feel that
> it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast
> like 4.1.3.
>
> So any ideas on how to fix this or gather more useful information?
>
>
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2013-01-17 20:57 ` Pasi Kärkkäinen
@ 2013-01-18 14:22 ` George Dunlap
2013-01-18 14:40 ` Andres Lagar-Cavilla
0 siblings, 1 reply; 16+ messages in thread
From: George Dunlap @ 2013-01-18 14:22 UTC (permalink / raw)
To: Pasi Kärkkäinen
Cc: Andres Lagar-Cavilla, Tim (Xen.org), Peter Maloney,
xen-devel@lists.xen.org
On 17/01/13 20:57, Pasi Kärkkäinen wrote:
> Hello,
>
> George: What do you think, should we add this bug to the Xen 4.3 status email for tracking it?
> It's a serious HVM/winxp performance regression on AMD..
Hey Pasi -- thanks for bringing this thread to my attention. I had
noticed a performance impact on AMD boxen myself, but investigating it
had kind of gotten buried in more urgent tasks. Yes, I think we should
track it. I'll put it on the list.
-George
>
> -- Pasi
>
> On Sat, Jan 12, 2013 at 04:25:47PM +0100, Peter Maloney wrote:
>> On 11/22/2012 07:54 PM, Peter Maloney wrote:
>>> On 11/13/2012 02:17 PM, Peter Maloney wrote:
>>>> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
>>>>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>>>>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>>>>>>> The change was 8 months ago
>>>>>>>>
>>>>>>>> changeset: 24770:7f79475d3de7
>>>>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>>>>> date: Fri Feb 10 16:07:07 2012 +0000
>>>>>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>>>>> [...]
>>>>> Not any immediate ideas without profiling.
>>>>>
>>>>> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>>>>>
>>>>> How about the following patch, Peter, Tim?
>>> I tried the patch applied to xen-unstable 4.2.0-branched
>>> 528f0708b6db+ 4.2.0-branched
>>>
>>> It seemed the same. It was extremely slow with 7 vcpus, and with 2 vcpus
>>> it was slow, but fast enough that I could bother to log in and out
>>> during the test.
>>>
>>> Attached are logs generated with this command (using xm instead of xl):
>>> for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
>>>
>>> xenxp_xm_dmesg_-c_7cpus_idle.log
>>> xenxp_xm_dmesg_-c_7cpus_logintooslow.log
>>> xenxp_xm_dmesg_-c_7cpus_shutdown.log
>>> xenxp_xm_dmesg_-c_duringlogin.log
>>> xenxp_xm_dmesg_-c_idling_login_screen.log
>>>
>>> Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
>>> and p in case it's relevant.
>>>
>>> BTW this time I am testing with kernel 3.6.7
>>>
>>
>> I also tested 4.2.1 now, and it has the same problem. And after using it
>> for a while with windows 8 (playing games), I get the general feel that
>> it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast
>> like 4.1.3.
>>
>> So any ideas on how to fix this or gather more useful information?
>>
>>
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2013-01-12 15:25 ` Peter Maloney
2013-01-17 20:57 ` Pasi Kärkkäinen
@ 2013-01-18 14:30 ` George Dunlap
2013-01-26 12:30 ` Peter Maloney
1 sibling, 1 reply; 16+ messages in thread
From: George Dunlap @ 2013-01-18 14:30 UTC (permalink / raw)
To: Peter Maloney; +Cc: xen-devel@lists.xen.org
[-- Attachment #1.1: Type: text/plain, Size: 2665 bytes --]
On Sat, Jan 12, 2013 at 3:25 PM, Peter Maloney <
peter.maloney@brockmann-consult.de> wrote:
> On 11/22/2012 07:54 PM, Peter Maloney wrote:
> > On 11/13/2012 02:17 PM, Peter Maloney wrote:
> >> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
> >>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
> >>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
> >>>>>> The change was 8 months ago
> >>>>>>
> >>>>>> changeset: 24770:7f79475d3de7
> >>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> >>>>>> date: Fri Feb 10 16:07:07 2012 +0000
> >>>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt
> modifications
> >>>> [...]
> >>> Not any immediate ideas without profiling.
> >>>
> >>> However, most callers of hvmemul_do_io pass a stub zero ram_gpa
> address. We might be madly hitting the p2m locks for no reason there.
> >>>
> >>> How about the following patch, Peter, Tim?
> > I tried the patch applied to xen-unstable 4.2.0-branched
> > 528f0708b6db+ 4.2.0-branched
> >
> > It seemed the same. It was extremely slow with 7 vcpus, and with 2 vcpus
> > it was slow, but fast enough that I could bother to log in and out
> > during the test.
> >
> > Attached are logs generated with this command (using xm instead of xl):
> > for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
> >
> > xenxp_xm_dmesg_-c_7cpus_idle.log
> > xenxp_xm_dmesg_-c_7cpus_logintooslow.log
> > xenxp_xm_dmesg_-c_7cpus_shutdown.log
> > xenxp_xm_dmesg_-c_duringlogin.log
> > xenxp_xm_dmesg_-c_idling_login_screen.log
> >
> > Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
> > and p in case it's relevant.
> >
> > BTW this time I am testing with kernel 3.6.7
> >
>
> I also tested 4.2.1 now, and it has the same problem. And after using it
> for a while with windows 8 (playing games), I get the general feel that
> it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast
> like 4.1.3.
>
> So any ideas on how to fix this or gather more useful information?
>
Pete,
One thing that would be helpful is if we could have a quantifiable
difference, other than "feels laggier". If this is related to the problem
I saw a few months ago, running winXP and looking at "top" in qemu is
pretty clear. If you have a bit of time, do you suppose you could try to
look around for a freely-available benchmark that would give us some
numbers for Windows 8? That might help us track down the problem better as
well.
I've put this on my 4.3 release tracking list, so it should get attention.
-George
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2013-01-18 14:22 ` George Dunlap
@ 2013-01-18 14:40 ` Andres Lagar-Cavilla
2013-01-21 12:07 ` George Dunlap
0 siblings, 1 reply; 16+ messages in thread
From: Andres Lagar-Cavilla @ 2013-01-18 14:40 UTC (permalink / raw)
To: George Dunlap
Cc: Peter Maloney, Andres Lagar-Cavilla, Tim (Xen.org),
xen-devel@lists.xen.org
I unfortunately don't have AMD hardware to test with. On our regular win7 EPT-based workloads we've seen no noticeable degradations.
The answer to this one obviously lies in profiling the right sw/hw combo.
Andres
On Jan 18, 2013, at 9:22 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 17/01/13 20:57, Pasi Kärkkäinen wrote:
>> Hello,
>>
>> George: What do you think, should we add this bug to the Xen 4.3 status email for tracking it?
>> It's a serious HVM/winxp performance regression on AMD..
>
> Hey Pasi -- thanks for bringing this thread to my attention. I had noticed a performance impact on AMD boxen myself, but investigating it had kind of gotten buried in more urgent tasks. Yes, I think we should track it. I'll put it on the list.
>
> -George
>
>>
>> -- Pasi
>>
>> On Sat, Jan 12, 2013 at 04:25:47PM +0100, Peter Maloney wrote:
>>> On 11/22/2012 07:54 PM, Peter Maloney wrote:
>>>> On 11/13/2012 02:17 PM, Peter Maloney wrote:
>>>>> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
>>>>>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>>>>>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>>>>>>>> The change was 8 months ago
>>>>>>>>>
>>>>>>>>> changeset: 24770:7f79475d3de7
>>>>>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>>>>>> date: Fri Feb 10 16:07:07 2012 +0000
>>>>>>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>>>>>> [...]
>>>>>> Not any immediate ideas without profiling.
>>>>>>
>>>>>> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>>>>>>
>>>>>> How about the following patch, Peter, Tim?
>>>> I tried the patch applied to xen-unstable 4.2.0-branched
>>>> 528f0708b6db+ 4.2.0-branched
>>>>
>>>> It seemed the same. It was extremely slow with 7 vcpus, and with 2 vcpus
>>>> it was slow, but fast enough that I could bother to log in and out
>>>> during the test.
>>>>
>>>> Attached are logs generated with this command (using xm instead of xl):
>>>> for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
>>>>
>>>> xenxp_xm_dmesg_-c_7cpus_idle.log
>>>> xenxp_xm_dmesg_-c_7cpus_logintooslow.log
>>>> xenxp_xm_dmesg_-c_7cpus_shutdown.log
>>>> xenxp_xm_dmesg_-c_duringlogin.log
>>>> xenxp_xm_dmesg_-c_idling_login_screen.log
>>>>
>>>> Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
>>>> and p in case it's relevant.
>>>>
>>>> BTW this time I am testing with kernel 3.6.7
>>>>
>>>
>>> I also tested 4.2.1 now, and it has the same problem. And after using it
>>> for a while with windows 8 (playing games), I get the general feel that
>>> it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast
>>> like 4.1.3.
>>>
>>> So any ideas on how to fix this or gather more useful information?
>>>
>>>
>
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2013-01-18 14:40 ` Andres Lagar-Cavilla
@ 2013-01-21 12:07 ` George Dunlap
0 siblings, 0 replies; 16+ messages in thread
From: George Dunlap @ 2013-01-21 12:07 UTC (permalink / raw)
To: Andres Lagar-Cavilla
Cc: Tim (Xen.org), Peter Maloney, xen-devel@lists.xen.org
[-- Attachment #1.1: Type: text/plain, Size: 851 bytes --]
On Fri, Jan 18, 2013 at 2:40 PM, Andres Lagar-Cavilla <
andreslc@gridcentric.ca> wrote:
> I unfortunately don't have AMD hardware to test with. On our regular win7
> EPT-based workloads we've seen no noticeable degradations.
>
> The answer to this one obviously lies in profiling the right sw/hw combo.
>
It seems like given the nature of the work you're doing, having at least a
couple of AMD boxes to test on would make sense. It would be a shame for
your company to lose a big deal because (for example) a potential customer
had just bought hundreds of AMD boxes, and your extensions turned out to have
terrible performance on their already-paid-for-and-installed hardware.
I've got a box here that exhibits the behavior (or something like it
anyway), but I probably wouldn't have a chance to look at it until the end
of February at the earliest.
-George
* Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
2013-01-18 14:30 ` George Dunlap
@ 2013-01-26 12:30 ` Peter Maloney
0 siblings, 0 replies; 16+ messages in thread
From: Peter Maloney @ 2013-01-26 12:30 UTC (permalink / raw)
To: George Dunlap; +Cc: xen-devel@lists.xen.org
[-- Attachment #1.1: Type: text/plain, Size: 4485 bytes --]
On 01/18/2013 03:30 PM, George Dunlap wrote:
> On Sat, Jan 12, 2013 at 3:25 PM, Peter Maloney
> <peter.maloney@brockmann-consult.de
> <mailto:peter.maloney@brockmann-consult.de>> wrote:
>
> On 11/22/2012 07:54 PM, Peter Maloney wrote:
> > On 11/13/2012 02:17 PM, Peter Maloney wrote:
> >> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
> >>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org
> <mailto:tim@xen.org>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
> >>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
> >>>>>> The change was 8 months ago
> >>>>>>
> >>>>>> changeset: 24770:7f79475d3de7
> >>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org
> <mailto:andres@lagarcavilla.org>>
> >>>>>> date: Fri Feb 10 16:07:07 2012 +0000
> >>>>>> summary: x86/mm: Make p2m lookups fully synchronized
> wrt modifications
> >>>> [...]
> >>> Not any immediate ideas without profiling.
> >>>
> >>> However, most callers of hvmemul_do_io pass a stub zero
> ram_gpa address. We might be madly hitting the p2m locks for no
> reason there.
> >>>
> >>> How about the following patch, Peter, Tim?
> > I tried the patch applied to xen-unstable 4.2.0-branched
> > 528f0708b6db+ 4.2.0-branched
> >
> > It seemed the same. It was extremely slow with 7 vcpus, and with
> 2 vcpus
> > it was slow, but fast enough that I could bother to log in and out
> > during the test.
> >
> > Attached are logs generated with this command (using xm instead
> of xl):
> > for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
> >
> > xenxp_xm_dmesg_-c_7cpus_idle.log
> > xenxp_xm_dmesg_-c_7cpus_logintooslow.log
> > xenxp_xm_dmesg_-c_7cpus_shutdown.log
> > xenxp_xm_dmesg_-c_duringlogin.log
> > xenxp_xm_dmesg_-c_idling_login_screen.log
> >
> > Also there is xenxp_dmesg.log which is output from hitting
> alt+sysrq+w
> > and p in case it's relevant.
> >
> > BTW this time I am testing with kernel 3.6.7
> >
>
> I also tested 4.2.1 now, and it has the same problem. And after
> using it
> for a while with windows 8 (playing games), I get the general feel
> that
> it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast
> like 4.1.3.
>
> So any ideas on how to fix this or gather more useful information?
>
>
> Pete,
>
> One thing that would be helpful is if we could have a quantifiable
> difference, other than "feels laggier". If this is related to the
> problem I saw a few months ago, running winXP and looking at "top" in
> qemu is pretty clear. If you have a bit of time, do you suppose you
> could try to look around for a freely-available benchmark that would
> give us some numbers for Windows 8? That might help us track down the
> problem better as well.
>
> I've put this on my 4.3 release tracking list, so it should get attention.
>
> -George
Unfortunately, "feels laggier" is hard to put into numbers. I don't mean
that the fps or iops are lower, or that the cpu % is higher, etc. I mean
that at random times something I do doesn't respond as quickly as normal.
In other words, it is not 10% slower 100% of the time (which is what the
benchmarks seem to say; that doesn't bother me so much, but should bother
businesses), but more like 800% slower 1% of the time (which is bad for
games, so it bothers me a lot, but might not matter as much to businesses).
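To put rough numbers on that distinction, here is a minimal sketch using made-up figures: a baseline of 100 arbitrary time units per operation, 100 operations, and "800% slower" read as 9x the baseline. None of these values come from the attached benchmark results, and the helper names are hypothetical. It compares the average and the worst case for the two patterns.

```c
#include <assert.h>
#include <stddef.h>

#define N_OPS    100u
#define BASELINE 100u  /* assumed cost of one operation, arbitrary units */

/* Case A: every operation is 10% slower than the baseline. */
static void fill_uniform_slowdown(unsigned v[N_OPS])
{
    for ( size_t i = 0; i < N_OPS; i++ )
        v[i] = BASELINE + BASELINE / 10;
}

/* Case B: 99 operations at baseline, one operation 800% slower (9x). */
static void fill_rare_stall(unsigned v[N_OPS])
{
    for ( size_t i = 0; i < N_OPS; i++ )
        v[i] = BASELINE;
    v[N_OPS - 1] = BASELINE + 8 * BASELINE;
}

/* Integer mean of n samples; n must be non-zero. */
static unsigned mean_of(const unsigned *v, size_t n)
{
    unsigned long sum = 0;

    assert(n > 0);
    for ( size_t i = 0; i < n; i++ )
        sum += v[i];
    return (unsigned)(sum / n);
}

/* Largest of n samples, i.e. the worst single operation. */
static unsigned max_of(const unsigned *v, size_t n)
{
    unsigned worst = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( v[i] > worst )
            worst = v[i];
    return worst;
}
```

With these assumed figures the rare-stall case averages 108 units, slightly better than the 110 of the uniform slowdown, yet its worst operation takes 900 units instead of 110. A benchmark that only reports averages would therefore not show this kind of lag.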
Attached is a tgz of benchmark results from PCMark7 on the win8 preview and
PCMark5 on winxp, comparing xen 4.1.4 and 4.2.1.
(When shutting down win8 on 4.2.1, skype in dom0 stopped working and no
more domus would start, so I had to reboot to do the xp test, but I
believe this is a separate issue.)
When testing XP on 4.2.1, xm top showed the following cpu% (2 vcpus):
Idle: around 80-95%
During single tests: around 130%
During the last test, which runs 4 or 5 tests together: 198.2%
When testing Windows 8 on 4.2.1:
Idle: dom0 44%, domu 24%
During tests: around 200% (seems normal)
When testing XP on 4.1.4, mdadm decided to resync a disk at that time
(it does this occasionally when I reboot many times, especially when
things go wrong, like the 'no more domus would start' issue), so these
numbers may be badly skewed:
Idle: dom0 44%, domu bouncing between 30-70%
Is this the sort of thing you expected from me?
Peter
[-- Attachment #2: xenbenchmarks.tgz --]
[-- Type: application/x-compressed-tar, Size: 3392978 bytes --]
Thread overview: 16+ messages
2012-10-20 17:21 xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7 Peter Maloney
2012-10-20 18:40 ` Peter Maloney
2012-10-22 13:56 ` Andres Lagar-Cavilla
2012-10-22 13:59 ` Tim Deegan
2012-10-23 22:17 ` Peter Maloney
2012-11-01 17:00 ` Tim Deegan
2012-11-01 17:28 ` Andres Lagar-Cavilla
2012-11-13 13:17 ` Peter Maloney
2012-11-22 18:54 ` Peter Maloney
2013-01-12 15:25 ` Peter Maloney
2013-01-17 20:57 ` Pasi Kärkkäinen
2013-01-18 14:22 ` George Dunlap
2013-01-18 14:40 ` Andres Lagar-Cavilla
2013-01-21 12:07 ` George Dunlap
2013-01-18 14:30 ` George Dunlap
2013-01-26 12:30 ` Peter Maloney