* Poor performance on HVM (kernbench)
@ 2008-09-10 18:23 Todd Deshane
2008-09-10 21:22 ` Todd Deshane
` (3 more replies)
0 siblings, 4 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-10 18:23 UTC (permalink / raw)
To: xen-devel mailing list
Hi All,
We are continuing the Xen vs. KVM benchmarking that I presented at Xen Summit.
This time we are focusing on newer versions, and we also plan to include results for Xen HVM and for KVM with PV drivers, as well as some additional tests.
I have set up Xen 3.3 from source and am using Linux 2.6.27-rc4 for all the guests.
Below are some raw kernbench results, which clearly show that I have a problem with Xen HVM. It may just be a configuration issue, but we have tried everything we could think of so far (e.g., file: instead of tap:aio). I have also tried xen-unstable, and it doesn't seem to produce any better results. I am also in the process of trying kernbench on older versions of Xen HVM.
Here is the xm command line:
xm create /dev/null name=benchvm0 memory=2048
kernel="/usr/lib/xen/boot/hvmloader" builder="hvm"
device_model=/usr/lib64/xen/bin/qemu-dm
disk=file:/root/benchvm/bin/img-perf_xen_hvm_test1/image-0.img,hda,w
vnc=1 vncdisplay=0 vif=mac=AA:BB:CC:DD:EE:00,bridge=br0
vif=mac=AA:BB:CC:DD:EE:7b,bridge=br1 vncviewer="yes"
on_poweroff=destroy on_reboot=restart on_crash=preserve
I will also consider an I/O test, such as iozone, to see whether disk I/O problems are a cause. The dom0 CPU doesn't seem to be under much load at all during the kernbench run.
System time in the Xen HVM kernbench run is more than half of the elapsed time; does that suggest either a disk I/O or a guest scheduling problem?
In the other cases, system time is 1/4 of the elapsed time or less.
If anybody has any ideas or suggestions, or can run Xen HVM kernbench vs. native on their own setup for comparison, that would be very helpful.
The system is an Intel Core 2 Duo with 4 GB of RAM.
The HVM guest runs the libata driver, similar to KVM with emulated drivers.
Thanks,
Todd
KVM PV drivers
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 527.572 (0.681337)
User Time 404.3 (0.982141)
System Time 122.552 (0.468636)
Percent CPU 99 (0)
Context Switches 116020 (180.82)
Sleeps 31307 (94.2072)
KVM Emulated drivers
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 527.968 (0.450744)
User Time 403.95 (0.342929)
System Time 122.134 (0.550709)
Percent CPU 99 (0)
Context Switches 115907 (214.3)
Sleeps 31302.4 (88.7175)
Xen PV
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 446.876 (0.130115)
User Time 392.088 (0.339367)
System Time 54.76 (0.391088)
Percent CPU 99 (0)
Context Switches 64601.4 (163.314)
Sleeps 31214.8 (183.53)
Xen HVM
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 2081.71 (34.0459)
User Time 617.36 (3.61771)
System Time 1430.36 (28.3309)
Percent CPU 98 (0)
Context Switches 331843 (5283.28)
Sleeps 37329.8 (91.538)
KVM Native (Linux)
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 216.076 (0.121778)
User Time 381.122 (0.259557)
System Time 43.242 (0.278783)
Percent CPU 196 (0)
Context Switches 75483.2 (389.988)
Sleeps 38078.8 (354.267)
Xen native 2.6.18.8 dom0 kernel
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 228.504 (0.0808084)
User Time 384.014 (0.657632)
System Time 64.028 (0.733669)
Percent CPU 195.8 (0.447214)
Context Switches 35270.4 (264.36)
Sleeps 39493.4 (266.222)
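As a rough aid for reading the four single-vCPU guest results above, the share of CPU time spent in system time can be computed directly from the posted figures. This is only a sketch: the dictionary and function names are illustrative, and the numbers are copied from the kernbench output in this message.

```python
# (elapsed, user, system) times in seconds, copied from the kernbench runs above.
RUNS = {
    "KVM PV drivers": (527.572, 404.3, 122.552),
    "KVM emulated drivers": (527.968, 403.95, 122.134),
    "Xen PV": (446.876, 392.088, 54.76),
    "Xen HVM": (2081.71, 617.36, 1430.36),
}


def system_share(elapsed, user, system):
    """Fraction of accounted CPU time (user + system) that is system time."""
    return system / (user + system)


def system_over_elapsed(elapsed, user, system):
    """System time as a fraction of wall-clock (elapsed) time."""
    return system / elapsed


for name, times in RUNS.items():
    print(f"{name:22s} sys/(user+sys) = {system_share(*times):.2f}  "
          f"sys/elapsed = {system_over_elapsed(*times):.2f}")
```

Both ratios are shown because "share of the time" can be read either way; with one vCPU at roughly 99% CPU they come out close to each other.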
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 18:23 Poor performance on HVM (kernbench) Todd Deshane
@ 2008-09-10 21:22 ` Todd Deshane
2008-09-10 22:42 ` Anthony Liguori
2008-09-11 10:00 ` Gianluca Guida
2008-09-10 21:37 ` Daniel Magenheimer
` (2 subsequent siblings)
3 siblings, 2 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-10 21:22 UTC (permalink / raw)
To: xen-devel mailing list; +Cc: Muli Ben-Yehuda, Anthony Liguori
As an update, Xen HVM on Xen 3.2 on Ubuntu 8.04 from packages:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 954.4 (4.95457)
User Time 441.744 (2.56251)
System Time 506.018 (7.45156)
Percent CPU 99 (0)
Context Switches 160222 (1113.68)
Sleeps 37604.8 (182.796)
This is actually closer to what would be expected of Xen 3.2, right?
Xen 3.3 should be an improvement with shadow3, right?
Do I need to adjust the shadow_memory parameter for the guest?
I'm going to try Xen 3.2.1 from source next.
Todd
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Poor performance on HVM (kernbench)
2008-09-10 18:23 Poor performance on HVM (kernbench) Todd Deshane
2008-09-10 21:22 ` Todd Deshane
@ 2008-09-10 21:37 ` Daniel Magenheimer
2008-09-10 21:42 ` Todd Deshane
2008-09-11 6:31 ` Pasi Kärkkäinen
2008-09-11 16:02 ` Todd Deshane
3 siblings, 1 reply; 30+ messages in thread
From: Daniel Magenheimer @ 2008-09-10 21:37 UTC (permalink / raw)
To: deshantm, xen-devel mailing list
This doesn't answer the HVM question but it appears that
you are running guests with 1 vCPU but comparing against
a dual-CPU native. True?
> -----Original Message-----
> From: Todd Deshane [mailto:deshantm@gmail.com]
> Sent: Wednesday, September 10, 2008 12:23 PM
> To: xen-devel mailing list
> Subject: [Xen-devel] Poor performance on HVM (kernbench)
>
>
> Hi All,
>
> We are continuing our Xen vs. KVM benchmarking that I
> presented at Xen summit.
>
> This time, we are focusing on newer versions and also planning to
> include Xen HVM
> and KVM with PV drivers results. As well as adding some more tests.
>
> I have setup Xen 3.3 from source, and am using Linux 2.6.27-rc4 for
> all the guests.
>
> Below are some raw kernbench results, which clearly show that
> I have a problem
> with Xen HVM. It may just be a configuration issue, but we have tried
> all that we
> could think of so far (i.e file:, instead of tap:aio). I have also
> tried xen-unstable and
> it doesn't seem to produce any better results. I am also in the
> process of trying
> kernbench on older versions of Xen HVM.
>
> here is the xm command line
> xm create /dev/null name=benchvm0 memory=2048
> kernel="/usr/lib/xen/boot/hvmloader" builder="hvm"
> device_model=/usr/lib64/xen/bin/qemu-dm
> disk=file:/root/benchvm/bin/img-perf_xen_hvm_test1/image-0.img,hda,w
> vnc=1 vncdisplay=0 vif=mac=AA:BB:CC:DD:EE:00,bridge=br0
> vif=mac=AA:BB:CC:DD:EE:7b,bridge=br1 vncviewer="yes"
> on_poweroff=destroy on_reboot=restart on_crash=preserve
>
> I will also consider an IO test, such as iozone to see if
> the disk IO problems are a cause. The dom0 cpu
> doesn't seem to be under much load at all during the
> kernbench run.
>
> System time on the kernbench run is 1/2 of the time, so does
> that suggest either disk IO or guest scheduling problem?
>
> System time on the other cases is 1/4 or less on the other
> cases.
>
> If anybody has any ideas, suggestions, or can even run Xen
> HVM kernbench
> vs. native on their setup to compare against that would be
> very helpful.
>
> The system setup is a Intel core2 dual 4 GB of ram.
> The HVM guest does run the libata driver similar to KVM with
> emulated drivers.
>
> Thanks,
> Todd
>
> KVM PV drivers
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 527.572 (0.681337)
> User Time 404.3 (0.982141)
> System Time 122.552 (0.468636)
> Percent CPU 99 (0)
> Context Switches 116020 (180.82)
> Sleeps 31307 (94.2072)
>
>
> KVM Emulated drivers
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 527.968 (0.450744)
> User Time 403.95 (0.342929)
> System Time 122.134 (0.550709)
> Percent CPU 99 (0)
> Context Switches 115907 (214.3)
> Sleeps 31302.4 (88.7175)
>
> Xen PV
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 446.876 (0.130115)
> User Time 392.088 (0.339367)
> System Time 54.76 (0.391088)
> Percent CPU 99 (0)
> Context Switches 64601.4 (163.314)
> Sleeps 31214.8 (183.53)
>
>
>
> Xen HVM
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 2081.71 (34.0459)
> User Time 617.36 (3.61771)
> System Time 1430.36 (28.3309)
> Percent CPU 98 (0)
> Context Switches 331843 (5283.28)
> Sleeps 37329.8 (91.538)
>
>
> KVM Native (Linux)
>
> Average Optimal load -j 8 Run (std deviation):
> Elapsed Time 216.076 (0.121778)
> User Time 381.122 (0.259557)
> System Time 43.242 (0.278783)
> Percent CPU 196 (0)
> Context Switches 75483.2 (389.988)
> Sleeps 38078.8 (354.267)
>
>
> Xen native 2.6.18.8 dom0 kernel
>
> Average Optimal load -j 8 Run (std deviation):
> Elapsed Time 228.504 (0.0808084)
> User Time 384.014 (0.657632)
> System Time 64.028 (0.733669)
> Percent CPU 195.8 (0.447214)
> Context Switches 35270.4 (264.36)
> Sleeps 39493.4 (266.222)
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 21:37 ` Daniel Magenheimer
@ 2008-09-10 21:42 ` Todd Deshane
2008-09-10 21:45 ` Daniel Magenheimer
2008-09-10 21:51 ` Steve Ofsthun
0 siblings, 2 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-10 21:42 UTC (permalink / raw)
To: Daniel Magenheimer; +Cc: xen-devel mailing list
On Wed, Sep 10, 2008 at 5:37 PM, Daniel Magenheimer
<dan.magenheimer@oracle.com> wrote:
> This doesn't answer the HVM question but it appears that
> you are running guests with 1 vCPU but comparing against
> a dual-CPU native. True?
Yes. The intuition is that we don't want to overcommit virtual CPUs
since then you are stressing the schedulers more.
I ran some tests with 2 vCPUs and all the numbers are a bit higher (as
having two CPUs tackling a compile is faster).
Although overcommit (of CPUs and memory ;) is interesting, we leave a CPU dedicated to the host system (Linux/dom0) on purpose.
Todd
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Poor performance on HVM (kernbench)
2008-09-10 21:42 ` Todd Deshane
@ 2008-09-10 21:45 ` Daniel Magenheimer
2008-09-11 2:48 ` Todd Deshane
2008-09-10 21:51 ` Steve Ofsthun
1 sibling, 1 reply; 30+ messages in thread
From: Daniel Magenheimer @ 2008-09-10 21:45 UTC (permalink / raw)
To: deshantm; +Cc: xen-devel mailing list
Perhaps you should run the native with nosmp then, to ensure
the comparison isn't taken out of context.
> -----Original Message-----
> From: Todd Deshane [mailto:deshantm@gmail.com]
> Sent: Wednesday, September 10, 2008 3:42 PM
> To: Daniel Magenheimer
> Cc: xen-devel mailing list
> Subject: Re: [Xen-devel] Poor performance on HVM (kernbench)
>
>
> On Wed, Sep 10, 2008 at 5:37 PM, Daniel Magenheimer
> <dan.magenheimer@oracle.com> wrote:
> > This doesn't answer the HVM question but it appears that
> > you are running guests with 1 vCPU but comparing against
> > a dual-CPU native. True?
>
> Yes. The intuition is that we don't want to overcommit virtual CPUs
> since then you are stressing the schedulers more.
>
> I ran some tests with 2 vCPUs and all the numbers are a bit higher (as
> having two CPUs tackling a compile is faster).
>
> Although overcommit (of CPUs and memory ;) is interesting, we leave a
> CPU dedicated to the host system (linux/dom0)
> on purpose.
>
> Todd
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 21:42 ` Todd Deshane
2008-09-10 21:45 ` Daniel Magenheimer
@ 2008-09-10 21:51 ` Steve Ofsthun
2008-09-11 2:50 ` Todd Deshane
1 sibling, 1 reply; 30+ messages in thread
From: Steve Ofsthun @ 2008-09-10 21:51 UTC (permalink / raw)
To: deshantm; +Cc: Daniel Magenheimer, xen-devel mailing list
Todd Deshane wrote:
> On Wed, Sep 10, 2008 at 5:37 PM, Daniel Magenheimer
> <dan.magenheimer@oracle.com> wrote:
>> This doesn't answer the HVM question but it appears that
>> you are running guests with 1 vCPU but comparing against
>> a dual-CPU native. True?
>
> Yes. The intuition is that we don't want to overcommit virtual CPUs
> since then you are stressing the schedulers more.
I think what Dan is getting at is that the native run should restrict its CPU and memory usage to match the guest tests. So restrict the native test's CPUs with "maxcpus=1" or "nosmp" on the boot line. Similarly, you can restrict memory using "mem=xxxM".
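One way to confirm that a native run really was booted with these restrictions is to inspect the kernel command line (on Linux it is exposed in /proc/cmdline). The following is a minimal sketch; the helper names and the example command line are hypothetical, and only the "nosmp", "maxcpus=1" and "mem=" parameters come from the message above.

```python
def parse_boot_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def is_single_cpu(params):
    """True if the boot line restricts the kernel to one CPU."""
    return "nosmp" in params or params.get("maxcpus") == "1"


def memory_limit(params):
    """Return the mem= value (for example '2048M'), or None if unrestricted."""
    return params.get("mem")


# Hypothetical example line; on a real system read it from /proc/cmdline.
example = parse_boot_params("ro quiet nosmp mem=2048M")
print(is_single_cpu(example), memory_limit(example))
```

On the benchmark host, `parse_boot_params(open("/proc/cmdline").read())` would supply the real values.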
> I ran some tests with 2 vCPUs and all the numbers are a bit higher (as
> having two CPUs tackling a compile is faster).
>
> Although overcommit (of CPUs and memory ;) is interesting, we leave a
> CPU dedicated to the host system (linux/dom0)
> on purpose.
>
> Todd
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 21:22 ` Todd Deshane
@ 2008-09-10 22:42 ` Anthony Liguori
2008-09-11 2:52 ` Todd Deshane
2008-09-11 9:35 ` George Dunlap
2008-09-11 10:00 ` Gianluca Guida
1 sibling, 2 replies; 30+ messages in thread
From: Anthony Liguori @ 2008-09-10 22:42 UTC (permalink / raw)
To: deshantm
Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list,
Marcelo Tosatti
Todd Deshane wrote:
> As an update, Xen HVM on Xen 3.2 on Ubuntu 8.04 from packages:
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 954.4 (4.95457)
> User Time 441.744 (2.56251)
> System Time 506.018 (7.45156)
> Percent CPU 99 (0)
> Context Switches 160222 (1113.68)
> Sleeps 37604.8 (182.796)
>
> This is actually more what would be expected of Xen 3.2 right?
It's pretty close to what I've seen. In my experience with shadow2, Xen PV is about twice as fast on kernbench. Your results for PV were:
Elapsed Time 446.876 (0.130115)
So this result is a bit higher than what I've seen, but certainly within
the realm of possibility.
> Xen 3.3 should be an improvement with shadow3 right?
I know it is for Windows, but there's always the possibility that it has
caused a regression in Linux performance.
Regards,
Anthony Liguori
> Should I need to adjust the shadow_memory parameter for the guest?
>
> I'm going to try Xen 3.2.1 from source next.
>
> Todd
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 21:45 ` Daniel Magenheimer
@ 2008-09-11 2:48 ` Todd Deshane
0 siblings, 0 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 2:48 UTC (permalink / raw)
To: Daniel Magenheimer; +Cc: xen-devel mailing list
On Wed, Sep 10, 2008 at 5:45 PM, Daniel Magenheimer
<dan.magenheimer@oracle.com> wrote:
> Perhaps you should run the native with nosmp then, to ensure
> the comparison isn't taken out of context.
>
Dom0, nosmp
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 428.058 (0.18431)
User Time 371.648 (0.86358)
System Time 56.346 (0.989181)
Percent CPU 99 (0)
Context Switches 29144.6 (52.3479)
Sleeps 36239.8 (441.198)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 21:51 ` Steve Ofsthun
@ 2008-09-11 2:50 ` Todd Deshane
0 siblings, 0 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 2:50 UTC (permalink / raw)
To: Steve Ofsthun; +Cc: Daniel Magenheimer, xen-devel mailing list
> I think what Dan is getting at is, the native execution run should restrict
> it's cpu and memory usage to be identical with the guest tests. So restrict
> the native test cpus with "maxcpus=1" or "nosmp" on the boot line.
> Similarly you can restrict memory using "mem=xxxM".
Native Linux kernel (I only had a spare Ubuntu 2.6.24 to work with), booted with nosmp mem=2048M:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 403.744 (0.26857)
User Time 373.23 (0.421485)
System Time 30.32 (0.389551)
Percent CPU 99 (0)
Context Switches 90961.2 (105.838)
Sleeps 52311.8 (83.8373)
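With uniprocessor baselines available, the guest results from earlier in the thread can be expressed as overhead relative to native. The sketch below uses elapsed times copied from this thread; the names are illustrative. The native kernel here is 2.6.24 while the guests run 2.6.27-rc4, so the comparison is only approximate.

```python
# Elapsed times in seconds, copied from results posted in this thread.
NATIVE_NOSMP = 403.744  # Ubuntu 2.6.24, nosmp mem=2048M
DOM0_NOSMP = 428.058    # Xen dom0 kernel, nosmp

GUESTS = {
    "Xen PV": 446.876,
    "KVM PV drivers": 527.572,
    "KVM emulated drivers": 527.968,
}


def overhead_pct(guest_elapsed, baseline_elapsed):
    """Extra elapsed time of the guest relative to the baseline, in percent."""
    return (guest_elapsed / baseline_elapsed - 1.0) * 100.0


for name, elapsed in GUESTS.items():
    print(f"{name:22s} vs native: {overhead_pct(elapsed, NATIVE_NOSMP):5.1f}%  "
          f"vs dom0: {overhead_pct(elapsed, DOM0_NOSMP):5.1f}%")
```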
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 22:42 ` Anthony Liguori
@ 2008-09-11 2:52 ` Todd Deshane
2008-09-11 9:35 ` George Dunlap
1 sibling, 0 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 2:52 UTC (permalink / raw)
To: Anthony Liguori
Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list,
Marcelo Tosatti
Another quick update:
xen-unstable HVM guest
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 1859.81 (30.4446)
User Time 595.756 (3.71579)
System Time 1246.4 (25.0328)
Percent CPU 98.8 (0.447214)
Context Switches 298999 (4736.48)
Sleeps 37258.2 (75.3638)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 18:23 Poor performance on HVM (kernbench) Todd Deshane
2008-09-10 21:22 ` Todd Deshane
2008-09-10 21:37 ` Daniel Magenheimer
@ 2008-09-11 6:31 ` Pasi Kärkkäinen
2008-09-11 16:02 ` Todd Deshane
3 siblings, 0 replies; 30+ messages in thread
From: Pasi Kärkkäinen @ 2008-09-11 6:31 UTC (permalink / raw)
To: Todd Deshane; +Cc: xen-devel mailing list
On Wed, Sep 10, 2008 at 02:23:17PM -0400, Todd Deshane wrote:
> Hi All,
>
> We are continuing our Xen vs. KVM benchmarking that I presented at Xen summit.
>
> This time, we are focusing on newer versions and also planning to
> include Xen HVM
> and KVM with PV drivers results. As well as adding some more tests.
>
> I have setup Xen 3.3 from source, and am using Linux 2.6.27-rc4 for
> all the guests.
>
> Below are some raw kernbench results, which clearly show that I have a problem
> with Xen HVM. It may just be a configuration issue, but we have tried
> all that we could think of so far (i.e file:, instead of tap:aio).
You could also try "phy:" with raw devices or LVM volumes. I think that should be the best-performing method.
Then again, I'm pretty sure that changing it won't explain or fix your (bad) HVM results.
-- Pasi
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Poor performance on HVM (kernbench)
2008-09-10 22:42 ` Anthony Liguori
2008-09-11 2:52 ` Todd Deshane
@ 2008-09-11 9:35 ` George Dunlap
2008-09-11 15:30 ` Todd Deshane
1 sibling, 1 reply; 30+ messages in thread
From: George Dunlap @ 2008-09-11 9:35 UTC (permalink / raw)
To: Anthony Liguori
Cc: Muli Ben-Yehuda, deshantm, Anthony Liguori,
xen-devel mailing list, Marcelo Tosatti
On Wed, Sep 10, 2008 at 11:42 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>> Xen 3.3 should be an improvement with shadow3 right?
>
> I know it is for Windows, but there's always the possibility that it has
> caused a regression in Linux performance.
Shadow3 was definitely developed with Windows in mind. Since it makes
shadows act more like a hardware TLB, I'd expect it to perform better,
or at least no worse; but since that's the biggest change with Xen HVM
between 3.2 and 3.3, that's the first place I'd look.
Todd, would it be possible to send me a 30-second xentrace "sample"
of kernbench running under 3.2 and 3.3? The relevant command:
xentrace -S 256 -e all /tmp/[filename].trace
Set the kernbench run going in the guest, let it get going for about
30 seconds or so, and then start xentrace. Let it run for 30 seconds,
then kill it. In 3.3, you can use the -T parameter to have it stop
after 30 seconds; in 3.2, you can do something like:
xentrace -S 256 -e all /tmp/[filename].trace & sleep 30 ; killall -INT xentrace
You can send me the files via something like http://yousendit.com.
If you could possibly take a trace with a recent xen-unstable build,
that would be even more helpful, since there are some key xentrace
changes that make the information even more useful.
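The timed capture described above (start xentrace, wait 30 seconds, stop it with SIGINT) can also be scripted. This is a minimal sketch, assuming it runs in dom0 with sufficient privileges; the function names are hypothetical, and only the `-S 256 -e all <file>` arguments are taken from the commands in this message.

```python
import signal
import subprocess
import time


def xentrace_command(output_path, size_arg=256, event_mask="all"):
    """Build the argument list used above: xentrace -S <size> -e <mask> <file>."""
    return ["xentrace", "-S", str(size_arg), "-e", event_mask, output_path]


def capture_trace(output_path, seconds=30.0):
    """Run xentrace for `seconds`, then interrupt it with SIGINT and wait for exit."""
    proc = subprocess.Popen(xentrace_command(output_path))
    try:
        time.sleep(seconds)
    finally:
        # Same signal as `killall -INT xentrace`, but sent only to this process.
        proc.send_signal(signal.SIGINT)
        proc.wait()
    return proc.returncode
```

For example, `capture_trace("/tmp/kernbench-3.2.trace")` (a hypothetical file name) would be started about 30 seconds into the kernbench run.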
Thanks,
-George
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Poor performance on HVM (kernbench)
2008-09-10 21:22 ` Todd Deshane
2008-09-10 22:42 ` Anthony Liguori
@ 2008-09-11 10:00 ` Gianluca Guida
2008-09-11 15:17 ` Gianluca Guida
1 sibling, 1 reply; 30+ messages in thread
From: Gianluca Guida @ 2008-09-11 10:00 UTC (permalink / raw)
To: deshantm; +Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list
Todd Deshane wrote:
> Xen 3.3 should be an improvement with shadow3 right?
As other people have already said, shadow3 (especially the current unsync
policy) was mostly developed with Windows performance in mind. I would
be surprised if the shadow algorithm were causing such a performance loss,
though; in any case, you can disable the out-of-sync feature by
removing SHOPT_OUT_OF_SYNC from SHADOW_OPTIMIZATIONS in
xen/arch/x86/mm/shadow/private.h (setting it to 0xff instead of 0x1ff),
which reverts the shadow code to shadow2 behavior.
Can you please test that and see if it makes any difference?
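The change from 0x1ff to 0xff amounts to clearing a single bit in the optimization mask. A small sketch of that arithmetic follows; the value 0x100 for SHOPT_OUT_OF_SYNC is an assumption inferred from the two masks quoted above, not copied from the Xen source.

```python
SHADOW_OPTIMIZATIONS = 0x1ff  # default mask quoted in the message above
SHOPT_OUT_OF_SYNC = 0x100     # assumption: the one bit that differs from 0xff


def without_flag(mask, flag):
    """Return `mask` with every bit of `flag` cleared."""
    return mask & ~flag


print(hex(without_flag(SHADOW_OPTIMIZATIONS, SHOPT_OUT_OF_SYNC)))  # prints 0xff
```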
Thanks,
Gianluca
>
> Should I need to adjust the shadow_memory parameter for the guest?
>
> I'm going to try Xen 3.2.1 from source next.
>
> Todd
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 10:00 ` Gianluca Guida
@ 2008-09-11 15:17 ` Gianluca Guida
2008-09-11 15:25 ` Gianluca Guida
2008-09-11 15:35 ` Todd Deshane
0 siblings, 2 replies; 30+ messages in thread
From: Gianluca Guida @ 2008-09-11 15:17 UTC (permalink / raw)
To: deshantm; +Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list
[-- Attachment #1: Type: text/plain, Size: 1776 bytes --]
Hi,
Gianluca Guida wrote:
> Todd Deshane wrote:
>> Xen 3.3 should be an improvement with shadow3 right?
I ran a few tests, on an amd64 kernel, with shadow2 and shadow3.
Results attached. As you can see, in a 1-vCPU environment the two
systems compare very closely (shadow3 is about 1.5% faster than shadow2,
with much lower system time). It's disturbing that with 2 vCPUs, instead,
shadow2 is about 11% faster. I'll look into that and try to make the
shadow3 algorithm a bit more Linux-friendly but, in general, I don't
think the slowdown was due *only* to shadow3.
Was it a 32-bit guest? PAE?
Thanks,
Gianluca
1 vcpu, shadow2
Thu Sep 11 14:09:14 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 414.792 (0.4772)
User Time 303.276 (0.508409)
System Time 111.148 (0.592891)
Percent CPU 99.2 (0.447214)
Context Switches 25682.6 (146.046)
Sleeps 28827.8 (99.8884)
1 vcpu, shadow3
Thu Sep 11 13:16:23 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 408.948 (1.96799)
User Time 326.482 (0.553191)
System Time 81.184 (2.0058)
Percent CPU 99 (0)
Context Switches 25239.6 (205.305)
Sleeps 28995 (152.91)
2 vcpus, shadow2
Thu Sep 11 11:59:27 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 223.144 (0.709704)
User Time 314.844 (0.933852)
System Time 121.536 (1.46827)
Percent CPU 195.2 (0.447214)
Context Switches 27307.4 (249.907)
Sleeps 34875.6 (212.731)
2 vcpus, shadow3
Thu Sep 11 12:32:41 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 251.832 (1.27152)
User Time 368.878 (0.745366)
System Time 124.472 (1.15751)
Percent CPU 195.4 (0.547723)
Context Switches 28585.2 (135.509)
Sleeps 35620.4 (447.023)
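The percentage differences mentioned above can be recomputed from the elapsed times in this log. The sketch below measures the difference relative to the slower run, which is one possible convention and an assumption here; the names are illustrative.

```python
# Elapsed times in seconds, copied from the log above.
ELAPSED = {
    ("1 vcpu", "shadow2"): 414.792,
    ("1 vcpu", "shadow3"): 408.948,
    ("2 vcpus", "shadow2"): 223.144,
    ("2 vcpus", "shadow3"): 251.832,
}


def faster_by_pct(slow, fast):
    """How much shorter the faster run is, as a percentage of the slower run."""
    return (slow - fast) / slow * 100.0


one_vcpu = faster_by_pct(ELAPSED[("1 vcpu", "shadow2")], ELAPSED[("1 vcpu", "shadow3")])
two_vcpus = faster_by_pct(ELAPSED[("2 vcpus", "shadow3")], ELAPSED[("2 vcpus", "shadow2")])
print(f"1 vcpu:  shadow3 faster by {one_vcpu:.1f}%")
print(f"2 vcpus: shadow2 faster by {two_vcpus:.1f}%")
```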
[-- Attachment #2: kernbench.log --]
[-- Type: text/x-log, Size: 1141 bytes --]
1 vcpu, shadow2
Thu Sep 11 14:09:14 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 414.792 (0.4772)
User Time 303.276 (0.508409)
System Time 111.148 (0.592891)
Percent CPU 99.2 (0.447214)
Context Switches 25682.6 (146.046)
Sleeps 28827.8 (99.8884)
1 vcpu, shadow3
Thu Sep 11 13:16:23 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 408.948 (1.96799)
User Time 326.482 (0.553191)
System Time 81.184 (2.0058)
Percent CPU 99 (0)
Context Switches 25239.6 (205.305)
Sleeps 28995 (152.91)
2 vcpus, shadow2
Thu Sep 11 11:59:27 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 223.144 (0.709704)
User Time 314.844 (0.933852)
System Time 121.536 (1.46827)
Percent CPU 195.2 (0.447214)
Context Switches 27307.4 (249.907)
Sleeps 34875.6 (212.731)
2 vcpus, shadow3
Thu Sep 11 12:32:41 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 251.832 (1.27152)
User Time 368.878 (0.745366)
System Time 124.472 (1.15751)
Percent CPU 195.4 (0.547723)
Context Switches 28585.2 (135.509)
Sleeps 35620.4 (447.023)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 15:17 ` Gianluca Guida
@ 2008-09-11 15:25 ` Gianluca Guida
2008-09-11 15:35 ` Todd Deshane
1 sibling, 0 replies; 30+ messages in thread
From: Gianluca Guida @ 2008-09-11 15:25 UTC (permalink / raw)
To: xen-devel mailing list
Argh, sorry for including the log twice.
I also forgot to mention that the test was done on an Intel Core2 6420 @
2.13GHz.
Gianluca
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 9:35 ` George Dunlap
@ 2008-09-11 15:30 ` Todd Deshane
0 siblings, 0 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 15:30 UTC (permalink / raw)
To: George Dunlap
Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list,
Anthony Liguori, Marcelo Tosatti
On Thu, Sep 11, 2008 at 5:35 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Wed, Sep 10, 2008 at 11:42 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>>> Xen 3.3 should be an improvement with shadow3 right?
>>
>> I know it is for Windows, but there's always the possibility that it has
>> caused a regression in Linux performance.
>
> Shadow3 was definitely developed with Windows in mind. Since it makes
> shadows act more like a hardware TLB, I'd expect it to perform better,
> or at least no worse; but since that's the biggest change with Xen HVM
> between 3.2 and 3.3, that's the first place I'd look.
>
> Todd, would it be possible to send me a 30-second xentrace "sample"
> of kernbench running under 3.2 and 3.3? The relevant command:
>
> xentrace -S 256 -e all /tmp/[filename].trace
>
> Set the kernbench run going in the guest, let it get going for about
> 30 seconds or so, and then start xentrace. Let it run for 30 seconds,
> then kill it. In 3.3, you can use the -T parameter to have it stop
> after 30 seconds; in 3.2, you can do something like:
>
> xentrace -S 256 -e all /tmp/[filename].trace & sleep 30 ; killall -INT xentrace
>
> You can send me the files via something like http://yousendit.com.
>
> If you could possibly take a trace with a recent xen-unstable build,
> that would be even more helpful, since there are some key xentrace
> changes that make the information even more useful.
>
George: I sent you both the Xen 3.2.1 and the xen-unstable traces using the
service you suggested.
Let me know if you have any problems getting them.
If anyone else would like to see the traces, just let me know.
Cheers,
Todd
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 15:17 ` Gianluca Guida
2008-09-11 15:25 ` Gianluca Guida
@ 2008-09-11 15:35 ` Todd Deshane
2008-09-11 16:48 ` Gianluca Guida
2008-09-11 17:25 ` Gianluca Guida
1 sibling, 2 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 15:35 UTC (permalink / raw)
To: Gianluca Guida; +Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list
On Thu, Sep 11, 2008 at 11:17 AM, Gianluca Guida
<gianluca.guida@eu.citrix.com> wrote:
> Hi,
>
> Gianluca Guida wrote:
>>
>> Todd Deshane wrote:
>>>
>>> Xen 3.3 should be an improvement with shadow3 right?
>
> I made a few test, in an amd64 kernel, with shadow2 and shadow 3.
>
> Results attached. What you can see is that in 1 vcpu environment the two
> system compare very well (with shadow3 being 1.5% faster that shadow2,
> system time being much lower). It's disturbing that in 2 vcpus, instead, the
> shadow2 is about 11% faster. I'll try to look at that and make the shadow3
> algorithm a bit more linux-friendly but, in general, I don't think that the
> slow down was due *only* to shadow3.
>
> Was it a 32bit guest? PAE?
>
The guest is 64-bit.
Can you also run kernbench on native for comparison?
We have fairly similar setups; mine is:
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
4 GB of RAM
How much RAM do you have (native and guest)?
Are all your tests on xen-unstable, with the code change switched on and off as you suggested?
What is the backend disk type for your HVM guest?
What is the kernel in your HVM guest?
I will make the same change to the xen-unstable code and re-run kernbench with shadow3 disabled on my system.
Thanks,
Todd
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Poor performance on HVM (kernbench)
2008-09-10 18:23 Poor performance on HVM (kernbench) Todd Deshane
` (2 preceding siblings ...)
2008-09-11 6:31 ` Pasi Kärkkäinen
@ 2008-09-11 16:02 ` Todd Deshane
3 siblings, 0 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 16:02 UTC (permalink / raw)
To: xen-devel mailing list
Another number:
Xen 3.2.1 HVM guest, much faster than on 3.3/unstable:
Elapsed Time 834.082 (3.25046)
User Time 492.68 (1.61651)
System Time 328.778 (1.78148)
Percent CPU 98 (0)
Context Switches 146272 (437.262)
Sleeps 36858 (127.805)
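To put "much faster" into numbers, the HVM elapsed times posted so far in this thread can be compared against the Xen 3.2.1 run. This is only a sketch using figures copied from the thread; the names are illustrative.

```python
# Xen HVM kernbench elapsed times in seconds, copied from this thread.
HVM_ELAPSED = {
    "Xen 3.2 (Ubuntu 8.04 packages)": 954.4,
    "Xen 3.2.1": 834.082,
    "Xen 3.3": 2081.71,
    "xen-unstable": 1859.81,
}
BASELINE = "Xen 3.2.1"


def relative_to_baseline(name):
    """Elapsed time of `name` as a multiple of the Xen 3.2.1 elapsed time."""
    return HVM_ELAPSED[name] / HVM_ELAPSED[BASELINE]


for name in HVM_ELAPSED:
    print(f"{name:32s} {relative_to_baseline(name):.2f}x")
```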
On Wed, Sep 10, 2008 at 2:23 PM, Todd Deshane <deshantm@gmail.com> wrote:
> Hi All,
>
> We are continuing our Xen vs. KVM benchmarking that I presented at Xen summit.
>
> This time, we are focusing on newer versions and also planning to
> include Xen HVM
> and KVM with PV drivers results. As well as adding some more tests.
>
> I have setup Xen 3.3 from source, and am using Linux 2.6.27-rc4 for
> all the guests.
>
> Below are some raw kernbench results, which clearly show that I have a problem
> with Xen HVM. It may just be a configuration issue, but we have tried
> all that we
> could think of so far (i.e file:, instead of tap:aio). I have also
> tried xen-unstable and
> it doesn't seem to produce any better results. I am also in the
> process of trying
> kernbench on older versions of Xen HVM.
>
> here is the xm command line
> xm create /dev/null name=benchvm0 memory=2048
> kernel="/usr/lib/xen/boot/hvmloader" builder="hvm"
> device_model=/usr/lib64/xen/bin/qemu-dm
> disk=file:/root/benchvm/bin/img-perf_xen_hvm_test1/image-0.img,hda,w
> vnc=1 vncdisplay=0 vif=mac=AA:BB:CC:DD:EE:00,bridge=br0
> vif=mac=AA:BB:CC:DD:EE:7b,bridge=br1 vncviewer="yes"
> on_poweroff=destroy on_reboot=restart on_crash=preserve
>
> I will also consider an IO test, such as iozone to see if
> the disk IO problems are a cause. The dom0 cpu
> doesn't seem to be under much load at all during the
> kernbench run.
>
> System time on the kernbench run is 1/2 of the time, so does
> that suggest either disk IO or guest scheduling problem?
>
> System time on the other cases is 1/4 or less on the other
> cases.
>
> If anybody has any ideas, suggestions, or can even run Xen HVM kernbench
> vs. native on their setup to compare against that would be very helpful.
>
> The system setup is a Intel core2 dual 4 GB of ram.
> The HVM guest does run the libata driver similar to KVM with emulated drivers.
>
> Thanks,
> Todd
>
> KVM PV drivers
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 527.572 (0.681337)
> User Time 404.3 (0.982141)
> System Time 122.552 (0.468636)
> Percent CPU 99 (0)
> Context Switches 116020 (180.82)
> Sleeps 31307 (94.2072)
>
>
> KVM Emulated drivers
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 527.968 (0.450744)
> User Time 403.95 (0.342929)
> System Time 122.134 (0.550709)
> Percent CPU 99 (0)
> Context Switches 115907 (214.3)
> Sleeps 31302.4 (88.7175)
>
> Xen PV
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 446.876 (0.130115)
> User Time 392.088 (0.339367)
> System Time 54.76 (0.391088)
> Percent CPU 99 (0)
> Context Switches 64601.4 (163.314)
> Sleeps 31214.8 (183.53)
>
>
>
> Xen HVM
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 2081.71 (34.0459)
> User Time 617.36 (3.61771)
> System Time 1430.36 (28.3309)
> Percent CPU 98 (0)
> Context Switches 331843 (5283.28)
> Sleeps 37329.8 (91.538)
>
>
> KVM Native (Linux)
>
> Average Optimal load -j 8 Run (std deviation):
> Elapsed Time 216.076 (0.121778)
> User Time 381.122 (0.259557)
> System Time 43.242 (0.278783)
> Percent CPU 196 (0)
> Context Switches 75483.2 (389.988)
> Sleeps 38078.8 (354.267)
>
>
> Xen native 2.6.18.8 dom0 kernel
>
> Average Optimal load -j 8 Run (std deviation):
> Elapsed Time 228.504 (0.0808084)
> User Time 384.014 (0.657632)
> System Time 64.028 (0.733669)
> Percent CPU 195.8 (0.447214)
> Context Switches 35270.4 (264.36)
> Sleeps 39493.4 (266.222)
>
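To make the system-time comparison in the quoted results explicit, here is a small sketch that computes the share of CPU time (user + system) spent in system time for each configuration. The numbers are copied from the kernbench output above; the names `RESULTS` and `system_share` are made up for this illustration, and the native runs used -j 8 rather than -j 4, so they are only a loose reference.

```python
# (user seconds, system seconds) copied from the kernbench results quoted above.
RESULTS = {
    "KVM PV drivers": (404.3, 122.552),
    "KVM emulated drivers": (403.95, 122.134),
    "Xen PV": (392.088, 54.76),
    "Xen HVM": (617.36, 1430.36),
    "KVM native (Linux)": (381.122, 43.242),
    "Xen native 2.6.18.8 dom0": (384.014, 64.028),
}


def system_share(user, system):
    """Return the fraction of total CPU time (user + system) spent in system time."""
    return system / (user + system)


shares = {name: system_share(user, system) for name, (user, system) in RESULTS.items()}

# Print the configurations from the highest to the lowest system-time share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} {share:6.1%}")
```

Using user + system as the denominator rather than elapsed time keeps the one-CPU guest runs and the two-CPU native runs roughly comparable.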
--
Todd Deshane
http://todddeshane.net
check out our book: http://runningxen.com
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 15:35 ` Todd Deshane
@ 2008-09-11 16:48 ` Gianluca Guida
2008-09-11 17:25 ` Gianluca Guida
1 sibling, 0 replies; 30+ messages in thread
From: Gianluca Guida @ 2008-09-11 16:48 UTC (permalink / raw)
To: deshantm; +Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list
Hello,
Todd Deshane wrote:
> Can you also run kernbench on native for comparison?
I will.
> How much RAM do you have (native and guest)?
2 GB host, 512 MB guest.
> Are all your tests on Xen unstable with the code changes on and off as
> you suggested?
Yes.
> What is the backend disk type for your HVM guest?
Here's my configuration for disks (no stubdomains, btw).
disk = [ 'file:/root/prova,hda,w',
'file:/local-images/debian-40r0-amd64-netinst.iso,hdc:cdrom,r']
> What is the kernel in your HVM guest?
It's the standard debian 4.0 I guess, 2.6.18-4-amd64.
> I will make the same changes to the xen unstable code and re-run
> kernbench with shadow3 disabled
> on my system.
Thanks, that would be interesting to know!
Gianluca
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 15:35 ` Todd Deshane
2008-09-11 16:48 ` Gianluca Guida
@ 2008-09-11 17:25 ` Gianluca Guida
2008-09-11 18:07 ` George Dunlap
1 sibling, 1 reply; 30+ messages in thread
From: Gianluca Guida @ 2008-09-11 17:25 UTC (permalink / raw)
To: deshantm; +Cc: Muli Ben-Yehuda, Anthony Liguori, xen-devel mailing list
Todd Deshane wrote:
> Can you also run kernbench on native for comparison?
Here they are, with a two-CPU dom0.
Thu Sep 11 13:03:57 EDT 2008
2.6.18.8-xen
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 181.51 (0.550318)
User Time 300.494 (0.965572)
System Time 54.198 (0.784391)
Percent CPU 195 (0.707107)
Context Switches 26611.8 (205.029)
Sleeps 29778.8 (330.637)
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 17:25 ` Gianluca Guida
@ 2008-09-11 18:07 ` George Dunlap
2008-09-11 18:26 ` Gianluca Guida
0 siblings, 1 reply; 30+ messages in thread
From: George Dunlap @ 2008-09-11 18:07 UTC (permalink / raw)
To: Gianluca Guida
Cc: Muli Ben-Yehuda, deshantm, Anthony Liguori,
xen-devel mailing list
So, the problem appears to be with a ton of brute-force searches to
remove writable mappings, both during resync and promotion. My
analysis tool is reporting that of the 30 seconds or so in the trace
from xen-unstable, the guest spent a whopping 67% in the hypervisor:
* 26% doing resyncs as a result of marking another page out-of-sync
* 9% promoting pages
* 27% resyncing as a result of cr3 switches
And almost the entirety of all of those can be attributed to
brute-force searches to remove writable mappings.
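As a quick check of the breakdown above, the three listed categories can be summed and compared with the reported hypervisor total. This is only arithmetic on the percentages quoted in this message; the variable names are invented for the sketch, and the figures carry the same uncertainty as the trace analysis itself.

```python
# Percentages of guest time, as reported in the message above.
hypervisor_total = 67

breakdown = {
    "resync after marking another page out-of-sync": 26,
    "promoting pages": 9,
    "resync on cr3 switch": 27,
}

# Time covered by the three listed categories, and what is left over.
attributed = sum(breakdown.values())
unattributed = hypervisor_total - attributed

print(f"attributed to listed categories: {attributed}%")
print(f"remaining hypervisor time:       {unattributed}%")
```

The three categories cover 62 of the 67 percentage points, so nearly all of the hypervisor time falls into paths that remove writable mappings.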
(Caveat emptor: My tool was designed to analyze XenServer product
traces, which have a different trace file format than xen-unstable.
I've just taught it to read the xen-unstable trace formats, so the
exact percentages may be incorrect still. But the preponderance of
brute-force searches is unmistakable.)
The good news is that if we can finger the cause of the brute-force
searches, we should be able to reduce all those numbers down to
respectable levels; my guess is totaling not more than 5%.
-George
On Thu, Sep 11, 2008 at 6:25 PM, Gianluca Guida
<gianluca.guida@eu.citrix.com> wrote:
> Todd Deshane wrote:
>>
>> Can you also run kernbench on native for comparison?
>
> Here they are, with a two-CPU dom0.
>
> Thu Sep 11 13:03:57 EDT 2008
> 2.6.18.8-xen
> Average Optimal load -j 8 Run (std deviation):
> Elapsed Time 181.51 (0.550318)
> User Time 300.494 (0.965572)
> System Time 54.198 (0.784391)
> Percent CPU 195 (0.707107)
> Context Switches 26611.8 (205.029)
> Sleeps 29778.8 (330.637)
>
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 18:07 ` George Dunlap
@ 2008-09-11 18:26 ` Gianluca Guida
2008-09-11 19:04 ` Todd Deshane
2008-09-12 11:04 ` George Dunlap
0 siblings, 2 replies; 30+ messages in thread
From: Gianluca Guida @ 2008-09-11 18:26 UTC (permalink / raw)
To: George Dunlap
Cc: Muli Ben-Yehuda, deshantm, Anthony Liguori,
xen-devel mailing list
George Dunlap wrote:
> So, the problem appears to be with a ton of brute-force searches to
> remove writable mappings, both during resync and promotion. My
> analysis tool is reporting that of the 30 seconds or so in the trace
> from xen-unstable, the guest spent a whopping 67% in the hypervisor:
> * 26% doing resyncs as a result of marking another page out-of-sync
> * 9% promoting pages
> * 27% resyncing as a result of cr3 switches
> And almost the entirety of all of those can be attributed to
> brute-force searches to remove writable mappings.
Fantastic (well, sort of)!
If I understand it correctly, Todd is using PV drivers in Linux HVM
guests, so the brute-force search is due to former L1
page-tables being used as I/O pages, not being unshadowed because they
can get writable mappings out of it.
It is, in short, an unshadowing problem. Should be `easy` to fix. I
wasn't using PV drivers, so I was not experiencing this behaviour.
Or, it could be a fixup table bug, but I doubt it.
George, did you see excessive fixup faults in the trace?
Todd, could you try without PV drivers (plain qemu emulation) and see if
the results get better?
Thanks,
Gianluca
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 18:26 ` Gianluca Guida
@ 2008-09-11 19:04 ` Todd Deshane
2008-09-11 19:54 ` Todd Deshane
2008-09-11 20:02 ` Jeremy Fitzhardinge
2008-09-12 11:04 ` George Dunlap
1 sibling, 2 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 19:04 UTC (permalink / raw)
To: Gianluca Guida
Cc: Muli Ben-Yehuda, George Dunlap, xen-devel mailing list,
Anthony Liguori
On Thu, Sep 11, 2008 at 2:26 PM, Gianluca Guida
<gianluca.guida@eu.citrix.com> wrote:
> George Dunlap wrote:
>>
>> So, the problem appears to be with a ton of brute-force searches to
>> remove writable mappings, both during resync and promotion. My
>> analysis tool is reporting that of the 30 seconds or so in the trace
>> from xen-unstable, the guest spent a whopping 67% in the hypervisor:
>> * 26% doing resyncs as a result of marking another page out-of-sync
>> * 9% promoting pages
>> * 27% resyncing as a result of cr3 switches
>> And almost the entirety of all of those can be attributed to
>> brute-force searches to remove writable mappings.
>
> Fantastic (well, sort of)!
>
> If I understand it correctly, Todd is using PV drivers in Linux HVM guests,
> so the brute-force search is due to former L1 page-tables being
> used as I/O pages, not being unshadowed because they can get writable
> mappings out of it.
> It is, in short, an unshadowing problem. Should be `easy` to fix. I wasn't
> using PV drivers, so I was not experiencing this behaviour.
>
> Or, it could be a fixup table bug, but I doubt it.
>
> George, did you see excessive fixup faults in the trace?
>
> Todd, could you try without PV drivers (plain qemu emulation) and see if the
> results get better?
To the best of my knowledge, I am not using PV on HVM drivers since
they are not upstream.
I am using 2.6.27-rc4 xen domU kernel, with the normal xen PV drivers
and KVM virtio built in.
Am I mistaken?
I am running the test with shadow3 disabled now.
I'll report the results when they come out.
If you have any other suggestions or things for me to try, let me know.
Cheers,
Todd
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 19:04 ` Todd Deshane
@ 2008-09-11 19:54 ` Todd Deshane
2008-09-11 20:02 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 30+ messages in thread
From: Todd Deshane @ 2008-09-11 19:54 UTC (permalink / raw)
To: Gianluca Guida
Cc: Muli Ben-Yehuda, George Dunlap, xen-devel mailing list,
Anthony Liguori
> I am running the test with shadow3 disabled now.
> I'll report the results when they come out.
So with shadow3 disabled, the kernbench time is much
more reasonable. Better than 3.2.1 even.
xen-unstable HVM guest
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 737.144 (5.19414)
User Time 498.508 (2.52895)
System Time 235.056 (2.71348)
Percent CPU 99 (0)
Context Switches 133127 (823.517)
Sleeps 36295.4 (124.088)
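For scale, the following sketch compares this run with the Xen HVM figures from the first message of the thread (elapsed 2081.71 s, system 1430.36 s). Note the assumption: the earlier run was on Xen 3.3 and this one is on xen-unstable, so the ratio is only indicative. The dictionary and variable names are invented for the illustration.

```python
# Xen HVM figures from the first message of the thread (shadow3 enabled).
with_shadow3 = {"elapsed": 2081.71, "user": 617.36, "system": 1430.36}

# xen-unstable HVM figures from this message (shadow3 disabled).
without_shadow3 = {"elapsed": 737.144, "user": 498.508, "system": 235.056}

# How many times faster the run is overall, and how much system time shrank.
elapsed_ratio = with_shadow3["elapsed"] / without_shadow3["elapsed"]
system_ratio = with_shadow3["system"] / without_shadow3["system"]

print(f"elapsed time ratio: {elapsed_ratio:.2f}x")
print(f"system time ratio:  {system_ratio:.2f}x")
```

Most of the improvement comes from system time, which is consistent with the overhead sitting in page-table handling rather than in user-space work.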
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 19:04 ` Todd Deshane
2008-09-11 19:54 ` Todd Deshane
@ 2008-09-11 20:02 ` Jeremy Fitzhardinge
2008-09-11 20:24 ` John Levon
2008-09-12 10:41 ` Gianluca Guida
1 sibling, 2 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-09-11 20:02 UTC (permalink / raw)
To: deshantm
Cc: Muli Ben-Yehuda, Gianluca Guida, xen-devel mailing list,
George Dunlap, Anthony Liguori
Todd Deshane wrote:
> To the best of my knowledge, I am not using PV on HVM drivers since
> they are not upstream.
> I am using 2.6.27-rc4 xen domU kernel, with the normal xen PV drivers
> and KVM virtio built in.
> Am I mistaken?
>
No, there's no pv-hvm driver support upstream yet. When I get around to
it I intend to add support for the pagetable shootdown
paravirtualization too, so that unshadowing shouldn't be a problem.
(Is that in xen-unstable yet?)
J
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 20:02 ` Jeremy Fitzhardinge
@ 2008-09-11 20:24 ` John Levon
2008-09-12 10:41 ` Gianluca Guida
1 sibling, 0 replies; 30+ messages in thread
From: John Levon @ 2008-09-11 20:24 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Muli Ben-Yehuda, deshantm, Gianluca Guida, xen-devel mailing list,
George Dunlap, Anthony Liguori
On Thu, Sep 11, 2008 at 01:02:51PM -0700, Jeremy Fitzhardinge wrote:
> > To the best of my knowledge, I am not using PV on HVM drivers since
> > they are not upstream.
> > I am using 2.6.27-rc4 xen domU kernel, with the normal xen PV drivers
> > and KVM virtio built in.
> > Am I mistaken?
> >
>
> No, there's no pv-hvm driver support upstream yet. When I get around to
> it I intend to add support for the pagetable shootdown
> paravirtualization too, so that unshadowing shouldn't be a problem.
Hmm, this came up recently, but I don't remember seeing this. Sounds
interesting, is there somewhere we can read more?
regards
john
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 20:02 ` Jeremy Fitzhardinge
2008-09-11 20:24 ` John Levon
@ 2008-09-12 10:41 ` Gianluca Guida
1 sibling, 0 replies; 30+ messages in thread
From: Gianluca Guida @ 2008-09-12 10:41 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Muli Ben-Yehuda, deshantm, George Dunlap, xen-devel mailing list,
Anthony Liguori
Jeremy Fitzhardinge wrote:
> No, there's no pv-hvm driver support upstream yet. When I get around to
> it I intend to add support for the pagetable shootdown
> paravirtualization too, so that unshadowing shouldn't be a problem.
Oh, OK. I didn't know that. The only good news is that this behavior is
not expected by the shadow3 algorithm, since most of the development time
was spent on ways to avoid brute-force searches for writable mappings
completely. I'll get back with a patch to fix this after I can
reproduce it on my machine.
Thanks,
Gianluca
* Re: Re: Poor performance on HVM (kernbench)
2008-09-11 18:26 ` Gianluca Guida
2008-09-11 19:04 ` Todd Deshane
@ 2008-09-12 11:04 ` George Dunlap
2008-09-12 11:19 ` George Dunlap
1 sibling, 1 reply; 30+ messages in thread
From: George Dunlap @ 2008-09-12 11:04 UTC (permalink / raw)
To: Gianluca Guida
Cc: Muli Ben-Yehuda, deshantm, Anthony Liguori,
xen-devel mailing list
On Thu, Sep 11, 2008 at 7:26 PM, Gianluca Guida
<gianluca.guida@eu.citrix.com> wrote:
> Or, it could be a fixup table bug, but I doubt it.
>
> George, did you see excessive fixup faults in the trace?
No, nothing excessive; 273,480 over 30 seconds isn't that bad. The
main thing was that out of 15024 attempts to remove writable mappings,
13775 had to fall back to a brute-force search.
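To put those counts in proportion, here is a short calculation of the brute-force fallback rate and the fixup fault rate. It assumes the trace covers exactly 30 seconds, which the thread only gives as an approximate figure; the variable names are made up for the sketch.

```python
# Counts reported in the message above.
removal_attempts = 15024      # attempts to remove writable mappings
brute_force_fallbacks = 13775  # attempts that fell back to brute-force search
fixup_faults = 273_480
trace_seconds = 30             # assumption: the trace is "30 seconds or so"

# Fraction of removal attempts that needed a brute-force search.
fallback_rate = brute_force_fallbacks / removal_attempts

# Average number of fixup faults per second of trace.
fixups_per_second = fixup_faults / trace_seconds

print(f"brute-force fallback rate: {fallback_rate:.1%}")
print(f"fixup faults per second:   {fixups_per_second:.0f}")
```

More than nine out of ten removal attempts end in a brute-force search, which is why the search cost dominates even though the fixup fault count itself is unremarkable.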
Looking at the trace, I can't really tell why there should be a
problem... I'm seeing tons of circumstances where there should only be
one writable mapping, but it falls back to brute-force search anyway.
Here's an example:
24.999159660 -x vmexit exit_reason EXCEPTION_NMI eip 2b105dcee330
24.999159660 -x wrmap-bf gfn 7453c
24.999159660 -x fixup va 2b105f000000 gl1e 800000005caf0067 flags
(60c)-gp-Pw------
24.999748980 -x vmentry
[...]
24.999759577 -x vmexit exit_reason EXCEPTION_NMI eip ffffffff8022a3b0
24.999759577 -x fixup:unsync va ffff88007453c008 gl1e 7453c067 flags
(c000c)-gp------ua-
24.999762562 -x vmentry
[...]
25.002946338 -x vmexit exit_reason CR_ACCESS eip ffffffff80491a63
25.002946338 -x wrmap-bf gfn 7e18c
25.002946338 -x oos resync full gfn 7e18c
25.002946338 -x wrmap-bf gfn 7453c
25.002946338 -x oos resync full gfn 7453c
25.003526640 -x vmentry
Here we see gfn 7453c:
* promoted to be a shadow (the big 'P' in the flag string); at the
vmentry, there should be no writable mappings.
* marked out-of-sync (one writable mapping, with fixup table)
* re-sync'ed because of a CR write, and a brute-force search.
Note that the times behind the "wrmap-bf" and "oos resync full" are
not valid; but the whole vmexit->vmentry arc takes over 1.5
milliseconds.
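A minimal sketch of how the brute-force events in an excerpt like this could be tallied per gfn. The trace text is copied from this message as it appears in the email (including the wrapped flag lines); the regular expression and the name `bf_counts` are assumptions for illustration and are not part of the analysis tool mentioned earlier in the thread.

```python
import re
from collections import Counter

# Trace excerpt as it appears in the message above.
TRACE = """\
24.999159660 -x vmexit exit_reason EXCEPTION_NMI eip 2b105dcee330
24.999159660 -x wrmap-bf gfn 7453c
24.999159660 -x fixup va 2b105f000000 gl1e 800000005caf0067 flags
(60c)-gp-Pw------
24.999748980 -x vmentry
24.999759577 -x vmexit exit_reason EXCEPTION_NMI eip ffffffff8022a3b0
24.999759577 -x fixup:unsync va ffff88007453c008 gl1e 7453c067 flags
(c000c)-gp------ua-
24.999762562 -x vmentry
25.002946338 -x vmexit exit_reason CR_ACCESS eip ffffffff80491a63
25.002946338 -x wrmap-bf gfn 7e18c
25.002946338 -x oos resync full gfn 7e18c
25.002946338 -x wrmap-bf gfn 7453c
25.002946338 -x oos resync full gfn 7453c
25.003526640 -x vmentry
"""

# Count brute-force writable-mapping searches ("wrmap-bf") per guest frame number.
bf_counts = Counter(re.findall(r"wrmap-bf gfn ([0-9a-f]+)", TRACE))

for gfn, count in bf_counts.most_common():
    print(f"gfn {gfn}: {count} brute-force search(es)")
```

In this excerpt gfn 7453c is searched twice, once at promotion and once at resync, matching the sequence described in the list above.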
-George
* Re: Re: Poor performance on HVM (kernbench)
2008-09-12 11:04 ` George Dunlap
@ 2008-09-12 11:19 ` George Dunlap
2008-09-12 14:20 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: George Dunlap @ 2008-09-12 11:19 UTC (permalink / raw)
To: Gianluca Guida
Cc: Muli Ben-Yehuda, deshantm, Anthony Liguori,
xen-devel mailing list
Ah, that's the problem... Linux seems to have changed the location of
the 1:1 map. Gianluca's using an older kernel, where it's at
0xffff810000000000, but this trace has it at 0xffff880000000000, so
the "guess" heuristic is missing.
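The mismatch can be illustrated with the addresses from the trace. The sketch below assumes 4 KiB pages and that the "guess" is simply the base of the 1:1 map plus the guest frame number shifted by the page size; it is not Xen's actual shadow code, and the function names are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages
PAGE_MASK = ~((1 << PAGE_SHIFT) - 1)

OLD_LINEAR_MAP_BASE = 0xFFFF810000000000  # 1:1 map base in older kernels
NEW_LINEAR_MAP_BASE = 0xFFFF880000000000  # 1:1 map base seen in this trace


def guess_linear_va(base, gfn):
    """Guess the virtual address of a guest frame inside the kernel's 1:1 map."""
    return base + (gfn << PAGE_SHIFT)


def guess_hits(base, gfn, observed_va):
    """Return True if the guess lands on the page containing observed_va."""
    return guess_linear_va(base, gfn) == (observed_va & PAGE_MASK)


# Values from the "fixup:unsync" line of the trace: gfn 7453c at va ffff88007453c008.
gfn = 0x7453C
observed_va = 0xFFFF88007453C008

print(hex(guess_linear_va(OLD_LINEAR_MAP_BASE, gfn)), guess_hits(OLD_LINEAR_MAP_BASE, gfn, observed_va))
print(hex(guess_linear_va(NEW_LINEAR_MAP_BASE, gfn)), guess_hits(NEW_LINEAR_MAP_BASE, gfn, observed_va))
```

With the old base the guess points at a page the guest never mapped there, so the lookup fails and the code falls back to the brute-force search; with the new base it lands on the page seen in the trace.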
Jeremy, is this a permanent long-term move, or is it going to be
something random? I.e., should we just add a new heuristic "guess" at
this address, or do we need to do something more complicated?
That will solve brute-force searches for promotions, but the fixup
table for out-of-sync mappings should still be fixed...
-George
On Fri, Sep 12, 2008 at 12:04 PM, George Dunlap <dunlapg@umich.edu> wrote:
> On Thu, Sep 11, 2008 at 7:26 PM, Gianluca Guida
> <gianluca.guida@eu.citrix.com> wrote:
>> Or, it could be a fixup table bug, but I doubt it.
>>
>> George, did you see excessive fixup faults in the trace?
>
> No, nothing excessive; 273,480 over 30 seconds isn't that bad. The
> main thing was that out of 15024 attempts to remove writable mappings,
> 13775 had to fall back to a brute-force search.
>
> Looking at the trace, I can't really tell why there should be a
> problem... I'm seeing tons of circumstances where there should only be
> one writable mapping, but it falls back to brute-force search anyway.
> Here's an example:
>
> 24.999159660 -x vmexit exit_reason EXCEPTION_NMI eip 2b105dcee330
> 24.999159660 -x wrmap-bf gfn 7453c
> 24.999159660 -x fixup va 2b105f000000 gl1e 800000005caf0067 flags
> (60c)-gp-Pw------
> 24.999748980 -x vmentry
> [...]
> 24.999759577 -x vmexit exit_reason EXCEPTION_NMI eip ffffffff8022a3b0
> 24.999759577 -x fixup:unsync va ffff88007453c008 gl1e 7453c067 flags
> (c000c)-gp------ua-
> 24.999762562 -x vmentry
> [...]
> 25.002946338 -x vmexit exit_reason CR_ACCESS eip ffffffff80491a63
> 25.002946338 -x wrmap-bf gfn 7e18c
> 25.002946338 -x oos resync full gfn 7e18c
> 25.002946338 -x wrmap-bf gfn 7453c
> 25.002946338 -x oos resync full gfn 7453c
> 25.003526640 -x vmentry
>
> Here we see gfn 7453c:
> * promoted to be a shadow (the big 'P' in the flag string); at the
> vmentry, there should be no writable mappings.
> * marked out-of-sync (one writable mapping, with fixup table)
> * re-sync'ed because of a CR write, and a brute-force search.
>
> Note that the times behind the "wrmap-bf" and "oos resync full" are
> not valid; but the whole vmexit->vmentry arc takes over 1.5
> milliseconds.
>
> -George
>
* Re: Re: Poor performance on HVM (kernbench)
2008-09-12 11:19 ` George Dunlap
@ 2008-09-12 14:20 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2008-09-12 14:20 UTC (permalink / raw)
To: George Dunlap
Cc: Muli Ben-Yehuda, deshantm, Gianluca Guida, xen-devel mailing list,
Anthony Liguori
George Dunlap wrote:
> Ah, that's the problem... Linux seems to have changed the location of
> the 1:1 map. Gianluca's using an older kernel, where it's at
> 0xffff810000000000, but this trace has it at 0xffff880000000000, so
> the "guess" heuristic is missing.
>
> Jeremy, is this a permanent long-term move, or is it going to be
> something random? I.e., should we just add a new heuristic "guess" at
> this address, or do we need to do something more complicated?
>
It's a permanent move. I moved it up to 0xffff880000000000 to leave
space for Xen when running as a PV kernel, but there's no reasonable way
to make it variable so the linear map will be there regardless of what
mode the kernel is operating in.
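Since the move is permanent, one option raised in this thread is to add the new address as a second guess. The sketch below shows that idea under simplifying assumptions: a plain dictionary stands in for the guest's mappings, 4 KiB pages are assumed, and `find_writable_mapping` is a hypothetical name, not a function from the Xen source.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Candidate bases for the kernel's 1:1 map: the older location and the new one.
CANDIDATE_BASES = (0xFFFF810000000000, 0xFFFF880000000000)


def find_writable_mapping(gfn, mappings):
    """Locate a mapping of gfn, trying cheap guesses before a full scan.

    mappings is a dict of page-aligned virtual address -> gfn, a stand-in
    for walking the real shadow page tables.
    """
    # Fast path: check each known 1:1 map location directly.
    for base in CANDIDATE_BASES:
        va = base + (gfn << PAGE_SHIFT)
        if mappings.get(va) == gfn:
            return va, "guess"
    # Slow path: scan every mapping (the brute-force search).
    for va, mapped_gfn in mappings.items():
        if mapped_gfn == gfn:
            return va, "brute-force"
    return None, "not-found"


# Example mappings built from addresses seen in the trace earlier in the thread.
mappings = {
    0xFFFF88007453C000: 0x7453C,  # kernel 1:1 map entry
    0x00002B105F000000: 0x5CAF0,  # user-space mapping, not in the 1:1 map
}

print(find_writable_mapping(0x7453C, mappings))
print(find_writable_mapping(0x5CAF0, mappings))
```

A fixed list of candidates only works because the base does not vary at run time, which is what the reply above confirms.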
J
Thread overview: 30+ messages
2008-09-10 18:23 Poor performance on HVM (kernbench) Todd Deshane
2008-09-10 21:22 ` Todd Deshane
2008-09-10 22:42 ` Anthony Liguori
2008-09-11 2:52 ` Todd Deshane
2008-09-11 9:35 ` George Dunlap
2008-09-11 15:30 ` Todd Deshane
2008-09-11 10:00 ` Gianluca Guida
2008-09-11 15:17 ` Gianluca Guida
2008-09-11 15:25 ` Gianluca Guida
2008-09-11 15:35 ` Todd Deshane
2008-09-11 16:48 ` Gianluca Guida
2008-09-11 17:25 ` Gianluca Guida
2008-09-11 18:07 ` George Dunlap
2008-09-11 18:26 ` Gianluca Guida
2008-09-11 19:04 ` Todd Deshane
2008-09-11 19:54 ` Todd Deshane
2008-09-11 20:02 ` Jeremy Fitzhardinge
2008-09-11 20:24 ` John Levon
2008-09-12 10:41 ` Gianluca Guida
2008-09-12 11:04 ` George Dunlap
2008-09-12 11:19 ` George Dunlap
2008-09-12 14:20 ` Jeremy Fitzhardinge
2008-09-10 21:37 ` Daniel Magenheimer
2008-09-10 21:42 ` Todd Deshane
2008-09-10 21:45 ` Daniel Magenheimer
2008-09-11 2:48 ` Todd Deshane
2008-09-10 21:51 ` Steve Ofsthun
2008-09-11 2:50 ` Todd Deshane
2008-09-11 6:31 ` Pasi Kärkkäinen
2008-09-11 16:02 ` Todd Deshane