* Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
@ 2013-07-09 15:27 Lars Kurth
2013-07-09 15:40 ` Thanos Makatos
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Lars Kurth @ 2013-07-09 15:27 UTC (permalink / raw)
To: xen-devel@lists.xen.org
Not sure whether anyone has seen this:
http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
Some of the comments are interesting, but not really as negative as they
used to be. In any case, it may make sense to have a quick look
Lars
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 15:27 Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell Lars Kurth
@ 2013-07-09 15:40 ` Thanos Makatos
2013-07-09 15:53 ` Ian Murray
2013-07-09 15:54 ` Gordan Bobic
2013-07-09 16:52 ` Alex Bligh
2 siblings, 1 reply; 13+ messages in thread
From: Thanos Makatos @ 2013-07-09 15:40 UTC (permalink / raw)
To: lars.kurth@xen.org, xen-devel@lists.xen.org
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Lars Kurth
> Sent: 09 July 2013 16:28
> To: xen-devel@lists.xen.org
> Subject: [Xen-devel] Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
>
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as
> they used to be. In any case, it may make sense to have a quick look
>
> Lars
They use PostMark for their disk I/O tests, which is an ancient benchmark.
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 15:40 ` Thanos Makatos
@ 2013-07-09 15:53 ` Ian Murray
2013-07-09 15:56 ` Thanos Makatos
0 siblings, 1 reply; 13+ messages in thread
From: Ian Murray @ 2013-07-09 15:53 UTC (permalink / raw)
To: Thanos Makatos, lars.kurth@xen.org, xen-devel@lists.xen.org
----- Original Message -----
> From: Thanos Makatos <thanos.makatos@citrix.com>
> To: "lars.kurth@xen.org" <lars.kurth@xen.org>; "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
> Cc:
> Sent: Tuesday, 9 July 2013, 16:40
> Subject: Re: [Xen-devel] Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
>
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of Lars Kurth
>> Sent: 09 July 2013 16:28
>> To: xen-devel@lists.xen.org
>> Subject: [Xen-devel] Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
>>
>> Not sure whether anyone has seen this:
>> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>>
>> Some of the comments are interesting, but not really as negative as
>> they used to be. In any case, it may make sense to have a quick look
>>
>> Lars
>
> They use PostMark for their disk I/O tests, which is an ancient benchmark.
is that a good or a bad thing? If so, why?
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 15:27 Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell Lars Kurth
2013-07-09 15:40 ` Thanos Makatos
@ 2013-07-09 15:54 ` Gordan Bobic
2013-07-11 10:53 ` Dario Faggioli
2013-07-09 16:52 ` Alex Bligh
2 siblings, 1 reply; 13+ messages in thread
From: Gordan Bobic @ 2013-07-09 15:54 UTC (permalink / raw)
To: lars.kurth; +Cc: xen-devel
On Tue, 09 Jul 2013 16:27:31 +0100, Lars Kurth <lars.kurth@xen.org>
wrote:
> Not sure whether anyone has seen this:
>
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as
> they used to be. In any case, it may make sense to have a quick look
Relative figures, at least in terms of ordering, are similar to what I
found the last time I did a similar test:
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/
My test was harsher, though, because it exposed more of the context
switching and inter-core (and worse, inter-die since I tested on a
C2Q) migration overheads.
The process migration overheads are _expensive_ - I found that on bare
metal, pinning CPU/RAM-intensive processes to cores made a ~20%
difference to overall throughput on a C2Q class CPU (no shared caches
between the two dies made it worse). I expect 4.3.x will be a
substantial improvement with NUMA awareness improvements to the
scheduler (looking forward to trying it this weekend).
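As a rough illustration of that kind of pinning, a minimal Python sketch
(Linux-only; the core numbers and the busy-loop workload are just
placeholders) that keeps each CPU-bound worker on a fixed core so the
scheduler cannot migrate it between dies - comparing its runtime against
an unpinned run gives a feel for the migration overhead:

import os
import time
from multiprocessing import Process

def busy(iterations):
    # purely CPU-bound placeholder workload
    x = 0
    for i in range(iterations):
        x += i * i
    return x

def pinned_worker(core, iterations):
    os.sched_setaffinity(0, {core})  # pin this process to a single core (Linux only)
    busy(iterations)

if __name__ == "__main__":
    start = time.time()
    workers = [Process(target=pinned_worker, args=(core, 50000000))
               for core in (0, 1, 2, 3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("pinned run took %.2fs" % (time.time() - start))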
Shame Phoronix didn't test PV performance; in my tests that made
a huge difference and put Xen firmly ahead of the competition.
Gordan
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 15:53 ` Ian Murray
@ 2013-07-09 15:56 ` Thanos Makatos
2013-07-09 16:14 ` Gordan Bobic
0 siblings, 1 reply; 13+ messages in thread
From: Thanos Makatos @ 2013-07-09 15:56 UTC (permalink / raw)
To: Ian Murray, lars.kurth@xen.org, xen-devel@lists.xen.org
> >> Not sure whether anyone has seen this:
> >>
> >> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
> >>
> >> Some of the comments are interesting, but not really as negative as
> >> they used to be. In any case, it may make sense to have a quick look
> >>
> >> Lars
> >
> > They use PostMark for their disk I/O tests, which is an ancient
> > benchmark.
>
> is that a good or a bad thing? If so, why?
IMO it's a bad thing because it's far from a representative benchmark, which can lead to wrong conclusions when evaluating I/O performance.
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 15:56 ` Thanos Makatos
@ 2013-07-09 16:14 ` Gordan Bobic
2013-07-09 16:21 ` Thanos Makatos
0 siblings, 1 reply; 13+ messages in thread
From: Gordan Bobic @ 2013-07-09 16:14 UTC (permalink / raw)
To: Thanos Makatos; +Cc: Ian Murray, lars.kurth@xen.org, xen-devel
On Tue, 9 Jul 2013 15:56:51 +0000, Thanos Makatos
<thanos.makatos@citrix.com> wrote:
>> >> Not sure whether anyone has seen this:
>> >>
>> >> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>> >>
>> >> Some of the comments are interesting, but not really as negative as
>> >> they used to be. In any case, it may make sense to have a quick look
>> >>
>> >> Lars
>> >>
>> > They use PostMark for their disk I/O tests, which is an ancient
>> > benchmark.
>>
>> is that a good or a bad thing? If so, why?
>
> IMO it's a bad thing because it's far from a representative
> benchmark, which can lead to wrong conclusions when evaluating I/O
> performance.
Ancient doesn't mean non-representative. A good file-system benchmark
is a tricky one to come up with because most FS-es are good at some
things and bad at others. If you really want to test the virtualization
overhead on FS I/O, the only sane way to test it is by putting the
FS on the host's RAM disk and testing from there. That should
expose the full extent of the overhead, subject to the same
caveat about different FS-es being better at different load types.
Personally I'm in favour of redneck-benchmarks that easily push
the whole stack to saturation point (e.g. highly parallel kernel
compile) since those cannot be cheated. But generically speaking,
the only way to get a worthwhile measure is to create a custom
benchmark that tests your specific application to saturation
point. Any generic/synthetic benchmark will provide results
that are almost certainly going to be misleading for any
specific real-world load you are planning to run on your
system.
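For what it's worth, the sort of thing I have in mind is sketched below
in Python: copy a kernel tree onto tmpfs (so disk stays out of the
picture) and time a heavily parallel build. The paths, tmpfs size and -j
factor are just assumptions, and the mount step needs root:

import os
import subprocess
import time

RAMDISK = "/mnt/ramdisk"        # assumed mount point (must already exist)
KERNEL_SRC = "/usr/src/linux"   # assumed pre-unpacked kernel tree

# Put the whole build on tmpfs so the benchmark saturates CPU/memory, not disk.
subprocess.run(["mount", "-t", "tmpfs", "-o", "size=8g", "tmpfs", RAMDISK], check=True)
subprocess.run(["cp", "-a", KERNEL_SRC, RAMDISK], check=True)

build_dir = os.path.join(RAMDISK, os.path.basename(KERNEL_SRC))
jobs = str((os.cpu_count() or 4) * 2)   # oversubscribe so every core stays busy

subprocess.run(["make", "defconfig"], cwd=build_dir, check=True)
start = time.time()
subprocess.run(["make", "-j", jobs], cwd=build_dir, check=True)
print("parallel kernel build took %.1fs" % (time.time() - start))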
For example, on a read-only MySQL load (read-only because it simplified
testing: no need to rebuild huge data sets between runs, just drop all
the caches), in a custom application performance test that I carried out
for a client, ESX showed a ~40% throughput degradation over bare metal
(8 cores/server, 16 SQL threads cat-ing select-filtered general-log
extracts, load generator running in the same VM). And the test machines
(both physical and virtual) had enough RAM in them that they were only
disk I/O bound for the first 2-3 minutes of the test (which took the
best part of an hour to complete), which goes to show that disk I/O
bottlenecks are good at covering up overheads elsewhere.
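A minimal sketch of that kind of load generator (not the actual harness;
the query files, credentials and database name here are made up): 16
threads, each feeding a pre-filtered file of SELECTs into the mysql
client, with the whole run timed end to end.

import subprocess
import threading
import time

QUERY_FILES = ["/data/queries/thread%02d.sql" % i for i in range(16)]
DATABASE = "testdb"

def run_queries(path):
    # Equivalent of cat-ing a general-log extract into the mysql CLI.
    with open(path, "rb") as f:
        subprocess.run(["mysql", "--user=bench", "--password=secret", DATABASE],
                       stdin=f, stdout=subprocess.DEVNULL, check=True)

start = time.time()
threads = [threading.Thread(target=run_queries, args=(p,)) for p in QUERY_FILES]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("16-thread read-only run took %.1fs" % (time.time() - start))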
Gordan
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 16:14 ` Gordan Bobic
@ 2013-07-09 16:21 ` Thanos Makatos
2013-07-09 16:26 ` Gordan Bobic
0 siblings, 1 reply; 13+ messages in thread
From: Thanos Makatos @ 2013-07-09 16:21 UTC (permalink / raw)
To: Gordan Bobic; +Cc: Ian Murray, lars.kurth@xen.org, xen-devel@lists.xen.org
> > IMO it's a bad thing because it's far from a representative benchmark,
> > which can lead to wrong conclusions when evaluating I/O performance.
>
> Ancient doesn't mean non-representative. A good file-system benchmark
In this particular case it is: PostMark is a single-threaded application that performs read and write operations on a fixed set of files, at an unrealistically low directory depth; modern I/O workloads exhibit much more complicated behaviour than this.
> is a tricky one to come up with because most FS-es are good at some
> things and bad at others. If you really want to test the virtualization
> overhead on FS I/O, the only sane way to test it is by putting the FS
> on the host's RAM disk and testing from there. That should expose the
> full extent of the overhead, subject to the same caveat about
> different FS-es being better at different load types.
>
> Personally I'm in favour of redneck-benchmarks that easily push the
> whole stack to saturation point (e.g. highly parallel kernel
> compile) since those cannot be cheated. But generically speaking, the
> only way to get a worthwhile measure is to create a custom benchmark
> that tests your specific application to saturation point. Any
> generic/synthetic benchmark will provide results that are almost
> certainly going to be misleading for any specific real-world load you
> are planning to run on your system.
>
> For example, on a read-only MySQL load (read-only because it simplified
> testing: no need to rebuild huge data sets between runs, just drop all
> the caches), in a custom application performance test that I carried out
> for a client, ESX showed a ~40% throughput degradation over bare metal
> (8 cores/server, 16 SQL threads cat-ing select-filtered general-log
> extracts, load generator running in the same VM). And the test machines
> (both physical and virtual) had enough RAM in them that they were only
> disk I/O bound for the first 2-3 minutes of the test (which took the
> best part of an hour to complete), which goes to show that disk I/O
> bottlenecks are good at covering up overheads elsewhere.
>
> Gordan
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 16:21 ` Thanos Makatos
@ 2013-07-09 16:26 ` Gordan Bobic
0 siblings, 0 replies; 13+ messages in thread
From: Gordan Bobic @ 2013-07-09 16:26 UTC (permalink / raw)
To: Thanos Makatos; +Cc: Ian Murray, lars.kurth@xen.org, xen-devel
On Tue, 9 Jul 2013 16:21:52 +0000, Thanos Makatos
<thanos.makatos@citrix.com> wrote:
>> > IMO it's a bad thing because it's far from a representative benchmark,
>> > which can lead to wrong conclusions when evaluating I/O performance.
>>
>> Ancient doesn't mean non-representative. A good file-system
>> benchmark
>
> In this particular case it is: PostMark is a single-threaded
> application that performs read and write operations on a fixed set of
> files, at an unrealistically low directory depth; modern I/O
> workloads
> exhibit much more complicated behaviour than this.
Unless you are running a mail server. Granted, running multiple
postmarks in parallel might be a better test on today's many-core
servers, but it'd likely make little or no difference on a
disk-I/O-bound test.
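Something along these lines would do it - a hedged Python sketch that
assumes a postmark binary on the path which accepts a command file (the
file and transaction counts are placeholders), one instance per working
directory:

import os
import subprocess
import time

INSTANCES = 8
COMMANDS = """set location {workdir}
set number 5000
set transactions 20000
run
quit
"""

procs = []
start = time.time()
for i in range(INSTANCES):
    workdir = "/tmp/pm%d" % i
    os.makedirs(workdir, exist_ok=True)
    cmd_file = os.path.join(workdir, "pm.cfg")
    with open(cmd_file, "w") as f:
        f.write(COMMANDS.format(workdir=workdir))
    # Each PostMark instance runs against its own directory tree.
    procs.append(subprocess.Popen(["postmark", cmd_file], stdout=subprocess.DEVNULL))
for p in procs:
    p.wait()
print("%d parallel PostMark runs took %.1fs" % (INSTANCES, time.time() - start))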
Gordan
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 15:27 Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell Lars Kurth
2013-07-09 15:40 ` Thanos Makatos
2013-07-09 15:54 ` Gordan Bobic
@ 2013-07-09 16:52 ` Alex Bligh
2 siblings, 0 replies; 13+ messages in thread
From: Alex Bligh @ 2013-07-09 16:52 UTC (permalink / raw)
To: lars.kurth, xen-devel; +Cc: Alex Bligh
--On 9 July 2013 16:27:31 +0100 Lars Kurth <lars.kurth@xen.org> wrote:
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as they
> used to be. In any case, it may make sense to have a quick look
Last time I looked at the Phoronix benchmarks, they were using the default
disk caching with Xen and QEMU, and these were not identical. From memory,
KVM was using writethrough and Xen was using no caching.
This one says "Xen and KVM virtualization were setup through virt-manager".
I don't know whether that evens things out, as I don't use it.
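For what it's worth, something like this rough Python sketch (the domain
names are placeholders) would show what virt-manager actually configured,
by pulling the cache= attribute off each disk driver in the libvirt XML:

import subprocess
import xml.etree.ElementTree as ET

for domain in ("xen-guest", "kvm-guest"):          # placeholder domain names
    xml_text = subprocess.run(["virsh", "dumpxml", domain],
                              capture_output=True, text=True, check=True).stdout
    root = ET.fromstring(xml_text)
    for disk in root.findall(".//devices/disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        dev = target.get("dev") if target is not None else "?"
        cache = driver.get("cache") if driver is not None else None
        print("%s %s: cache=%s" % (domain, dev, cache or "(hypervisor default)"))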
--
Alex Bligh
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-09 15:54 ` Gordan Bobic
@ 2013-07-11 10:53 ` Dario Faggioli
2013-07-11 16:23 ` George Dunlap
0 siblings, 1 reply; 13+ messages in thread
From: Dario Faggioli @ 2013-07-11 10:53 UTC (permalink / raw)
To: Gordan Bobic; +Cc: lars.kurth, xen-devel
On Tue, 2013-07-09 at 16:54 +0100, Gordan Bobic wrote:
> The process migration overheads are _expensive_
>
Indeed!
> - I found that on bare
> metal, pinning CPU/RAM-intensive processes to cores made a ~20%
> difference to overall throughput on a C2Q class CPU (no shared caches
> between the two dies made it worse). I expect 4.3.x will be a
> substantial improvement with NUMA awareness improvements to the
> scheduler (looking forward to trying it this weekend).
>
Well, yes, something good could be expected, although the actual
improvement will depend on the number of involved VMs, their sizes, the
workload they're running, etc.
When I tried to use kernel compile as a benchmark for the NUMA effects,
it did not turn out that useful to me (and that's why I switched to
SpecJBB), but perhaps it was me that was doing something wrong...
Anyway, if you do anything like this, please, do let us know here (and,
please, Cc me :-P).
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-11 10:53 ` Dario Faggioli
@ 2013-07-11 16:23 ` George Dunlap
2013-07-11 16:27 ` Dario Faggioli
0 siblings, 1 reply; 13+ messages in thread
From: George Dunlap @ 2013-07-11 16:23 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Gordan Bobic, Lars Kurth, xen-devel@lists.xen.org
On Thu, Jul 11, 2013 at 11:53 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Tue, 2013-07-09 at 16:54 +0100, Gordan Bobic wrote:
>> The process migration overheads are _expensive_
>>
> Indeed!
>
>> - I found that on bare
>> metal, pinning CPU/RAM-intensive processes to cores made a ~20%
>> difference to overall throughput on a C2Q class CPU (no shared caches
>> between the two dies made it worse). I expect 4.3.x will be a
>> substantial improvement with NUMA awareness improvements to the
>> scheduler (looking forward to trying it this weekend).
>>
> Well, yes, something good could be expected, although the actual
> improvement will depend on the number of involved VMs, their sizes, the
> workload they're running, etc.
>
> When I tried to use kernel compile as a benchmark for the NUMA effects,
> it did not turn out that useful to me (and that's why I switched to
> SpecJBB), but perhaps it was me that was doing something wrong...
In my experience, kernel-build has excellent memory locality. One
effect is that the effect of nested paging on TLB time is almost nil;
I'm not surprised that the caches make the effect of NUMA almost nil
as well.
-George
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-11 16:23 ` George Dunlap
@ 2013-07-11 16:27 ` Dario Faggioli
2013-07-11 17:49 ` Gordan Bobic
0 siblings, 1 reply; 13+ messages in thread
From: Dario Faggioli @ 2013-07-11 16:27 UTC (permalink / raw)
To: George Dunlap; +Cc: Gordan Bobic, Lars Kurth, xen-devel@lists.xen.org
On Thu, 2013-07-11 at 17:23 +0100, George Dunlap wrote:
> On Thu, Jul 11, 2013 at 11:53 AM, Dario Faggioli
> > When I tried to use kernel compile as a benchmark for the NUMA effects,
> > it did not turn out that useful to me (and that's why I switched to
> > SpecJBB), but perhaps it was me that was doing something wrong...
>
> In my experience, kernel-build has excellent memory locality. One
> effect is that the effect of nested paging on TLB time is almost nil;
> I'm not surprised that the caches make the effect of NUMA almost nil
> as well.
>
Not to mention I/O, unless you set up a ramfs-backed build
environment. Again, when I tried, that was my intention, but perhaps I
failed right at that... Gordan, what about you?
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
2013-07-11 16:27 ` Dario Faggioli
@ 2013-07-11 17:49 ` Gordan Bobic
0 siblings, 0 replies; 13+ messages in thread
From: Gordan Bobic @ 2013-07-11 17:49 UTC (permalink / raw)
To: Dario Faggioli; +Cc: George Dunlap, Lars Kurth, xen-devel@lists.xen.org
On 07/11/2013 05:27 PM, Dario Faggioli wrote:
> On Thu, 2013-07-11 at 17:23 +0100, George Dunlap wrote:
>> On Thu, Jul 11, 2013 at 11:53 AM, Dario Faggioli
>>> When I tried to use kernel compile as a benchmark for the NUMA effects,
>>> it did not turn out that useful to me (and that's why I switched to
>>> SpecJBB), but perhaps it was me that was doing something wrong...
>>
>> In my experience, kernel-build has excellent memory locality. One
>> effect is that the effect of nested paging on TLB time is almost nil;
>> I'm not surprised that the caches make the effect of NUMA almost nil
>> as well.
>>
> Not to mention I/O, unless you set up a ramfs-backed build
> environment. Again, when I tried, that was my intention, but perhaps I
> failed right at that... Gordan, what about you?
IIRC in my tests the disk I/O was relatively minimal. If you read the
details here:
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/
you may notice that I actually primed the test by catting everything to
/dev/null, so all the reads should have been coming from the page cache.
I didn't have enough RAM in the machine (only 8GB) to fit all the
produced binaries in tmpfs at the time.
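In case it helps anyone reproduce it, a rough Python equivalent of that
cat-to-/dev/null priming step (the source tree path is just an example):
read every file once so later reads come from the page cache.

import os

def prime_page_cache(tree):
    total = 0
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(1 << 20)    # 1 MiB at a time
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                pass                               # skip unreadable files
    return total

print("primed %.1f MiB" % (prime_page_cache("/usr/src/linux") / float(1 << 20)))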
I don't think this had a large impact, though - the iowait time was
about 0% all the time because there were plenty of threads that had
productive compiling work to do while some were waiting to commit to
disk. Since this was on a C2Q, there was no NUMA in play, so if I had to
guess at the major cause of performance degradation, it would be related
to context switching; having said that, I didn't get around to doing any
in-depth profiling to be able to tell for sure. (Speaking of which, how
would one go about profiling things at the bare-metal hypervisor level?)
I will re-run the test on a new machine at some point and see how it
compares, and this time I will have enough RAM for the whole lot to fit.
Gordan