* Why do additional cores reduce performance?
@ 2014-12-15 23:33 Oleg Ovechko
0 siblings, 0 replies; 5+ messages in thread
From: Oleg Ovechko @ 2014-12-15 23:33 UTC (permalink / raw)
To: qemu-discuss-qX2TKyscuCcdnm+yROfE0A, kvm-u79uwXL29TY76Z2rM5mHXA
Hi,
I am a novice with kvm/qemu; perhaps you can explain something to me.
I pass through a SATA controller with fakeraid. It works, but I am getting
strange results during performance testing:
A. Host Windows, 6 cores (no HT, turbo boost off): 6:23 (+- 10 secs)
B. Host Windows, 1 CPU core (others are turned off in BIOS): 7:13 (+-10 secs)
C. Host 1 core, Guest Windows 1 core: 7:15 - same as B, no degradation
D. Host 6 cores, Guest Windows 1 core: 7:57
E. Host 6 cores, Guest Windows 4 cores: 8:17
Perhaps I am doing something very wrong. D->E looks very surprising to me,
and C->D is just unbelievable. I am adding cores and it works worse. How
can that be?
1. What is the correct way of passing CPU cores to a VM?
2. Can I assign a CPU core solely to the VM to avoid cache resets and
context switches to the host?
3. Also, I am unsure about HT. When I specify "cores=2", is there any
guarantee that a whole core with both HT threads is passed to the VM? Or
can it be a mix of two real cores with separate caches?
4. Any other comments on the qemu command line?
I am using Ubuntu 14.10, Qemu 2.1.95, Windows 2012R2
uname -a Linux meta 3.16.0-28-generic #37-Ubuntu SMP Mon Dec 8 17:15:28 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux
Hardware is: i7 5930K, AsRock X99 Professional
Here is my qemu command line:
qemu-system-x86_64 -machine type=pc -nodefaults -m 16G -enable-kvm \
-cpu host -smp 4,sockets=1,cores=4,threads=1 \
-rtc base=localtime,clock=host \
-bios /home/oleg/seabios175.bin \
-hda /home/oleg/windows.img \
-device vfio-pci,host=00:1f.0,addr=1f.0,multifunction=on \
-device vfio-pci,host=00:1f.2,addr=1f.2,romfile=/home/oleg/8086-2822_v13.2.0.2134_patched.bin \
-device vfio-pci,host=00:1f.3,addr=1f.3 \
-vga std
* Why do additional cores reduce performance?
@ 2014-12-15 23:40 Oleg Ovechko
2014-12-16 9:26 ` Paolo Bonzini
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Ovechko @ 2014-12-15 23:40 UTC (permalink / raw)
To: qemu-discuss, kvm
Hi,
I am a novice with kvm/qemu; perhaps you can explain something to me.
I pass through a SATA controller with fakeraid. It works, but I am getting
strange results during performance testing:
A. Host Windows, 6 cores (no HT, turbo boost off): 6:23 (+- 10 secs)
B. Host Windows, 1 CPU core (others are turned off in BIOS): 7:13 (+-10 secs)
C. Host 1 core, Guest Windows 1 core: 7:15 - same as B, no degradation
D. Host 6 cores, Guest Windows 1 core: 7:57
E. Host 6 cores, Guest Windows 4 cores: 8:17
Perhaps I am doing something very wrong. D->E looks very surprising to me,
and C->D is just unbelievable. I am adding cores and it works worse. How
can that be?
1. What is the correct way of passing CPU cores to a VM?
2. Can I assign a CPU core solely to the VM to avoid cache resets and
context switches to the host?
3. Also, I am unsure about HT. When I specify "cores=2", is there any
guarantee that a whole core with both HT threads is passed to the VM? Or
can it be a mix of two real cores with separate caches?
4. Any other comments on the qemu command line?
I am using Ubuntu 14.10, Qemu 2.1.95, Windows 2012R2
uname -a Linux meta 3.16.0-28-generic #37-Ubuntu SMP Mon Dec 8 17:15:28 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux
Hardware is: i7 5930K, AsRock X99 Professional
Here is my qemu command line:
qemu-system-x86_64 -machine type=pc -nodefaults -m 16G -enable-kvm \
-cpu host -smp 4,sockets=1,cores=4,threads=1 \
-rtc base=localtime,clock=host \
-bios /home/oleg/seabios175.bin \
-hda /home/oleg/windows.img \
-device vfio-pci,host=00:1f.0,addr=1f.0,multifunction=on \
-device vfio-pci,host=00:1f.2,addr=1f.2,romfile=/home/oleg/8086-2822_v13.2.0.2134_patched.bin \
-device vfio-pci,host=00:1f.3,addr=1f.3 \
-vga std
* Re: Why do additional cores reduce performance?
2014-12-15 23:40 Why do additional cores reduce performance? Oleg Ovechko
@ 2014-12-16 9:26 ` Paolo Bonzini
2014-12-16 16:22 ` Oleg Ovechko
0 siblings, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2014-12-16 9:26 UTC (permalink / raw)
To: Oleg Ovechko, qemu-discuss, kvm
On 16/12/2014 00:40, Oleg Ovechko wrote:
> A. Host Windows, 6 cores (no HT, turbo boost off): 6:23 (+- 10 secs)
> B. Host Windows, 1 CPU core (others are turned off in BIOS): 7:13 (+-10 secs)
> C. Host 1 core, Guest Windows 1 core: 7:15 - same as B, no degradation
> D. Host 6 cores, Guest Windows 1 core: 7:57
> E. Host 6 cores, Guest Windows 4 cores: 8:17
What is your benchmark?
Windows sometimes has scalability problems due to the way it does
timing. Try replacing "-cpu host" with "-no-hpet -cpu
host,hv_time,hv_vapic".
> 3. Also I am unsure about HT. When I specify "cores=2",
I suppose you mean "threads=2".
> is there any
> guarantee that a whole core with both HT threads is passed to the VM? Or
> can it be a mix of two real cores with separate caches?
It will be a mix. Do not specify HT in the guest, unless you have HT in
the host _and_ you are pinning the two threads of each guest core to the
two threads of a host core.
Paolo
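
A minimal sketch of the pinning described above, assuming a Linux host with
sysfs and util-linux taskset. The function names, the optional sysfs root
argument, and the TID 12345 are hypothetical and only for illustration. Real
vCPU thread IDs can be read from the "info cpus" command of the QEMU monitor
(the command line in this thread uses -nodefaults, so a monitor would have to
be added first, e.g. with -monitor stdio). On a host with HT disabled, as in
the tests above, each sibling group is a single CPU.

```shell
#!/bin/sh
# Sketch only: show which host CPUs are hyperthread siblings, then pin a
# QEMU vCPU thread to one of them.

# Print each distinct sibling group once (e.g. "0,6" or "0-1").
# $1 is an optional sysfs root; it defaults to the real one.
list_sibling_groups() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] && cat "$f"
    done | sort -u
}

# Pin an existing thread (TID) to a single host CPU.
# taskset -c takes a CPU list, -p operates on an existing task.
pin_thread() {
    tid="$1"
    host_cpu="$2"
    taskset -cp "$host_cpu" "$tid"
}

list_sibling_groups

# Hypothetical usage: the vCPU 0 thread has TID 12345, pin it to host CPU 0.
# pin_thread 12345 0
```

For a guest started with threads=2, the two vCPU threads of one guest core
would go to the two CPUs of one sibling group; with threads=1, one vCPU per
host core is enough.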
* Re: Why do additional cores reduce performance?
2014-12-16 9:26 ` Paolo Bonzini
@ 2014-12-16 16:22 ` Oleg Ovechko
2014-12-16 16:31 ` Paolo Bonzini
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Ovechko @ 2014-12-16 16:22 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-discuss, kvm
> What is your benchmark?
I've tried different tools (CrystalDiskMark 3.0.3 x64, ATTO Disk
Benchmark v2.47); all give the same result.
The numbers I provided in my first mail are for copying a 100G file. I
simply subtract the start time from the stop time. 50 seconds is such a
huge difference (the three-sigma rule gives 10 secs for 10 tries) that I
could even use a wall clock.
When everything is enabled in BIOS it is 6:23 on real Windows versus
9:03 virtualized...
Phil Ehrens has sent me a link:
https://lists.gnu.org/archive/html/qemu-discuss/2014-10/msg00036.html
If I understand it correctly, it means kvm/qemu simply is not designed for
multi-threading.
I guess I need to try a different hypervisor. A 50% performance loss is
too high a price, especially when VT-x and VT-d are meant to make it 0%.
> Windows sometimes has scalability problems due to the way it does
> timing. Try replacing "-cpu host" with "-no-hpet -cpu
> host,hv_time,hv_vapic".
Does not change results.
> It will be a mix. Do not specify HT in the guest, unless you have HT in
> the host _and_ you are pinning the two threads of each guest core to the
> two threads of a host core.
Do you mean "-smp 4,sockets=1,cores=2,threads=2" for 2 cores with HT
enabled? That gives an even worse result: 9:17
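
For reference, the slowdowns implied by the timings quoted in this thread can
be worked out with a small helper. This is only a sketch: the function names
are made up here, and it assumes the times are in minutes:seconds as written
above. Taken at face value, 6:23 versus 9:03 comes out closer to 42% than to
50%.

```shell
#!/bin/sh
# Convert "M:SS" to seconds (awk reads "03" as decimal 3, not octal).
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Percent by which time $2 is slower than baseline $1, one decimal place.
slowdown_pct() {
    base=$(to_secs "$1")
    t=$(to_secs "$2")
    awk -v b="$base" -v t="$t" 'BEGIN { printf "%.1f\n", (t - b) * 100 / b }'
}

slowdown_pct 6:23 9:03   # bare metal vs. virtualized, everything enabled
slowdown_pct 7:15 7:57   # case C vs. case D
slowdown_pct 7:57 8:17   # case D vs. case E
```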
* Re: Why do additional cores reduce performance?
2014-12-16 16:22 ` Oleg Ovechko
@ 2014-12-16 16:31 ` Paolo Bonzini
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2014-12-16 16:31 UTC (permalink / raw)
To: Oleg Ovechko; +Cc: qemu-discuss, kvm
On 16/12/2014 17:22, Oleg Ovechko wrote:
>> What is your benchmark?
>
> I've tried different ways (CrystalDiskMark 3.0.3 x64, ATTO Disk
> Benchmark v2.47) all give same result.
All are run on the AHCI passthrough disk(s), right?
> When everything is enabled in BIOS it is 6:23 on real Windows versus
> 9:03 on virtualized...
>
> Phil Ehrens has sent me link
> https://lists.gnu.org/archive/html/qemu-discuss/2014-10/msg00036.html
> If I don't misunderstand, it means kvm/qemu simply is not designed for
> multi-threading.
No, it means TCG does not support multithreading. KVM does, and you are
using it.
> I guess I need to try different hypervisor. 50% performance is too
> high price especially when VT-x and VT-d are meant to make it 0%
It is surprising to me too.
Paolo
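
A quick host-side sanity check for the KVM versus TCG distinction, as a
sketch only. It assumes a Linux host; the function names and optional path
arguments are hypothetical. It only verifies the preconditions for KVM (CPU
virtualization flag and the /dev/kvm device node). Whether a running guest
actually uses KVM can be confirmed with the "info kvm" command of the QEMU
monitor.

```shell
#!/bin/sh
# Succeed if the given cpuinfo file lists VT-x (vmx) or AMD-V (svm).
has_hw_virt() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Succeed if the KVM character device exists (kvm module loaded).
kvm_node_present() {
    [ -c "${1:-/dev/kvm}" ]
}

if has_hw_virt; then
    echo "CPU advertises hardware virtualization"
else
    echo "no vmx/svm flag found"
fi

if kvm_node_present; then
    echo "/dev/kvm is present"
else
    echo "/dev/kvm is missing; -enable-kvm would not work"
fi
```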
Thread overview: 5+ messages
2014-12-15 23:40 Why do additional cores reduce performance? Oleg Ovechko
2014-12-16 9:26 ` Paolo Bonzini
2014-12-16 16:22 ` Oleg Ovechko
2014-12-16 16:31 ` Paolo Bonzini
-- strict thread matches above, loose matches on Subject: below --
2014-12-15 23:33 Oleg Ovechko