* [Qemu-devel] QEMU throughput is down with SMP
From: Venkateswararao Jujjuri (JV) @ 2010-09-30  0:50 UTC
  To: Qemu-development List

Code: Mainline QEMU (git://git.qemu.org/qemu.git)
Machine: LS21 blade.
Disk: Local disk through VirtIO.
Did not select any cache option. Defaulting to writethrough.
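
For reference, an explicit cache mode would be selected on the -drive option,
roughly like this (the image path is just a placeholder):

  -drive file=/path/to/guest.img,if=virtio,cache=writethrough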

Command tested:
3 parallel instances of : dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000

QEMU with  smp=1
19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s

QEMU with smp=4
15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s

Is this expected?


Thanks,
JV

=== Details ===
smp = 1

time dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000 & time dd if=/dev/zero 
of=/pmnt/my_pw1 bs=4k count=100000 & time dd if=/dev/zero of=/pmnt/my_pw2 bs=4k 
count=100000 &


[root@localhost ~]# 100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 21.2747 s, 19.3 MB/s

real	0m21.377s
user	0m0.040s
sys	0m1.655s
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 21.2799 s, 19.2 MB/s

real	0m21.374s
user	0m0.046s
sys	0m1.660s
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 22.0735 s, 18.6 MB/s

real	0m22.153s
user	0m0.043s
sys	0m1.642s


smp = 4
[root@localhost ~]# time dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000 & 
time dd if=/dev/zero of=/pmnt/my_pw1 bs=4k count=100000 & time dd if=/dev/zero 
of=/pmnt/my_pw2 bs=4k count=100000 &
[root@localhost ~]# 100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 26.8055 s, 15.3 MB/s

real	0m26.869s
user	0m0.079s
sys	0m3.333s
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 28.9583 s, 14.1 MB/s

real	0m29.018s
user	0m0.053s
sys	0m4.313s
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 30.1739 s, 13.6 MB/s

real	0m30.238s
user	0m0.065s
sys	0m4.124s


* Re: [Qemu-devel] QEMU throughput is down with SMP
From: Stefan Hajnoczi @ 2010-09-30  9:13 UTC
  To: Venkateswararao Jujjuri (JV); +Cc: Qemu-development List

On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
> Machine: LS21 blade.
> Disk: Local disk through VirtIO.
> Did not select any cache option. Defaulting to writethrough.
>
> Command tested:
> 3 parallel instances of : dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000
>
> QEMU with  smp=1
> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
>
> QEMU with smp=4
> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
>
> Is this expected?

Did you configure with --enable-io-thread?

Also, try using dd oflag=direct to eliminate effects introduced by the
guest page cache and really hit the disk.
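
For reference, the two suggestions would look roughly like this (paths are
placeholders):

  ./configure --enable-io-thread && make

and, inside the guest:

  dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000 oflag=direct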

Stefan


* Re: [Qemu-devel] QEMU throughput is down with SMP
From: Venkateswararao Jujjuri (JV) @ 2010-09-30 19:19 UTC
  To: Stefan Hajnoczi; +Cc: Qemu-development List

On 9/30/2010 2:13 AM, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com>  wrote:
>> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
>> Machine: LS21 blade.
>> Disk: Local disk through VirtIO.
>> Did not select any cache option. Defaulting to writethrough.
>>
>> Command tested:
>> 3 parallel instances of : dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000
>>
>> QEMU with  smp=1
>> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
>>
>> QEMU with smp=4
>> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
>>
>> Is this expected?
>
> Did you configure with --enable-io-thread?

Yes I did.
>
> Also, try using dd oflag=direct to eliminate effects introduced by the
> guest page cache and really hit the disk.

With oflag=direct I see no difference, and the throughput is so low that I
would not expect to see any difference anyway.
It is 225 KB/s for each thread, with either smp=1 or smp=4.
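(At bs=4k, 225 KB/s works out to only about 56 write requests per second per
instance, so all three instances together are well under 200 synchronous
writes/s.)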

- JV

>
> Stefan
>


* Re: [Qemu-devel] QEMU throughput is down with SMP
From: Stefan Hajnoczi @ 2010-10-01  8:41 UTC
  To: Venkateswararao Jujjuri (JV); +Cc: Qemu-development List

On Thu, Sep 30, 2010 at 8:19 PM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> On 9/30/2010 2:13 AM, Stefan Hajnoczi wrote:
>>
>> On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com>  wrote:
>>>
>>> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
>>> Machine: LS21 blade.
>>> Disk: Local disk through VirtIO.
>>> Did not select any cache option. Defaulting to writethrough.
>>>
>>> Command tested:
>>> 3 parallel instances of : dd if=/dev/zero of=/pmnt/my_pw bs=4k
>>> count=100000
>>>
>>> QEMU with  smp=1
>>> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
>>>
>>> QEMU with smp=4
>>> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
>>>
>>> Is this expected?
>>
>> Did you configure with --enable-io-thread?
>
> Yes I did.
>>
>> Also, try using dd oflag=direct to eliminate effects introduced by the
>> guest page cache and really hit the disk.
>
> With oflag=direct , I see no difference and the throughput is so slow and I
> would not
> expect to see any difference.
> It is 225 kb/s  for each thread either with smp=1 or with smp=4.

If I understand correctly you are getting:

QEMU oflag=direct with smp=1
225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s

QEMU oflag=direct with smp=4
225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s

This suggests the degradation with smp=4 is related to the guest kernel page
cache or buffered I/O.  Perhaps lockholder preemption, i.e. a vCPU being
descheduled by the host while holding a guest spinlock, leaving the other
vCPUs spinning on it?

Stefan


* Re: [Qemu-devel] QEMU throughput is down with SMP
From: Ryan Harper @ 2010-10-01 13:38 UTC
  To: Stefan Hajnoczi; +Cc: Venkateswararao Jujjuri (JV), Qemu-development List

* Stefan Hajnoczi <stefanha@gmail.com> [2010-10-01 03:48]:
> On Thu, Sep 30, 2010 at 8:19 PM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com> wrote:
> > On 9/30/2010 2:13 AM, Stefan Hajnoczi wrote:
> >>
> >> On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
> >> <jvrao@linux.vnet.ibm.com>  wrote:
> >>>
> >>> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
> >>> Machine: LS21 blade.
> >>> Disk: Local disk through VirtIO.
> >>> Did not select any cache option. Defaulting to writethrough.
> >>>
> >>> Command tested:
> >>> 3 parallel instances of : dd if=/dev/zero of=/pmnt/my_pw bs=4k
> >>> count=100000
> >>>
> >>> QEMU with  smp=1
> >>> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
> >>>
> >>> QEMU with smp=4
> >>> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
> >>>
> >>> Is this expected?
> >>
> >> Did you configure with --enable-io-thread?
> >
> > Yes I did.
> >>
> >> Also, try using dd oflag=direct to eliminate effects introduced by the
> >> guest page cache and really hit the disk.
> >
> > With oflag=direct , I see no difference and the throughput is so slow and I
> > would not
> > expect to see any difference.
> > It is 225 kb/s  for each thread either with smp=1 or with smp=4.
> 
> If I understand correctly you are getting:
> 
> QEMU oflag=direct with smp=1
> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
> 
> QEMU oflag=direct with smp=4
> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
> 
> This suggests the degradation for smp=4 is guest kernel page cache or
> buffered I/O related.  Perhaps lockholder preemption?

or just a single spindle maxed out because the blade hard drive doesn't
have writecache enabled (it's disabled by default).  
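
For reference, the drive's write cache state can be checked from the host with
something like this (the device name is a placeholder; a SAS disk usually needs
sdparm rather than hdparm):

  hdparm -W /dev/sda          # report the write-caching setting
  sdparm --get=WCE /dev/sda   # SCSI/SAS equivalent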

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com


* Re: [Qemu-devel] QEMU throughput is down with SMP
From: Venkateswararao Jujjuri (JV) @ 2010-10-01 15:04 UTC
  To: Ryan Harper; +Cc: Stefan Hajnoczi, Qemu-development List

On 10/1/2010 6:38 AM, Ryan Harper wrote:
> * Stefan Hajnoczi<stefanha@gmail.com>  [2010-10-01 03:48]:
>> On Thu, Sep 30, 2010 at 8:19 PM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com>  wrote:
>>> On 9/30/2010 2:13 AM, Stefan Hajnoczi wrote:
>>>>
>>>> On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
>>>> <jvrao@linux.vnet.ibm.com>    wrote:
>>>>>
>>>>> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
>>>>> Machine: LS21 blade.
>>>>> Disk: Local disk through VirtIO.
>>>>> Did not select any cache option. Defaulting to writethrough.
>>>>>
>>>>> Command tested:
>>>>> 3 parallel instances of : dd if=/dev/zero of=/pmnt/my_pw bs=4k
>>>>> count=100000
>>>>>
>>>>> QEMU with  smp=1
>>>>> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
>>>>>
>>>>> QEMU with smp=4
>>>>> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
>>>>>
>>>>> Is this expected?
>>>>
>>>> Did you configure with --enable-io-thread?
>>>
>>> Yes I did.
>>>>
>>>> Also, try using dd oflag=direct to eliminate effects introduced by the
>>>> guest page cache and really hit the disk.
>>>
>>> With oflag=direct , I see no difference and the throughput is so slow and I
>>> would not
>>> expect to see any difference.
>>> It is 225 kb/s  for each thread either with smp=1 or with smp=4.
>>
>> If I understand correctly you are getting:
>>
>> QEMU oflag=direct with smp=1
>> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
>>
>> QEMU oflag=direct with smp=4
>> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
>>
>> This suggests the degradation for smp=4 is guest kernel page cache or
>> buffered I/O related.  Perhaps lockholder preemption?
>
> or just a single spindle maxed out because the blade hard drive doesn't
> have writecache enabled (it's disabled by default).

Yes, I am sure we are hitting the limit of the blade's local disk.
The question is why smp=4 degrades performance in the cached mode.

I am running the latest upstream kernel on the guest (2.6.36-rc5) and using
block I/O.
Do we have any known issues there which could explain the performance degradation?

I am trying to put together a test which shows that QEMU SMP performance
improves/scales.  I would like to use it to validate our new VirtFS threading
code (yet to hit the mailing list).

Thanks,
JV


* Re: [Qemu-devel] QEMU throughput is down with SMP
From: Stefan Hajnoczi @ 2010-10-01 15:09 UTC
  To: Venkateswararao Jujjuri (JV); +Cc: Ryan Harper, Qemu-development List

On Fri, Oct 1, 2010 at 4:04 PM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> On 10/1/2010 6:38 AM, Ryan Harper wrote:
>>
>> * Stefan Hajnoczi<stefanha@gmail.com>  [2010-10-01 03:48]:
>>>
>>> On Thu, Sep 30, 2010 at 8:19 PM, Venkateswararao Jujjuri (JV)
>>> <jvrao@linux.vnet.ibm.com>  wrote:
>>>>
>>>> On 9/30/2010 2:13 AM, Stefan Hajnoczi wrote:
>>>>>
>>>>> On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
>>>>> <jvrao@linux.vnet.ibm.com>    wrote:
>>>>>>
>>>>>> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
>>>>>> Machine: LS21 blade.
>>>>>> Disk: Local disk through VirtIO.
>>>>>> Did not select any cache option. Defaulting to writethrough.
>>>>>>
>>>>>> Command tested:
>>>>>> 3 parallel instances of : dd if=/dev/zero of=/pmnt/my_pw bs=4k
>>>>>> count=100000
>>>>>>
>>>>>> QEMU with  smp=1
>>>>>> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
>>>>>>
>>>>>> QEMU with smp=4
>>>>>> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
>>>>>>
>>>>>> Is this expected?
>>>>>
>>>>> Did you configure with --enable-io-thread?
>>>>
>>>> Yes I did.
>>>>>
>>>>> Also, try using dd oflag=direct to eliminate effects introduced by the
>>>>> guest page cache and really hit the disk.
>>>>
>>>> With oflag=direct , I see no difference and the throughput is so slow
>>>> and I
>>>> would not
>>>> expect to see any difference.
>>>> It is 225 kb/s  for each thread either with smp=1 or with smp=4.
>>>
>>> If I understand correctly you are getting:
>>>
>>> QEMU oflag=direct with smp=1
>>> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
>>>
>>> QEMU oflag=direct with smp=4
>>> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
>>>
>>> This suggests the degradation for smp=4 is guest kernel page cache or
>>> buffered I/O related.  Perhaps lockholder preemption?
>>
>> or just a single spindle maxed out because the blade hard drive doesn't
>> have writecache enabled (it's disabled by default).
>
> Yes, I am sure we are hitting the max limit on the blade local disk.
> Question is why the smp=4 degraded the performance in the cached mode.
>
> I am running latest kernel from upstream on the guest(2.6.36-rc5)..and using
> block IO.
> Do we have any know issues in there which could explain performance
> degradation?

I suggested that lockholder preemption might be the issue.  If you
check /proc/lock_stat in a guest debug kernel after seeing poor
performance, do the lock statistics look suspicious (very long hold
times)?
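
For reference, a rough sequence for checking this (assuming a guest kernel
built with CONFIG_LOCK_STAT=y) would be:

  echo 1 > /proc/sys/kernel/lock_stat   # enable collection
  echo 0 > /proc/lock_stat              # clear any old counters
  # ... run the three dd instances ...
  less /proc/lock_stat                  # look for very long holdtime/waittime totals
  echo 0 > /proc/sys/kernel/lock_stat   # stop collection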

Stefan
