* Re: VT is comically slow
@ 2006-07-06 19:16 alex
  2006-07-06 20:59 ` Anthony Liguori
  2006-07-10  7:51 ` Rami Rosen
  0 siblings, 2 replies; 6+ messages in thread
From: alex @ 2006-07-06 19:16 UTC
  To: xen-devel


Rik van Riel wrote:
> VT by itself seems fine, but once a VT domain is running a workload that
> is network intensive combined with a disk/cpu intensive workload, things
> get incredibly slow.
>
> Operations that take less than a second with either workload running
> alone can now take many seconds, sometimes the better part of a minute!
>
> Is this some limitation of the qemu device model?

We (Virtual Iron) are in the process of developing accelerated drivers for HVM guests.  Our goal for this effort is to get as close to native performance as possible and to make paravirtualization of guests unnecessary.  The drivers currently support most flavors of RHEL, SLES and Windows.  The early performance numbers are encouraging: some are many times faster than QEMU emulation and close to native performance (and we are just beginning to tune).

Just to give people a flavor of the performance we are getting, here are some preliminary results on Intel Woodcrest (51xx series) with a Gigabit network and SAN storage; all of the VMs had 1 CPU.  These numbers are very early: the disk numbers are very good, and we are still tuning the network numbers.

Bonnie-SAN - bigger is better        RHEL-4.0 (32-bit)   VI-accel RHEL-4.0 (32-bit)
Write, KB/sec                          52,106                 49,500
Read, KB/sec                           59,392                 57,186

netperf - bigger is better           RHEL-4.0 (32-bit)   VI-accel RHEL-4.0 (32-bit)
TCP req/resp (trans/sec)                6,831                  5,648

SPECjbb2000 - bigger is better       RHEL-4.0 (32-bit)   VI-accel RHEL-4.0 (32-bit)
JRockit JVM throughput                 43,061                 40,364
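Reading the tables as ratios makes "close to native" easier to check. A small sketch, assuming the plain RHEL-4.0 (32-bit) column is the baseline each VI-accel figure is compared against; the numbers are copied from the tables above and the function names are only illustrative.

```python
# Figures copied from the tables above: (RHEL-4.0 column, VI-accel column).
RESULTS = {
    "Bonnie write (KB/sec)": (52_106, 49_500),
    "Bonnie read (KB/sec)": (59_392, 57_186),
    "netperf TCP req/resp (trans/sec)": (6_831, 5_648),
    "SPECjbb2000 JRockit throughput": (43_061, 40_364),
}


def relative_percent(baseline, accel):
    """Return the VI-accel figure as a percentage of the baseline, rounded to 0.1%."""
    return round(100.0 * accel / baseline, 1)


def summarize(results):
    """Map each benchmark name to its VI-accel/baseline percentage."""
    return {name: relative_percent(*pair) for name, pair in results.items()}


if __name__ == "__main__":
    for name, pct in summarize(RESULTS).items():
        print(f"{name:35s} {pct:5.1f}%")
```

With these figures the disk and SPECjbb results come out at roughly 94% to 96% of the RHEL-4.0 column, while the netperf request/response rate is about 83% of it, which fits the remark that the network numbers are still being tuned.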

This code is modeled on Xen backend/frontend architecture concepts and will be GPLed.  
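For readers unfamiliar with the backend/frontend split: instead of trapping on every emulated device register access, the guest's frontend places requests in a ring in shared memory and the backend drains them in batches. A minimal sketch of that idea, assuming a single producer and single consumer; the class and method names are hypothetical, this is not Xen's io/ring.h, and it omits the memory barriers, grant tables, event channels and response path a real ring needs.

```python
class SharedRing:
    """Toy single-producer/single-consumer request ring (hypothetical illustration)."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        # Free-running indices; a C version would rely on unsigned wraparound,
        # Python ints simply keep growing.
        self.prod = 0  # advanced by the frontend (guest side)
        self.cons = 0  # advanced by the backend (driver domain side)

    def free_slots(self):
        return self.size - (self.prod - self.cons)

    def push(self, req):
        """Frontend: queue one request; returns False if the ring is full."""
        if self.free_slots() == 0:
            return False
        self.slots[self.prod & (self.size - 1)] = req
        self.prod += 1
        return True

    def pop_all(self):
        """Backend: drain every pending request in one pass (one wakeup, many requests)."""
        batch = []
        while self.cons != self.prod:
            batch.append(self.slots[self.cons & (self.size - 1)])
            self.cons += 1
        return batch
```

The point of the design is that the cost of waking the backend is paid once per batch rather than once per register access.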
 
-Alex V.

Alex Vasilevsky
Chief Technology Officer, Founder
Virtual Iron Software, Inc


* Re: VT is comically slow
  2006-07-06 19:16 VT is comically slow alex
@ 2006-07-06 20:59 ` Anthony Liguori
  2006-07-07  1:49   ` Rik van Riel
  2006-07-10  7:51 ` Rami Rosen
  1 sibling, 1 reply; 6+ messages in thread
From: Anthony Liguori @ 2006-07-06 20:59 UTC
  To: xen-devel

On Thu, 06 Jul 2006 11:16:18 -0800, alex wrote:

> We (Virtual Iron) are in a process of developing accelerated drivers for
> the HVM guests.  Our goal for this effort is to get as close to native
> performance as possible and to make paravirtualization of guests
> unnecessary.  The drivers currently support most flavors of RHEL, SLES and
> Windows.  The early performance numbers are encouraging.  Some numbers are
> many times faster than QEMU emulation and are close to native performance
> numbers (and we are just beginning to tune the performance).

I don't think paravirtual drivers are necessary for good performance. 
There are a number of things about QEMU's device emulation that are less
than ideal.

Namely, QEMU's disk emulation is IDE w/DMA.  Apparently, DMA is busted
right now, but even if it worked, IDE only allows one outstanding request
at a time (which doesn't let the host scheduler do its thing properly).
Emulating either SCSI (which is in QEMU CVS) or SATA would allow multiple
outstanding requests, which would be a big benefit.
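The cost of "one outstanding request at a time" can be put in numbers with Little's law: sustained throughput is roughly the number of requests in flight divided by the time each one takes, up to whatever the device can deliver. The sketch below is a toy model with a hypothetical function name; it ignores the extra gains from request reordering that a deeper queue gives the host scheduler.

```python
def sustained_iops(outstanding, latency_s, device_max_iops):
    """Toy Little's-law model: requests in flight / per-request latency,
    capped by what the device itself can serve."""
    if outstanding < 1 or latency_s <= 0:
        raise ValueError("need at least one request in flight and a positive latency")
    return min(outstanding / latency_s, device_max_iops)
```

With made-up values of 5 ms per request and a 2,000 IOPS device, one request in flight (the IDE case) gives 200 IOPS, eight give 1,600, and 32 hit the 2,000 ceiling.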

Also, and I suspect this has more to do with your performance numbers,
QEMU currently does disk IO via read()/write() syscalls on an fd that's
open()'d without O_DIRECT.  This means everything's going through the page
cache.
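The difference Anthony describes is just the open flag: a plain open() routes every read through the host page cache, while O_DIRECT (Linux-specific) goes to the device but requires the buffer, offset and length to be aligned. A minimal sketch, assuming 4096-byte alignment is sufficient for the underlying device; the function names are hypothetical and the EINVAL fallback covers filesystems that refuse O_DIRECT.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment for offset, length and buffer under O_DIRECT


def _pread_aligned(fd, offset, length):
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT's buffer rule.
    buf = mmap.mmap(-1, length)
    try:
        n = os.preadv(fd, [buf], offset)
        return buf[:n]
    finally:
        buf.close()


def read_direct(path, offset, length):
    """Read `length` bytes at `offset`, bypassing the page cache when possible.

    Returns (data, used_direct). Falls back to a buffered read when O_DIRECT
    is unavailable or rejected with EINVAL (e.g. by some tmpfs mounts).
    """
    if length <= 0 or offset % BLOCK or length % BLOCK:
        raise ValueError("offset and length must be positive multiples of BLOCK")
    direct = getattr(os, "O_DIRECT", 0)  # only defined on Linux
    if direct:
        try:
            fd = os.open(path, os.O_RDONLY | direct)
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
        else:
            try:
                return _pread_aligned(fd, offset, length), True
            except OSError as e:
                if e.errno != errno.EINVAL:
                    raise
            finally:
                os.close(fd)
    fd = os.open(path, os.O_RDONLY)  # buffered path: goes through the page cache
    try:
        return _pread_aligned(fd, offset, length), False
    finally:
        os.close(fd)
```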

I suspect that SCSI + linux-aio would result in close to native
performance.  Since SCSI is already in QEMU CVS, it's not that far off.

I think that the same applies to network IO but I'm not qualified to
comment on what things are important there.

Regards,

Anthony Liguori

> Just to give people a flavor of the performance that we are getting,
> here are some preliminary results on Intel Woodcrest (51xx series), with
> a Gigabit network, with SAN storage and all of the VMs were 1 CPU. These
> numbers are very early, disks numbers are very good and we are still
> tuning the network numbers.
> 
> Bonnie-SAN - bigger is better        RHEL-4.0 (32-bit)   VI-accel RHEL-4.0 (32-bit)
> Write, KB/sec                          52,106                 49,500
> Read, KB/sec                           59,392                 57,186
> 
> netperf - bigger is better           RHEL-4.0 (32-bit)   VI-accel RHEL-4.0 (32-bit)
> TCP req/resp (trans/sec)                6,831                  5,648
> 
> SPECjbb2000 - bigger is better       RHEL-4.0 (32-bit)   VI-accel RHEL-4.0 (32-bit)
> JRockit JVM throughput                 43,061                 40,364
> 
> This code is modeled on Xen backend/frontend architecture concepts and
> will be GPLed.
>  
> -Alex V.
> 
> Alex Vasilevsky
> Chief Technology Officer, Founder
> Virtual Iron Software, Inc


* Re: Re: VT is comically slow
  2006-07-06 20:59 ` Anthony Liguori
@ 2006-07-07  1:49   ` Rik van Riel
  0 siblings, 0 replies; 6+ messages in thread
From: Rik van Riel @ 2006-07-07  1:49 UTC
  To: Anthony Liguori; +Cc: xen-devel

Anthony Liguori wrote:

> I don't think paravirtual drivers are necessary for good performance. 

> I think that the same applies to network IO but I'm not qualified to
> comment on what things are important there.

Especially if we emulate a network card that does checksumming
"in hardware" and knows how to do zero-copy sending...
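To illustrate the per-packet work Rik is referring to: without checksum offload, the guest's network stack computes the Internet checksum (RFC 1071) in software for every packet it sends. A minimal sketch of that computation follows; it is not QEMU or Xen code, just the standard ones'-complement sum that an emulated NIC advertising "hardware" checksumming would let the guest hand off.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF
```

A receiver verifies a packet by summing it with the checksum included; a correct packet yields zero.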

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: Re: VT is comically slow
  2006-07-07  1:43 alex
@ 2006-07-07  2:01 ` Andrew Warfield
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Warfield @ 2006-07-07  2:01 UTC
  To: alex@vasilevsky.name; +Cc: xen-devel

> The QEMU code that we use doesn't go through the dom0 buffer cache; we modified the
> code to use O_DIRECT.  We can't use the buffer cache and accelerated drivers (which go right
> to the disk) together; it can cause disk corruption.  The performance numbers we get
> from this version of QEMU are still 4 to 6 times slower than native disk I/O.

I doubt O_DIRECT buys you much in the way of performance -- as you say
it's just a correctness thing.  Still, the qemu block code is all
completely synchronous -- the fact that you simply can't have more
than a single outstanding block request at a time is going to
seriously harm performance.  As Anthony explained, some form of
asynchronous IO in the qemu drivers would clearly improve performance.
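One way to get more than a single block request outstanding is sketched below. Threads stand in for an asynchronous interface such as linux-aio (an assumption for illustration, not how QEMU does it): each blocking os.pread() releases the GIL, so several reads can be queued in the host kernel at once. The function name and defaults are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_blocks(path, offsets, block_size=4096, max_in_flight=8):
    """Issue up to `max_in_flight` reads concurrently; return blocks in request order."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
            # pool.map preserves input order even though reads complete out of order.
            return list(pool.map(lambda off: os.pread(fd, block_size, off), offsets))
    finally:
        os.close(fd)
```

A synchronous loop over the same offsets would wait for each read to finish before issuing the next, which is the one-at-a-time pattern described above.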

> You might be right, however even with pipelining and async I/O, I don't think it is going to get close to native I/O numbers.  I guess we'll just have to wait
> and see.

I'd expect that disk can be made to perform reasonably well with qemu,
using DMA emulation and async IO.  The old VMware Workstation paper [1] on
device virtualization does a pretty good job of demonstrating that
trap-and-emulate device access sucks, and would seem to imply that
it's pretty unlikely to be practical for high-rate networking.
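A back-of-the-envelope version of why trap-and-emulate struggles with high-rate networking: at line rate, every device access that traps is multiplied by the packet rate. The sketch assumes a fixed number of traps per packet and a fixed cost per trap; both are hypothetical placeholders, not measurements from this thread or from the paper.

```python
def emulation_cpu_share(link_bps, wire_bytes_per_packet, traps_per_packet, trap_cost_s):
    """Rough CPU demand (in whole CPUs) of trap-and-emulate at line rate.

    packets/s = link_bps / (8 * wire_bytes_per_packet); each packet is assumed
    to cost `traps_per_packet` exits of `trap_cost_s` seconds each.
    A result above 1.0 means a single CPU cannot keep up.
    """
    pps = link_bps / (8 * wire_bytes_per_packet)
    return pps * traps_per_packet * trap_cost_s
```

For example, full-size Ethernet frames occupy 1538 bytes on the wire (1518-byte frame plus preamble and inter-frame gap), so Gigabit Ethernet carries about 81,000 packets/s; with an assumed 5 traps of 5 microseconds each, that is already about two CPUs spent on emulation alone.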

a.

[1] http://www.usenix.org/event/usenix01/sugerman/sugerman.pdf


* Re: Re: VT is comically slow
@ 2006-07-07  2:35 alex
  0 siblings, 0 replies; 6+ messages in thread
From: alex @ 2006-07-07  2:35 UTC
  To: xen-devel

Andrew Warfield wrote:
> > The QEMU code that we use doesn't go through the dom0 buffer cache; we modified the
> > code to use O_DIRECT.  We can't use the buffer cache and accelerated drivers (which go right
> > to the disk) together; it can cause disk corruption.  The performance numbers we get
> > from this version of QEMU are still 4 to 6 times slower than native disk I/O.
>
> I doubt O_DIRECT buys you much in the way of performance -- as you say
> it's just a correctness thing.  Still, the qemu block code is all
> completely synchronous -- the fact that you simply can't have more
> than a single outstanding block request at a time is going to
> seriously harm performance.  As Anthony explained, some form of
> asynchronous IO in the qemu drivers would clearly improve performance.
>
That was exactly my point: O_DIRECT doesn't improve performance.  Anthony had made the
point in his e-mail that buffered I/O could be one of the reasons QEMU's performance
is slow.
>
> > You might be right, however even with pipelining and async I/O, I don't think it is going to
> > get close to native I/O numbers.  I guess we'll just have to wait and see.
> 
> I'd expect that disk can be made to perform reasonably well with qemu,
> using DMA emulation and async IO.  The old VMware Workstation paper on
> device virtualization does a pretty good job of demonstrating that
> trap and emulate device access sucks, and would seem to imply that
> it's pretty unlikely to be practical for high-rate networking.
>
I understand what you guys are proposing, and I look forward to seeing your implementation and
your performance numbers.  In particular, it would be very interesting to see what kind of CPU overhead you'll get.  With regard to networking, I agree with the VMware guys: it is not practical to use traps & emulation to achieve high-rate network throughput.  For example, with our accel drivers on certain network benchmarks we can drive the network at almost wire speed from an HVM domain while consuming very few CPU cycles.

Cheers,

-Alex V.


* Re: VT is comically slow
  2006-07-06 19:16 VT is comically slow alex
  2006-07-06 20:59 ` Anthony Liguori
@ 2006-07-10  7:51 ` Rami Rosen
  1 sibling, 0 replies; 6+ messages in thread
From: Rami Rosen @ 2006-07-10  7:51 UTC
  To: alex@vasilevsky.name; +Cc: xen-devel

Hello,

>This code is modeled on Xen backend/frontend architecture concepts and will be
>GPLed.

A little question for clarification, if I may:
These accelerated drivers, as I understand it, run in Dom0
(as domU runs unmodified OS kernels).
I assume that you are talking about generic drivers (a driver for
IDE, a driver for the network, etc.) which will work in
conjunction with the real drivers; am I right?
Or are these hardware-specific drivers (one driver
for the e1000 NIC, one for the tg3 NIC, one for Realtek NICs, etc.)?

Regards,
Rami Rosen



end of thread, other threads:[~2006-07-10  7:51 UTC | newest]

Thread overview: 6+ messages
2006-07-06 19:16 VT is comically slow alex
2006-07-06 20:59 ` Anthony Liguori
2006-07-07  1:49   ` Rik van Riel
2006-07-10  7:51 ` Rami Rosen
  -- strict thread matches above, loose matches on Subject: below --
2006-07-07  1:43 alex
2006-07-07  2:01 ` Andrew Warfield
2006-07-07  2:35 alex
