[Qemu-devel] Question about qemu firmware configuration (fw

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
@ 2010-07-17  9:50 Richard W.M. Jones
  2010-07-17  9:53 ` Richard W.M. Jones
                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-17  9:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, Gleb Natapov

I'm trying to speed up the process of loading kernel and initrd.

I found that the main loop which loads these into qemu memory does it
via executing in the guest:

  rep insb (%dx),%es:(%edi)

In other words, reading it byte-at-a-time from an emulated IO port.
This is very slow[1] when your initrd is > 100MB like mine is.
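
As a rough illustration of the access pattern described above, here is a
toy model in Python (not qemu code). The port numbers are the ones
mentioned in this message; the class name, the key value and the
"return 0 past the end" behaviour are assumptions made only for the
sketch. It shows that a byte-wide read loop costs one emulated data-port
access per byte of payload.

```python
# Toy model of a selector/data port pair, loosely following the fw_cfg
# description in this message. Everything except the two port numbers is
# hypothetical and exists only for illustration.
SELECTOR_PORT = 0x510
DATA_PORT = 0x511
EXAMPLE_KEY = 0x1234  # hypothetical key, not a real fw_cfg selector value


class ToyFwCfg:
    def __init__(self, blobs):
        self.blobs = blobs      # key -> bytes
        self.key = None
        self.offset = 0
        self.data_reads = 0     # number of emulated data-port accesses

    def outw(self, port, value):
        """Select a blob; reading restarts at offset 0."""
        assert port == SELECTOR_PORT
        self.key = value
        self.offset = 0

    def inb(self, port):
        """Return the next byte of the selected blob (0 past the end)."""
        assert port == DATA_PORT
        self.data_reads += 1
        blob = self.blobs.get(self.key, b"")
        byte = blob[self.offset] if self.offset < len(blob) else 0
        self.offset += 1
        return byte


def read_blob(dev, key, size):
    """Roughly what 'rep insb' amounts to: one data-port read per byte."""
    dev.outw(SELECTOR_PORT, key)
    return bytes(dev.inb(DATA_PORT) for _ in range(size))


payload = bytes(range(256)) * 4
dev = ToyFwCfg({EXAMPLE_KEY: payload})
copy = read_blob(dev, EXAMPLE_KEY, len(payload))
print(len(copy), dev.data_reads)
```

In this model the number of data-port accesses always equals the payload
size, which is the cost that grows with a large initrd.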

Questions:

Is fw_cfg a purely qemu concept?  Does this BIOS firmware port
0x510-0x511 exist in real hardware?

I understand from the git logs that fw_cfg was added because the old
way was to load kernel & initrd into RAM directly, but this didn't
work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
Could we change to loading these directly into RAM, and instead
provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
quite prepared to do the work, just wondering if there's something
else I'm not getting about this.

Rich.

[1] Several seconds of wallclock time, and according to gprof, the
function 'fw_cfg_io_readb' accounts for > 50% of the time taken in
qemu between qemu starting and us entering the Linux kernel.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-17  9:50 [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device Richard W.M. Jones
@ 2010-07-17  9:53 ` Richard W.M. Jones
  2010-07-18 17:26   ` Alexander Graf
  2010-07-19  6:14 ` [Qemu-devel] " Gleb Natapov
  2010-07-20 22:22 ` [Qemu-devel] " Blue Swirl
  2 siblings, 1 reply; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-17  9:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, Gleb Natapov

On Sat, Jul 17, 2010 at 10:50:59AM +0100, Richard W.M. Jones wrote:
> I understand from the git logs that fw_cfg was added because the old
> way was to load kernel & initrd into RAM directly, but this didn't
> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
> Could we change to loading these directly into RAM, and instead
> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
> quite prepared to do the work, just wondering if there's something
> else I'm not getting about this.

Or thinking around the subject:

Change fw_cfg so that you send a command + a physical address, and
fw_cfg memcpy's the kernel / initrd / etc to that physical address.
Then linuxboot.bin doesn't have to do the manual copying.
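
A minimal sketch of that idea, assuming guest RAM can be modelled as a
flat byte array: a single request names a blob and a physical address,
and the device side performs the copy in one step. The names
ToyGuestRam and fw_cfg_copy are hypothetical and are not part of qemu.

```python
# Toy model of the "command + physical address" proposal. Guest RAM is a
# bytearray and the copy is a single slice assignment (the memcpy).
class ToyGuestRam:
    def __init__(self, size):
        self.mem = bytearray(size)


def fw_cfg_copy(ram, blobs, key, phys_addr):
    """Copy blobs[key] into guest RAM at phys_addr; return bytes copied."""
    blob = blobs[key]
    end = phys_addr + len(blob)
    if phys_addr < 0 or end > len(ram.mem):
        raise ValueError("destination outside guest RAM")
    ram.mem[phys_addr:end] = blob   # the memcpy
    return len(blob)


ram = ToyGuestRam(64 * 1024)
blobs = {"initrd": b"\x01\x02\x03\x04" * 1024}
copied = fw_cfg_copy(ram, blobs, "initrd", 0x1000)
print(copied)
```

The guest issues one request instead of one port read per byte; the
bounds check stands in for whatever validation a real device would need.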

Or just change linuxboot.bin so it does 32 bit inl instructions, which
might at least be a bit faster ...
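
For a sense of scale, a short calculation of how many data-port reads a
transfer needs at each access width. The 100 MiB figure is an assumption
standing in for the "> 100MB" initrd mentioned earlier, and how many of
these reads become separate exits depends on how the hypervisor handles
string I/O.

```python
def port_reads(size_bytes, width_bytes):
    """Number of data-port reads needed to fetch size_bytes."""
    return -(-size_bytes // width_bytes)  # ceiling division


INITRD_SIZE = 100 * 1024 * 1024  # assumed 100 MiB initrd

byte_reads = port_reads(INITRD_SIZE, 1)   # insb-style, 1 byte per read
long_reads = port_reads(INITRD_SIZE, 4)   # inl/insl-style, 4 bytes per read
print(byte_reads, long_reads)
```

Going from 8-bit to 32-bit accesses cuts the number of reads by a factor
of four, which is an upper bound on the gain rather than a prediction.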

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-17  9:53 ` Richard W.M. Jones
@ 2010-07-18 17:26   ` Alexander Graf
  2010-07-18 20:09     ` Richard W.M. Jones
  2010-07-19  6:12     ` Gleb Natapov
  0 siblings, 2 replies; 56+ messages in thread
From: Alexander Graf @ 2010-07-18 17:26 UTC (permalink / raw)
  To: Richard W.M.Jones; +Cc: qemu-devel, Gleb Natapov


On 17.07.2010, at 11:53, Richard W.M. Jones wrote:

> On Sat, Jul 17, 2010 at 10:50:59AM +0100, Richard W.M. Jones wrote:
>> I understand from the git logs that fw_cfg was added because the old
>> way was to load kernel & initrd into RAM directly, but this didn't
>> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
>> Could we change to loading these directly into RAM, and instead
>> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
>> quite prepared to do the work, just wondering if there's something
>> else I'm not getting about this.
> 
> Or thinking around the subject:
> 
> Change fw_cfg so that you send a command + a physical address, and
> fw_cfg memcpy's the kernel / initrd / etc to that physical address.
> Then linuxboot.bin doesn't have to do the manual copying.
> 
> Or just change linuxboot.bin so it does 32 bit inl instructions, which
> might at least be a bit faster ...

I don't see why it would be slow. ins should be emulated using coalesced mmio, no?

Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-18 17:26   ` Alexander Graf
@ 2010-07-18 20:09     ` Richard W.M. Jones
  2010-07-18 20:32       ` Alexander Graf
  2010-07-19  6:12     ` Gleb Natapov
  1 sibling, 1 reply; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-18 20:09 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel, Gleb Natapov

On Sun, Jul 18, 2010 at 07:26:57PM +0200, Alexander Graf wrote:
> On 17.07.2010, at 11:53, Richard W.M. Jones wrote:
> > On Sat, Jul 17, 2010 at 10:50:59AM +0100, Richard W.M. Jones wrote:
> >> I understand from the git logs that fw_cfg was added because the old
> >> way was to load kernel & initrd into RAM directly, but this didn't
> >> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
> >> Could we change to loading these directly into RAM, and instead
> >> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
> >> quite prepared to do the work, just wondering if there's something
> >> else I'm not getting about this.
> > 
> > Or thinking around the subject:
> > 
> > Change fw_cfg so that you send a command + a physical address, and
> > fw_cfg memcpy's the kernel / initrd / etc to that physical address.
> > Then linuxboot.bin doesn't have to do the manual copying.
> > 
> > Or just change linuxboot.bin so it does 32 bit inl instructions, which
> > might at least be a bit faster ...
> 
> I don't see why it would be slow. ins should be emulated using coalesced mmio, no?

It knocks 1 second off libguestfs boot times on my faster 64 bit
desktop machine, and 2 seconds off with my old 32 bit laptop.
(Roughly 15% faster in both cases)

The 64 bit machine times are:

Without my patch:

real	0m7.581s
user	0m4.730s
sys	0m2.124s

With my patch:

real	0m6.579s
user	0m3.614s
sys	0m1.941s
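
The relative saving can be worked out directly from the figures above.
This is only arithmetic on the two quoted runs, not a new measurement:

```python
def percent_saved(before_s, after_s):
    """Percentage of time saved going from before_s to after_s."""
    return 100.0 * (before_s - after_s) / before_s


real = percent_saved(7.581, 6.579)
user = percent_saved(4.730, 3.614)
sys_ = percent_saved(2.124, 1.941)
print(f"real {real:.1f}%  user {user:.1f}%  sys {sys_:.1f}%")
```

On these numbers the wall-clock saving comes to about 13%, a little
under the rough 15% figure given above.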

If you want to reproduce this (you'll need a recent Fedora machine)
you can do:

  $ cat qemu-wrapper 
  #!/bin/sh -
  qemudir=/home/rjones/d/qemu
  exec $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios "$@"

  $ export LIBGUESTFS_QEMU=/home/rjones/d/qemu/qemu-wrapper
  $ time guestfish --ro -a /dev/null run

Obviously I'm running that several times over, discarding the first
few runs, because I'm only interested in the "hot cache" case.
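
One way to automate that procedure is sketched below, assuming Python is
available on the test machine. The helper name time_command and the
default run counts are arbitrary choices, and a trivial stand-in command
is used so that the sketch runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5, discard=2):
    """Run argv `runs` times, drop the first `discard` timings, return the rest."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings[discard:]


# Stand-in command; substitute the guestfish invocation shown above to
# reproduce the real measurement.
hot = time_command([sys.executable, "-c", "pass"])
print(f"median of {len(hot)} hot runs: {statistics.median(hot):.3f}s")
```

Reporting the median of the remaining runs keeps a single slow outlier
from dominating the result.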

By the way, even if you reject the patch as a whole, part 1/2 of the
patch is just an obvious bug fix, and I think should be applied
anyway.

http://lists.gnu.org/archive/html/qemu-devel/2010-07/threads.html#00967

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-18 20:09     ` Richard W.M. Jones
@ 2010-07-18 20:32       ` Alexander Graf
  2010-07-19  6:23         ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-18 20:32 UTC (permalink / raw)
  To: Richard W.M.Jones; +Cc: qemu-devel, Gleb Natapov


On 18.07.2010, at 22:09, Richard W.M. Jones wrote:

> On Sun, Jul 18, 2010 at 07:26:57PM +0200, Alexander Graf wrote:
>> On 17.07.2010, at 11:53, Richard W.M. Jones wrote:
>>> On Sat, Jul 17, 2010 at 10:50:59AM +0100, Richard W.M. Jones wrote:
>>>> I understand from the git logs that fw_cfg was added because the old
>>>> way was to load kernel & initrd into RAM directly, but this didn't
>>>> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
>>>> Could we change to loading these directly into RAM, and instead
>>>> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
>>>> quite prepared to do the work, just wondering if there's something
>>>> else I'm not getting about this.
>>> 
>>> Or thinking around the subject:
>>> 
>>> Change fw_cfg so that you send a command + a physical address, and
>>> fw_cfg memcpy's the kernel / initrd / etc to that physical address.
>>> Then linuxboot.bin doesn't have to do the manual copying.
>>> 
>>> Or just change linuxboot.bin so it does 32 bit inl instructions, which
>>> might at least be a bit faster ...
>> 
>> I don't see why it would be slow. ins should be emulated using coalesced mmio, no?
> 
> It knocks 1 second off libguestfs boot times on my faster 64 bit
> desktop machine, and 2 seconds off with my old 32 bit laptop.
> (Roughly 15% faster in both cases)
> 
> The 64 bit machine times are:
> 
> Without my patch:
> 
> real	0m7.581s
> user	0m4.730s
> sys	0m2.124s
> 
> With my patch:
> 
> real	0m6.579s
> user	0m3.614s
> sys	0m1.941s
> 
> If you want to reproduce this (you'll need a recent Fedora machine)
> you can do:
> 
>  $ cat qemu-wrapper 
>  #!/bin/sh -
>  qemudir=/home/rjones/d/qemu
>  exec $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios "$@"
> 
>  $ export LIBGUESTFS_QEMU=/home/rjones/d/qemu/qemu-wrapper
>  $ time guestfish --ro -a /dev/null run
> 
> Obviously I'm running that several times over, discarding the first
> few runs, because I'm only interested in the "hot cache" case.
> 
> By the way, even if you reject the patch as a whole, part 1/2 of the
> patch is just an obvious bug fix, and I think should be applied
> anyway.
> 
> http://lists.gnu.org/archive/html/qemu-devel/2010-07/threads.html#00967

I haven't rejected anything yet - in general I like the idea of DMA'ing fw_cfg variables. I guess since it's basically an ISA PV device, we also don't need to care about bus mastering or anything, right? Or do we? Hrm.


Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-18 17:26   ` Alexander Graf
  2010-07-18 20:09     ` Richard W.M. Jones
@ 2010-07-19  6:12     ` Gleb Natapov
  1 sibling, 0 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  6:12 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M.Jones, qemu-devel

On Sun, Jul 18, 2010 at 07:26:57PM +0200, Alexander Graf wrote:
> 
> On 17.07.2010, at 11:53, Richard W.M. Jones wrote:
> 
> > On Sat, Jul 17, 2010 at 10:50:59AM +0100, Richard W.M. Jones wrote:
> >> I understand from the git logs that fw_cfg was added because the old
> >> way was to load kernel & initrd into RAM directly, but this didn't
> >> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
> >> Could we change to loading these directly into RAM, and instead
> >> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
> >> quite prepared to do the work, just wondering if there's something
> >> else I'm not getting about this.
> > 
> > Or thinking around the subject:
> > 
> > Change fw_cfg so that you send a command + a physical address, and
> > fw_cfg memcpy's the kernel / initrd / etc to that physical address.
> > Then linuxboot.bin doesn't have to do the manual copying.
> > 
> > Or just change linuxboot.bin so it does 32 bit inl instructions, which
> > might at least be a bit faster ...
> 
> I don't see why it would be slow. ins should be emulated using coalesced mmio, no?
> 
Coalesced mmio is for mmio, not pio. But we do try to optimize pio
string instructions; otherwise loading 100M would take hours.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] Re: Question about qemu firmware configuration (fw_cfg) device
  2010-07-17  9:50 [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device Richard W.M. Jones
  2010-07-17  9:53 ` Richard W.M. Jones
@ 2010-07-19  6:14 ` Gleb Natapov
  2010-07-20 22:22 ` [Qemu-devel] " Blue Swirl
  2 siblings, 0 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  6:14 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: qemu-devel, agraf

On Sat, Jul 17, 2010 at 10:50:59AM +0100, Richard W.M. Jones wrote:
> I'm trying to speed up the process of loading kernel and initrd.
> 
> I found that the main loop which loads these into qemu memory does it
> via executing in the guest:
> 
>   rep insb (%dx),%es:(%edi)
> 
> In other words, reading it byte-at-a-time from an emulated IO port.
> This is very slow[1] when your initrd is > 100MB like mine is.
> 
> Questions:
> 
> Is fw_cfg a purely qemu concept?  Does this BIOS firmware port
> 0x510-0x511 exist in real hardware?
> 
It is a purely qemu concept.

> I understand from the git logs that fw_cfg was added because the old
> way was to load kernel & initrd into RAM directly, but this didn't
> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
> Could we change to loading these directly into RAM, and instead
> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
> quite prepared to do the work, just wondering if there's something
> else I'm not getting about this.
> 
> Rich.
> 
> [1] Several seconds of wallclock time, and according to gprof, the
> function 'fw_cfg_io_readb' accounts for > 50% of the time taken in
> qemu between qemu starting and us entering the Linux kernel.
> 
Several seconds is not too bad for 100M. Have you tested how much insl
improves this?

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-18 20:32       ` Alexander Graf
@ 2010-07-19  6:23         ` Gleb Natapov
  2010-07-19  7:28           ` Richard W.M. Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  6:23 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M.Jones, qemu-devel

On Sun, Jul 18, 2010 at 10:32:53PM +0200, Alexander Graf wrote:
> 
> On 18.07.2010, at 22:09, Richard W.M. Jones wrote:
> 
> > On Sun, Jul 18, 2010 at 07:26:57PM +0200, Alexander Graf wrote:
> >> On 17.07.2010, at 11:53, Richard W.M. Jones wrote:
> >>> On Sat, Jul 17, 2010 at 10:50:59AM +0100, Richard W.M. Jones wrote:
> >>>> I understand from the git logs that fw_cfg was added because the old
> >>>> way was to load kernel & initrd into RAM directly, but this didn't
> >>>> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
> >>>> Could we change to loading these directly into RAM, and instead
> >>>> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
> >>>> quite prepared to do the work, just wondering if there's something
> >>>> else I'm not getting about this.
> >>> 
> >>> Or thinking around the subject:
> >>> 
> >>> Change fw_cfg so that you send a command + a physical address, and
> >>> fw_cfg memcpy's the kernel / initrd / etc to that physical address.
> >>> Then linuxboot.bin doesn't have to do the manual copying.
> >>> 
> >>> Or just change linuxboot.bin so it does 32 bit inl instructions, which
> >>> might at least be a bit faster ...
> >> 
> >> I don't see why it would be slow. ins should be emulated using coalesced mmio, no?
> > 
> > It knocks 1 second off libguestfs boot times on my faster 64 bit
> > desktop machine, and 2 seconds off with my old 32 bit laptop.
> > (Roughly 15% faster in both cases)
> > 
> > The 64 bit machine times are:
> > 
> > Without my patch:
> > 
> > real	0m7.581s
> > user	0m4.730s
> > sys	0m2.124s
> > 
> > With my patch:
> > 
> > real	0m6.579s
> > user	0m3.614s
> > sys	0m1.941s
> > 
> > If you want to reproduce this (you'll need a recent Fedora machine)
> > you can do:
> > 
> >  $ cat qemu-wrapper 
> >  #!/bin/sh -
> >  qemudir=/home/rjones/d/qemu
> >  exec $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios "$@"
> > 
> >  $ export LIBGUESTFS_QEMU=/home/rjones/d/qemu/qemu-wrapper
> >  $ time guestfish --ro -a /dev/null run
> > 
> > Obviously I'm running that several times over, discarding the first
> > few runs, because I'm only interested in the "hot cache" case.
> > 
> > By the way, even if you reject the patch as a whole, part 1/2 of the
> > patch is just an obvious bug fix, and I think should be applied
> > anyway.
> > 
> > http://lists.gnu.org/archive/html/qemu-devel/2010-07/threads.html#00967
> 
> I haven't rejected anything yet - in general I like the idea of DMA'ing fw_cfg variables. I guess since it's basically an ISA PV device, we also don't need to care about bus mastering or anything, right? Or do we? Hrm.
> 
That is what I am worrying about too. If we are adding a device we have to be
sure such a device can actually exist on real hw too, otherwise we may have
problems later. Also 1 second on a 100M file does not look like a huge gain
to me.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  6:23         ` Gleb Natapov
@ 2010-07-19  7:28           ` Richard W.M. Jones
  2010-07-19  7:33             ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19  7:28 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> That is what I am worrying about too. If we are adding a device we have to be
> sure such a device can actually exist on real hw too, otherwise we may have
> problems later.

I don't understand why the constraints of real h/w have anything to do
with this.  Can you explain?

> Also 1 second on a 100M file does not look like a huge gain to me.

Every second counts.  We're trying to get libguestfs boot times down
from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
program.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:28           ` Richard W.M. Jones
@ 2010-07-19  7:33             ` Gleb Natapov
  2010-07-19  7:40               ` Alexander Graf
                                 ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  7:33 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > That is what I am worrying about too. If we are adding a device we have to be
> > sure such a device can actually exist on real hw too, otherwise we may have
> > problems later.
> 
> I don't understand why the constraints of real h/w have anything to do
> with this.  Can you explain?
> 
Each time we do something not architectural it causes us trouble later.
So the constraints of real h/w are our constraints too.

> > Also 1 second on a 100M file does not look like a huge gain to me.
> 
> Every second counts.  We're trying to get libguestfs boot times down
> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> program.
> 
So what about making the initrd smaller? I remember managing two
distributions in 64M of flash in an embedded project.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:33             ` Gleb Natapov
@ 2010-07-19  7:40               ` Alexander Graf
  2010-07-19  7:51                 ` Gleb Natapov
  2010-07-19  9:19                 ` Richard W.M. Jones
  2010-07-19  7:44               ` Richard W.M. Jones
  2010-07-19 14:45               ` Anthony Liguori
  2 siblings, 2 replies; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  7:40 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 09:33, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>> That is what I am worrying about too. If we are adding a device we have to be
>>> sure such a device can actually exist on real hw too, otherwise we may have
>>> problems later.
>> 
>> I don't understand why the constraints of real h/w have anything to do
>> with this.  Can you explain?
>> 
> Each time we do something not architectural it causes us trouble later.
> So the constraints of real h/w are our constraints too.
> 
>>> Also 1 second on a 100M file does not look like a huge gain to me.
>> 
>> Every second counts.  We're trying to get libguestfs boot times down
>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>> program.
>> 
> So what about making the initrd smaller? I remember managing two
> distributions in 64M of flash in an embedded project.

Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.

I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.

Richard, what does kvm_stat tell you while loading the initrd? Are there a lot of PIO requests or are we simply looping inside qemu code?


Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:33             ` Gleb Natapov
  2010-07-19  7:40               ` Alexander Graf
@ 2010-07-19  7:44               ` Richard W.M. Jones
  2010-07-19  7:55                 ` Gleb Natapov
  2010-07-19 14:45               ` Anthony Liguori
  2 siblings, 1 reply; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19  7:44 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 10:33:12AM +0300, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > > That is what I am worrying about too. If we are adding a device we have to be
> > > sure such a device can actually exist on real hw too, otherwise we may have
> > > problems later.
> > 
> > I don't understand why the constraints of real h/w have anything to do
> > with this.  Can you explain?
> > 
> Each time we do something not architectural it causes us trouble later.

Can you explain more or point to some examples?  I really don't
understand what these troubles could be.  But I'm prepared to be
enlightened.

> So what about making the initrd smaller? I remember managing two
> distributions in 64M of flash in an embedded project.

The distribution is the size that it is, because (a) it has to be
based on Fedora and because (b) it has to include a certain number of
programs.

The reason for (a) is so that we don't need to compile our own tools
and we can benefit from bug fixes from Fedora (and contribute bug
fixes back).  The reason for (b) is that we want to implement a rich
API[1], and having a rich API means we simply have to include many
binaries.

We're already doing a lot of minimization on the image[2], deleting
man pages, language files, etc., so the image mainly just contains
binaries and libraries and kernel modules, which we cannot get rid of
because of (b).  The original pre-minimization image is 600MB or so.

Rich.

[1] http://libguestfs.org/guestfs.3.html
[2] http://manpages.ubuntu.com/manpages/lucid/man8/febootstrap-minimize.8.html

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:40               ` Alexander Graf
@ 2010-07-19  7:51                 ` Gleb Natapov
  2010-07-19  7:57                   ` Alexander Graf
  2010-07-20 13:15                   ` Jamie Lokier
  2010-07-19  9:19                 ` Richard W.M. Jones
  1 sibling, 2 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  7:51 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>> That is what I am worrying about too. If we are adding a device we have to be
> >>> sure such a device can actually exist on real hw too, otherwise we may have
> >>> problems later.
> >> 
> >> I don't understand why the constraints of real h/w have anything to do
> >> with this.  Can you explain?
> >> 
> > Each time we do something not architectural it causes us trouble later.
> > So the constraints of real h/w are our constraints too.
> > 
> >>> Also 1 second on a 100M file does not look like a huge gain to me.
> >> 
> >> Every second counts.  We're trying to get libguestfs boot times down
> >> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> >> program.
> >> 
> > So what about making the initrd smaller? I remember managing two
> > distributions in 64M of flash in an embedded project.
> 
> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
> 
Why not provide a small disk/cdrom with all those utilities installed?

> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
> 
It is coalesced to a certain extent (reenter the guest every 1024 bytes,
read from userspace a page at a time). You need to continue injecting
interrupts into the guest during a long string operation and checking
exception conditions on page boundaries.
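
To make the two limits concrete, here is a toy sketch in Python of how a
long string read could be split so that no piece exceeds 1024 bytes or
crosses a page boundary. It is a loose reading of the description above
and not the actual KVM code; the 4096-byte page size and the function
name string_io_chunks are assumptions.

```python
PAGE_SIZE = 4096   # assumed x86 page size
MAX_CHUNK = 1024   # "every 1024 bytes" from the description above


def string_io_chunks(dest_addr, count):
    """Yield (addr, length) pieces that never exceed MAX_CHUNK or cross a page."""
    while count > 0:
        to_page_end = PAGE_SIZE - (dest_addr % PAGE_SIZE)
        length = min(count, MAX_CHUNK, to_page_end)
        yield dest_addr, length
        dest_addr += length
        count -= length


chunks = list(string_io_chunks(0x1F00, 3000))
print(chunks)

# Number of pieces for an assumed page-aligned 100 MiB transfer.
pieces_100m = sum(1 for _ in string_io_chunks(0, 100 * 1024 * 1024))
print(pieces_100m)
```

Each boundary between pieces is a point where pending interrupts could be
delivered and exception conditions checked before the copy continues.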

> Richard, what does kvm_stat tell you while loading the initrd? Are there a lot of PIO requests or are we simply looping inside qemu code?
> 
> 
> Alex

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:44               ` Richard W.M. Jones
@ 2010-07-19  7:55                 ` Gleb Natapov
  2010-07-19  8:34                   ` Richard W.M. Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  7:55 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 08:44:16AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 10:33:12AM +0300, Gleb Natapov wrote:
> > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> > > On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > > > That is what I am worrying about too. If we are adding a device we have to be
> > > > sure such a device can actually exist on real hw too, otherwise we may have
> > > > problems later.
> > > 
> > > I don't understand why the constraints of real h/w have anything to do
> > > with this.  Can you explain?
> > > 
> > Each time we do something not architectural it causes us trouble later.
> 
> Can you explain more or point to some examples?  I really don't
> understand what these troubles could be.  But I'm prepared to be
> enlightened.
> 
There are many. Look at the vmware backdoor interface, for instance. Such a
beast can't exist on real HW, so now we have to have hacks in the emulator
since an io operation can change cpu registers. And I am not saying that
what you are proposing can't exist on real HW. If such a device can exist
we can do it that way too. The gain is too small though.

> > So what about making the initrd smaller? I remember managing two
> > distributions in 64M of flash in an embedded project.
> 
> The distribution is the size that it is, because (a) it has to be
> based on Fedora and because (b) it has to include a certain number of
> programs.
Why not put them on a cdrom or disk?

> 
> The reason for (a) is so that we don't need to compile our own tools
> and we can benefit from bug fixes from Fedora (and contribute bug
> fixes back).  The reason for (b) is that we want to implement a rich
> API[1], and having a rich API means we simply have to include many
> binaries.
> 
> We're already doing a lot of minimization on the image[2], deleting
> man pages, language files, etc., so the image mainly just contains
> binaries and libraries and kernel modules, which we cannot get rid of
> because of (b).  The original pre-minimization image is 600MB or so.
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:51                 ` Gleb Natapov
@ 2010-07-19  7:57                   ` Alexander Graf
  2010-07-19  8:01                     ` Gleb Natapov
  2010-07-20 13:15                   ` Jamie Lokier
  1 sibling, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  7:57 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 09:51, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>>>> That is what I am worrying about too. If we are adding a device we have to be
>>>>> sure such a device can actually exist on real hw too, otherwise we may have
>>>>> problems later.
>>>> 
>>>> I don't understand why the constraints of real h/w have anything to do
>>>> with this.  Can you explain?
>>>> 
>>> Each time we do something not architectural it causes us trouble later.
>>> So the constraints of real h/w are our constraints too.
>>> 
>>>>> Also 1 second on a 100M file does not look like a huge gain to me.
>>>> 
>>>> Every second counts.  We're trying to get libguestfs boot times down
>>>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>>>> program.
>>>> 
>>> So what about making the initrd smaller? I remember managing two
>>> distributions in 64M of flash in an embedded project.
>> 
>> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
>> 
> Why not provide a small disk/cdrom with all those utilities installed?

Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is inefficient.

> 
>> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
>> 
> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> read from userspace page at a time). You need to continue injecting
> interrupt into a guest during long string operation and checking
> exception condition on a page boundaries.

That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh.
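
For scale, here is a rough sketch of what those figures imply for a 100MB initrd. It assumes the 1024-byte re-entry and page-at-a-time numbers quoted above plus a 4096-byte page; the page size and the function name are my assumptions, not anything taken from the kvm code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunk-sized round trips needed to move `bytes` bytes. */
static uint64_t round_trips(uint64_t bytes, uint64_t chunk)
{
    assert(chunk > 0);
    return (bytes + chunk - 1) / chunk;   /* ceiling division */
}
```

With bytes = 100 * 1024 * 1024, round_trips(bytes, 1024) gives 102400 guest re-entries and round_trips(bytes, 4096) gives 25600 userspace reads, compared with a single round trip for a one-shot copy.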

Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration.
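
To make the idea concrete, a minimal sketch of such a one-shot copy follows. Everything in it is hypothetical: the struct, the field names and the idea that the guest latches a destination address beforehand are illustrative assumptions, not existing qemu interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the device state; not qemu's real fw_cfg layout. */
struct fwcfg_model {
    const uint8_t *blob;     /* host-side copy of the selected item */
    size_t blob_len;
    uint8_t *guest_ram;      /* stand-in for guest physical memory */
    size_t guest_ram_len;
    uint32_t dest;           /* guest address latched earlier by the guest */
};

/*
 * Handler for the single "go" port write: the whole blob is copied
 * before the write returns, so there is no interrupt and no
 * intermediate state to migrate.
 * Returns 0 on success, -1 if the copy would run past guest RAM.
 */
static int fwcfg_model_copy(struct fwcfg_model *s)
{
    assert(s && s->blob && s->guest_ram);
    if (s->dest > s->guest_ram_len ||
        s->blob_len > s->guest_ram_len - s->dest)
        return -1;
    memcpy(s->guest_ram + s->dest, s->blob, s->blob_len);
    return 0;
}
```

The bounds check is the only part a real implementation could not skip: the destination comes from the guest and must not be trusted.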

Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:57                   ` Alexander Graf
@ 2010-07-19  8:01                     ` Gleb Natapov
  2010-07-19  8:08                       ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  8:01 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 09:51, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>>>> That what I am warring about too. If we are adding device we have to be
> >>>>> sure such device can actually exist on real hw too otherwise we may have
> >>>>> problems later.
> >>>> 
> >>>> I don't understand why the constraints of real h/w have anything to do
> >>>> with this.  Can you explain?
> >>>> 
> >>> Each time we do something not architectural it cause us troubles later.
> >>> So constraints of real h/w is our constrains to.
> >>> 
> >>>>> Also 1 second on 100M file does not look like huge gain to me.
> >>>> 
> >>>> Every second counts.  We're trying to get libguestfs boot times down
> >>>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> >>>> program.
> >>>> 
> >>> So what about making initrd smaller? I remember managing two
> >>> distribution in 64M flash in embedded project.
> >> 
> >> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
> >> 
> > Why not provide small disk/cdrom with all those utilities installed?
> 
> Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective.
> 
Loading a huge initrd on real HW takes noticeably longer than loading a small
one, so I would say that it is your design that is to blame here, not
KVM.

> > 
> >> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
> >> 
> > It is coalesced to a certain extent (reenter guest every 1024 bytes,
> > read from userspace page at a time). You need to continue injecting
> > interrupt into a guest during long string operation and checking
> > exception condition on a page boundaries.
> 
> That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh.
> 
> Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration.
> 
We shouldn't add devices that do not work like real HW just to speed up some
pathological cases (which are slow on real HW too).

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:01                     ` Gleb Natapov
@ 2010-07-19  8:08                       ` Alexander Graf
  2010-07-19  8:19                         ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  8:08 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 10:01, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 09:51, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
>>>> 
>>>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>>>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>>>>>> That what I am warring about too. If we are adding device we have to be
>>>>>>> sure such device can actually exist on real hw too otherwise we may have
>>>>>>> problems later.
>>>>>> 
>>>>>> I don't understand why the constraints of real h/w have anything to do
>>>>>> with this.  Can you explain?
>>>>>> 
>>>>> Each time we do something not architectural it cause us troubles later.
>>>>> So constraints of real h/w is our constrains to.
>>>>> 
>>>>>>> Also 1 second on 100M file does not look like huge gain to me.
>>>>>> 
>>>>>> Every second counts.  We're trying to get libguestfs boot times down
>>>>>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>>>>>> program.
>>>>>> 
>>>>> So what about making initrd smaller? I remember managing two
>>>>> distribution in 64M flash in embedded project.
>>>> 
>>>> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
>>>> 
>>> Why not provide small disk/cdrom with all those utilities installed?
>> 
>> Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective.
>> 
> Loading huge initrd on real HW takes noticeably longer time that small
> one, so I would say that it is your design that is to blame here, not
> KVM.

I disagree. Virtualization enables new use cases. The -initrd parameter is a very good example of that. It's something that you simply couldn't do on real hw.

> 
>>> 
>>>> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
>>>> 
>>> It is coalesced to a certain extent (reenter guest every 1024 bytes,
>>> read from userspace page at a time). You need to continue injecting
>>> interrupt into a guest during long string operation and checking
>>> exception condition on a page boundaries.
>> 
>> That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh.
>> 
>> Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration.
>> 
> We shouldn't add devices that work not like real HW to speed up some
> pathological cases (and are slow on real HW too).

Just because you don't use them doesn't mean they're pathological, really. We simply chose a bad interface for transferring reasonably big chunks of data and we need to fix that. If you want to look at it from a different perspective, it's a regression. Older qemu versions did map the kernel and initrd directly into guest ram, so now we're slower than back then.


Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:08                       ` Alexander Graf
@ 2010-07-19  8:19                         ` Gleb Natapov
  2010-07-19  8:24                           ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  8:19 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:01, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 09:51, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> >>>> 
> >>>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >>>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>>>>>> That what I am warring about too. If we are adding device we have to be
> >>>>>>> sure such device can actually exist on real hw too otherwise we may have
> >>>>>>> problems later.
> >>>>>> 
> >>>>>> I don't understand why the constraints of real h/w have anything to do
> >>>>>> with this.  Can you explain?
> >>>>>> 
> >>>>> Each time we do something not architectural it cause us troubles later.
> >>>>> So constraints of real h/w is our constrains to.
> >>>>> 
> >>>>>>> Also 1 second on 100M file does not look like huge gain to me.
> >>>>>> 
> >>>>>> Every second counts.  We're trying to get libguestfs boot times down
> >>>>>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> >>>>>> program.
> >>>>>> 
> >>>>> So what about making initrd smaller? I remember managing two
> >>>>> distribution in 64M flash in embedded project.
> >>>> 
> >>>> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
> >>>> 
> >>> Why not provide small disk/cdrom with all those utilities installed?
> >> 
> >> Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective.
> >> 
> > Loading huge initrd on real HW takes noticeably longer time that small
> > one, so I would say that it is your design that is to blame here, not
> > KVM.
> 
> I disagree. Virtualization enables new use cases. The -initrd parameter is a very good example for that. It's something that you simply couldn't do on real hw.
> 
How is it different from booting a kernel/initrd from a USB flash drive?

> > 
> >>> 
> >>>> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
> >>>> 
> >>> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> >>> read from userspace page at a time). You need to continue injecting
> >>> interrupt into a guest during long string operation and checking
> >>> exception condition on a page boundaries.
> >> 
> >> That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh.
> >> 
> >> Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration.
> >> 
> > We shouldn't add devices that work not like real HW to speed up some
> > pathological cases (and are slow on real HW too).
> 
> Just because you don't use them doesn't mean they're pathological, really. We simply chose a bad interface for transferring reasonable big chunks of data and we need to fix that. If you want to look at it from a different perspective, it's a regression. Older qemu versions did map the kernel and initrd directly into guest ram, so now we're slower than back then.
> 
I use them a hundred times each day (at least the -kernel part). If the
interface is slow for your use case I have no problem with introducing
a new one, but it should be one that makes sense in the x86 architecture.
I do not agree that this is a regression, BTW. You can't compare a buggy
way of doing things with a non-buggy way and say that the bug fix is a
regression.

What about adding a new PCI card that holds the kernel and initrd in a ROM BAR?

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:19                         ` Gleb Natapov
@ 2010-07-19  8:24                           ` Alexander Graf
  2010-07-19  8:30                             ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  8:24 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 10:19, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:01, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 19.07.2010, at 09:51, Gleb Natapov wrote:
>>>> 
>>>>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
>>>>>> 
>>>>>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
>>>>>> 
>>>>>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>>>>>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>>>>>>>> That what I am warring about too. If we are adding device we have to be
>>>>>>>>> sure such device can actually exist on real hw too otherwise we may have
>>>>>>>>> problems later.
>>>>>>>> 
>>>>>>>> I don't understand why the constraints of real h/w have anything to do
>>>>>>>> with this.  Can you explain?
>>>>>>>> 
>>>>>>> Each time we do something not architectural it cause us troubles later.
>>>>>>> So constraints of real h/w is our constrains to.
>>>>>>> 
>>>>>>>>> Also 1 second on 100M file does not look like huge gain to me.
>>>>>>>> 
>>>>>>>> Every second counts.  We're trying to get libguestfs boot times down
>>>>>>>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
>>>>>>>> program.
>>>>>>>> 
>>>>>>> So what about making initrd smaller? I remember managing two
>>>>>>> distribution in 64M flash in embedded project.
>>>>>> 
>>>>>> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
>>>>>> 
>>>>> Why not provide small disk/cdrom with all those utilities installed?
>>>> 
>>>> Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective.
>>>> 
>>> Loading huge initrd on real HW takes noticeably longer time that small
>>> one, so I would say that it is your design that is to blame here, not
>>> KVM.
>> 
>> I disagree. Virtualization enables new use cases. The -initrd parameter is a very good example for that. It's something that you simply couldn't do on real hw.
>> 
> How is it different from starting kernel/initrd from usb flash drive?

The kernel and initrd are read directly from the host fs. It's more like a 9p grub boot.

> 
>>> 
>>>>> 
>>>>>> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
>>>>>> 
>>>>> It is coalesced to a certain extent (reenter guest every 1024 bytes,
>>>>> read from userspace page at a time). You need to continue injecting
>>>>> interrupt into a guest during long string operation and checking
>>>>> exception condition on a page boundaries.
>>>> 
>>>> That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh.
>>>> 
>>>> Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration.
>>>> 
>>> We shouldn't add devices that work not like real HW to speed up some
>>> pathological cases (and are slow on real HW too).
>> 
>> Just because you don't use them doesn't mean they're pathological, really. We simply chose a bad interface for transferring reasonable big chunks of data and we need to fix that. If you want to look at it from a different perspective, it's a regression. Older qemu versions did map the kernel and initrd directly into guest ram, so now we're slower than back then.
>> 
> I use them hundred time each day (at least -kernel part). If the
> interface is slow for your use case I have no problem with introducing
> new one, but the one that make sense in x86 architecture. I do not agree
> this is regression BTW. You can't compare buggy way of doing things and
> non-buggy way and say that bug fixing is a regression.
> 
> What about adding new PCI card that holds kernel initrd in ROM bar?

Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:

1) We need a new PCI ID
2) There can be a lot of initrd binaries with multiboot. We only have a limited number of BARs


Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:24                           ` Alexander Graf
@ 2010-07-19  8:30                             ` Gleb Natapov
  2010-07-19  8:41                               ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  8:30 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:19, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:01, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 19.07.2010, at 09:51, Gleb Natapov wrote:
> >>>> 
> >>>>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> >>>>>> 
> >>>>>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> >>>>>> 
> >>>>>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >>>>>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>>>>>>>> That what I am warring about too. If we are adding device we have to be
> >>>>>>>>> sure such device can actually exist on real hw too otherwise we may have
> >>>>>>>>> problems later.
> >>>>>>>> 
> >>>>>>>> I don't understand why the constraints of real h/w have anything to do
> >>>>>>>> with this.  Can you explain?
> >>>>>>>> 
> >>>>>>> Each time we do something not architectural it cause us troubles later.
> >>>>>>> So constraints of real h/w is our constrains to.
> >>>>>>> 
> >>>>>>>>> Also 1 second on 100M file does not look like huge gain to me.
> >>>>>>>> 
> >>>>>>>> Every second counts.  We're trying to get libguestfs boot times down
> >>>>>>>> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> >>>>>>>> program.
> >>>>>>>> 
> >>>>>>> So what about making initrd smaller? I remember managing two
> >>>>>>> distribution in 64M flash in embedded project.
> >>>>>> 
> >>>>>> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
> >>>>>> 
> >>>>> Why not provide small disk/cdrom with all those utilities installed?
> >>>> 
> >>>> Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective.
> >>>> 
> >>> Loading huge initrd on real HW takes noticeably longer time that small
> >>> one, so I would say that it is your design that is to blame here, not
> >>> KVM.
> >> 
> >> I disagree. Virtualization enables new use cases. The -initrd parameter is a very good example for that. It's something that you simply couldn't do on real hw.
> >> 
> > How is it different from starting kernel/initrd from usb flash drive?
> 
> The kernel and initrd are read directly from the host fs. It's more like a 9p grub boot.
> 
There is no "host" on real HW :) But conceptually it's almost the same.
A 9p grub boot would also be nice. Hmm, I think PXE is the closest thing
to the -kernel/-initrd options on real HW.

> > 
> >>> 
> >>>>> 
> >>>>>> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
> >>>>>> 
> >>>>> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> >>>>> read from userspace page at a time). You need to continue injecting
> >>>>> interrupt into a guest during long string operation and checking
> >>>>> exception condition on a page boundaries.
> >>>> 
> >>>> That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh.
> >>>> 
> >>>> Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration.
> >>>> 
> >>> We shouldn't add devices that work not like real HW to speed up some
> >>> pathological cases (and are slow on real HW too).
> >> 
> >> Just because you don't use them doesn't mean they're pathological, really. We simply chose a bad interface for transferring reasonable big chunks of data and we need to fix that. If you want to look at it from a different perspective, it's a regression. Older qemu versions did map the kernel and initrd directly into guest ram, so now we're slower than back then.
> >> 
> > I use them hundred time each day (at least -kernel part). If the
> > interface is slow for your use case I have no problem with introducing
> > new one, but the one that make sense in x86 architecture. I do not agree
> > this is regression BTW. You can't compare buggy way of doing things and
> > non-buggy way and say that bug fixing is a regression.
> > 
> > What about adding new PCI card that holds kernel initrd in ROM bar?
> 
> Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:
> 
> 1) We need a new PCI ID
We have our range. We can allocate from there.

> 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs
> 
Is that supported now with the fw_cfg interface? My main concern with this
approach is the huge BAR size, which may take a lot of space from the PCI
MMIO range if the guest OS decides to configure it.
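
To put a rough number on the BAR size concern: PCI BARs are sized in powers of two, so the window has to be rounded up from the blob size. The sketch below shows that rounding; the function name and the 4096-byte floor are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest power-of-two window that can hold `len` bytes.
 * The 4096-byte floor is an assumption of this sketch (one page,
 * convenient if the region ends up RAM-mapped).
 */
static uint64_t bar_size_for(uint64_t len)
{
    uint64_t size = 4096;

    assert(len > 0 && len <= (UINT64_C(1) << 62));
    while (size < len)
        size <<= 1;
    return size;
}
```

For a 100MB initrd this gives a 128MB window, and anything just over 128MB would need 256MB of MMIO space.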

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:55                 ` Gleb Natapov
@ 2010-07-19  8:34                   ` Richard W.M. Jones
  2010-07-19  8:40                     ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19  8:34 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> Why not put then on cdrom or disk?

It simplifies device and mountpoint enumeration not to have a separate
disk.  It would also mean we couldn't use standard Fedora paths, or
we'd have to bind-mount /bin etc. onto the disk mount point,
which again complicates things.

Anyway, what we're talking about here is a problem in qemu.  How is
making initrd loading faster not a benefit for everyone?  Every boot
has to load an initrd of some size, so making that operation faster
benefits every user, even if individually only by a small amount.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:34                   ` Richard W.M. Jones
@ 2010-07-19  8:40                     ` Gleb Natapov
  2010-07-19  9:00                       ` Richard W.M. Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  8:40 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> > Why not put then on cdrom or disk?
> 
> It simplifies device and mountpoint enumeration not to have a separate
> disk.  It would also mean we couldn't use standard Fedora paths, or
> we'd have to have bind-mount /bin etc on to the disk mount point,
> which again complicates things.
> 
Can't help you here, but if it's doable you can speed up your startup
time by much more than a second.

> Anyway, what we're talking about here is a problem in qemu.  How is
The problem is that you want to speed up your application. There is more
than one solution to the problem. If you come up with a reasonable
solution in qemu, that is OK.

> making initrd loading faster not a benefit for everyone?  Every boot
> has to load an initrd of some size, so making that operation faster
> benefits every user, even if individually only by a small amount.
> 
Most users load the initrd from a disk, not via the -initrd option.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:30                             ` Gleb Natapov
@ 2010-07-19  8:41                               ` Alexander Graf
  2010-07-19  8:48                                 ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  8:41 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 10:30, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:19, Gleb Natapov wrote:
>> 
>> Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:
>> 
>> 1) We need a new PCI ID
> We have our range. We can allocate from there.
> 
>> 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs
>> 
> Is it supported now with fw_cfg interface? My main concern with this
> approach is huge BAR size that may take a lot of space from PCI MMIO range
> if guest OS decide to configure it.

Oh, right. I think I combined all the modules into the INITRD blob. Yeah, that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do we have to do some RAM mapping for those special BAR regions?

Were there DMA-capable devices back in ISA times? There must have been. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?


Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:41                               ` Alexander Graf
@ 2010-07-19  8:48                                 ` Gleb Natapov
  2010-07-19  8:54                                   ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  8:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:30, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:19, Gleb Natapov wrote:
> >> 
> >> Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:
> >> 
> >> 1) We need a new PCI ID
> > We have our range. We can allocate from there.
> > 
> >> 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs
> >> 
> > Is it supported now with fw_cfg interface? My main concern with this
> > approach is huge BAR size that may take a lot of space from PCI MMIO range
> > if guest OS decide to configure it.
> 
> Oh, right. I think I combined all the modules into the INITRD blob. Yeah, that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do we have to do some RAM mapping for those special BAR regions?
> 
I think we will have to do RAM mapping. Otherwise it may be slow too.
Coalesced MMIO is for writes, not reads, IIRC.

> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
> 
I think IDE can be considered a DMA-capable ISA device, no? At least
it works by writing to PIO ports and getting the result in memory, but
with interrupts and status bits and everything that a real device should
have. The on-board DMA engine is also an ISA device.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:48                                 ` Gleb Natapov
@ 2010-07-19  8:54                                   ` Alexander Graf
  2010-07-19  9:00                                     ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  8:54 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 10:48, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:30, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 19.07.2010, at 10:19, Gleb Natapov wrote:
>>>> 
>>>> Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:
>>>> 
>>>> 1) We need a new PCI ID
>>> We have our range. We can allocate from there.
>>> 
>>>> 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs
>>>> 
>>> Is it supported now with fw_cfg interface? My main concern with this
>>> approach is huge BAR size that may take a lot of space from PCI MMIO range
>>> if guest OS decide to configure it.
>> 
>> Oh, right. I think I combined all the modules into the INITRD blob. Yeah, that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do we have to do some RAM mapping for those special BAR regions?
>> 
> I think we will have to do RAM mapping. Otherwise it may be slow to.
> Coalesced MMIO is for write not read IIRC.

Oh, right. Makes sense.

> 
>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
>> 
> I think IDE can be considered DMA capable ISA device, no? At least
> it works by writing to PIO ports and getting result into memory, but
> with interrupts and status bits and everything that real device should
> have. On board DMA engine is also ISA device. 

We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?

outb(FWCFG_DMA_ENABLE, PORT_DMA_CTL);   /* outb(value, port): start the DMA */
while (inb(PORT_DMA_CTL) & FWCFG_DMA_ENABLE) {
    /* DMA going on; the device clears the bit when it is done */
}
/* DMA done */


Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:40                     ` Gleb Natapov
@ 2010-07-19  9:00                       ` Richard W.M. Jones
  2010-07-19  9:04                         ` Richard W.M. Jones
  2010-07-19  9:06                         ` Gleb Natapov
  0 siblings, 2 replies; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19  9:00 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 11:40:41AM +0300, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote:
> > On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> > > Why not put then on cdrom or disk?
> > 
> > It simplifies device and mountpoint enumeration not to have a separate
> > disk.  It would also mean we couldn't use standard Fedora paths, or
> > we'd have to have bind-mount /bin etc on to the disk mount point,
> > which again complicates things.
> > 
> Can't help you here, but if it's doable you can speedup your startup
> time much more then by a second.

This isn't true.

The most we could save is 0.8 seconds [time taken to load the 100MB
initrd by the kernel] less the time taken to probe and mount a CD ISO
[0.2 seconds - measured using guestfish] less the time taken to load
programs from this CD.  So the most we could save would be 0.6
seconds, and in reality it'd be less than this if we actually loaded
and ran any programs from the CD at all.

My patch saves 1 second, and all the programs are in RAM.

> Most users load initrd from a disk not by -initrd option.

It's unusual, but on my production webserver I use -kernel and -initrd
options explicitly.  That's because I want all my VMs to share a
single kernel.

virt-install is another program that uses explicit -initrd.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  8:54                                   ` Alexander Graf
@ 2010-07-19  9:00                                     ` Gleb Natapov
  2010-07-19  9:02                                       ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  9:00 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 10:48, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:41:48AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:30, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 19.07.2010, at 10:19, Gleb Natapov wrote:
> >>>> 
> >>>> Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:
> >>>> 
> >>>> 1) We need a new PCI ID
> >>> We have our range. We can allocate from there.
> >>> 
> >>>> 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs
> >>>> 
> >>> Is it supported now with fw_cfg interface? My main concern with this
> >>> approach is huge BAR size that may take a lot of space from PCI MMIO range
> >>> if guest OS decide to configure it.
> >> 
> >> Oh, right. I think I combined all the modules into the INITRD blob. Yeah, that would work. Is coalesced MMIO more efficient than coalesced PIO? Or do we have to do some RAM mapping for those special BAR regions?
> >> 
> > I think we will have to do RAM mapping. Otherwise it may be slow to.
> > Coalesced MMIO is for write not read IIRC.
> 
> Oh, right. Makes sense.
> 
> > 
> >> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
> >> 
> > I think IDE can be considered DMA capable ISA device, no? At least
> > it works by writing to PIO ports and getting result into memory, but
> > with interrupts and status bits and everything that real device should
> > have. On board DMA engine is also ISA device. 
> 
> We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?
> 
This is better, but it shouldn't be synchronous. The kernel and initrd
are on disk, so why not set up aio and read them from the io thread,
allowing the vcpu thread to return to guest mode immediately to process
interrupts? Or why not use virtio-serial while we are at it? After all,
virtio-serial is there to allow host and guest communication.

> outb(PORT_DMA_CTL, FWCFG_DMA_ENABLE);
> while (inb(PORT_DMA_CTL) & FWCFG_DMA_ENABLE) {
>     /* DMA going on */
> }
> /* DMA done */
> 
> 
> Alex

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:00                                     ` Gleb Natapov
@ 2010-07-19  9:02                                       ` Alexander Graf
  2010-07-19  9:10                                         ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  9:02 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 11:00, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 10:48, Gleb Natapov wrote:
>> 
>>> 
>>>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
>>>> 
>>> I think IDE can be considered DMA capable ISA device, no? At least
>>> it works by writing to PIO ports and getting result into memory, but
>>> with interrupts and status bits and everything that real device should
>>> have. On board DMA engine is also ISA device. 
>> 
>> We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?
>> 
> This is better, but it shouldn't be synchronous. Kernel and initrd are
> on disk so why not setup aio and read them from io thread allowing vcpu
> thread immediately return to guest mode to process interrupts.

That would work with the device model described above. Whether we go synchronous or asynchronous would become an implementation detail.

> Or why
> not use virtio-serial while we are at it? After all virtio-serial is
> there to allow host and guest communication.

Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO.

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:00                       ` Richard W.M. Jones
@ 2010-07-19  9:04                         ` Richard W.M. Jones
  2010-07-19  9:06                         ` Gleb Natapov
  1 sibling, 0 replies; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19  9:04 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
[...]

OK, it's early in the morning and I can't do maths.  But we're still
asking for a big increase in complexity versus optimizing something
which is just slow in qemu at the moment.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:00                       ` Richard W.M. Jones
  2010-07-19  9:04                         ` Richard W.M. Jones
@ 2010-07-19  9:06                         ` Gleb Natapov
  2010-07-19  9:09                           ` Alexander Graf
  1 sibling, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  9:06 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 11:40:41AM +0300, Gleb Natapov wrote:
> > On Mon, Jul 19, 2010 at 09:34:11AM +0100, Richard W.M. Jones wrote:
> > > On Mon, Jul 19, 2010 at 10:55:33AM +0300, Gleb Natapov wrote:
> > > > Why not put then on cdrom or disk?
> > > 
> > > It simplifies device and mountpoint enumeration not to have a separate
> > > disk.  It would also mean we couldn't use standard Fedora paths, or
> > > we'd have to have bind-mount /bin etc on to the disk mount point,
> > > which again complicates things.
> > > 
> > Can't help you here, but if it's doable you can speedup your startup
> > time much more then by a second.
> 
> This isn't true.
> 
> The most we could save is 0.8 seconds [time taken to load the 100MB
> initrd by the kernel] less the time taken to probe and mount a CD ISO
But you do not need all 100MB of the applications, so with the disk
approach you load only the things you need, on demand.

> [0.2 seconds - measured using guestfish] less the time taken to load
> programs from this CD.  So the most we could save would be 0.6
> seconds, and in reality it'd be less than this if we actually loaded
> and ran any programs from the CD at all.
> 
> My patch saves 1 second, and all the programs are in RAM.
> 
And it will take 100M of a host ram.

> > Most users load initrd from a disk not by -initrd option.
> 
> It's unusual, but on my production webserver I use -kernel and -initrd
> options explicitly.  That's because I want all my VMs to share a
> single kernel.
> 
How often do you restart them?

> virt-install is another program that uses explicit -initrd.
> 
Installation takes a lot of time. Saving 1 second there will not be
noticeable. And during lifetime of installed VM initrd will be loaded
from its disk.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:06                         ` Gleb Natapov
@ 2010-07-19  9:09                           ` Alexander Graf
  2010-07-19  9:15                             ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  9:09 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 11:06, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
>> 
>> virt-install is another program that uses explicit -initrd.
>> 
> Installation takes a lot of time. Saving 1 second there will not be
> noticeable. And during lifetime of installed VM initrd will be loaded
> from its disk.

Guys, please. It shouldn't be one or the other. Let's make sure both ways of doing things are fast. That's what users want: fast.

Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:02                                       ` Alexander Graf
@ 2010-07-19  9:10                                         ` Gleb Natapov
  2010-07-19  9:13                                           ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  9:10 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:00, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 10:48, Gleb Natapov wrote:
> >> 
> >>> 
> >>>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
> >>>> 
> >>> I think IDE can be considered DMA capable ISA device, no? At least
> >>> it works by writing to PIO ports and getting result into memory, but
> >>> with interrupts and status bits and everything that real device should
> >>> have. On board DMA engine is also ISA device. 
> >> 
> >> We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?
> >> 
> > This is better, but it shouldn't be synchronous. Kernel and initrd are
> > on disk so why not setup aio and read them from io thread allowing vcpu
> > thread immediately return to guest mode to process interrupts.
> 
> That would work with the above described device model. If we're going synchronous or asynchronous would become an implementation detail.
> 
If the vcpu thread sleeps for too long without processing events, we can see strange timeouts in the guest.

> > Or why
> > not use virtio-serial while we are at it? After all virtio-serial is
> > there to allow host and guest communication.
> 
> Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO.
> 
We already do it for virtio-blk. Read only support is very small in
LOC there. Don't know about virtio-serial protocol.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:10                                         ` Gleb Natapov
@ 2010-07-19  9:13                                           ` Alexander Graf
  2010-07-19  9:19                                             ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  9:13 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 11:10, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 11:00, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 19.07.2010, at 10:48, Gleb Natapov wrote:
>>>> 
>>>>> 
>>>>>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
>>>>>> 
>>>>> I think IDE can be considered DMA capable ISA device, no? At least
>>>>> it works by writing to PIO ports and getting result into memory, but
>>>>> with interrupts and status bits and everything that real device should
>>>>> have. On board DMA engine is also ISA device. 
>>>> 
>>>> We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?
>>>> 
>>> This is better, but it shouldn't be synchronous. Kernel and initrd are
>>> on disk so why not setup aio and read them from io thread allowing vcpu
>>> thread immediately return to guest mode to process interrupts.
>> 
>> That would work with the above described device model. If we're going synchronous or asynchronous would become an implementation detail.
>> 
> If vcpu thread will sleep for too much time without processing events we can see strange timeouts in a guest.

I don't think I understand what you mean?

> 
>>> Or why
>>> not use virtio-serial while we are at it? After all virtio-serial is
>>> there to allow host and guest communication.
>> 
>> Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO.
>> 
> We already do it for virtio-blk. Read only support is very small in
> LOC there. Don't know about virtio-serial protocol.

The virtio-blk model uses the whole pxe framework. For our in-tree option roms we're trying to be simple. And I'd like to keep it that way. I really don't want to add PCI enumeration and BAR setup to that code.


Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:09                           ` Alexander Graf
@ 2010-07-19  9:15                             ` Gleb Natapov
  2010-07-19  9:16                               ` Alexander Graf
                                                 ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  9:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:06, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
> >> 
> >> virt-install is another program that uses explicit -initrd.
> >> 
> > Installation takes a lot of time. Saving 1 second there will not be
> > noticeable. And during lifetime of installed VM initrd will be loaded
> > from its disk.
> 
> Guys, please. It shouldn't be one or the other. Let's make sure both ways of doing things are fast. That's what users want: fast.
> 
That's what we are talking about, no? We are trying to find a faster way
to load kernel/initrd and stay architectural. Honestly, I would expect a
much greater speedup from Richard's approach, like 2 seconds vs 8 seconds.
It is hard to justify code complication just for a 1 second speedup.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:15                             ` Gleb Natapov
@ 2010-07-19  9:16                               ` Alexander Graf
  2010-07-19 13:06                               ` Richard W.M. Jones
  2010-07-19 14:52                               ` Anthony Liguori
  2 siblings, 0 replies; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  9:16 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 11:15, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 11:06, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
>>>> 
>>>> virt-install is another program that uses explicit -initrd.
>>>> 
>>> Installation takes a lot of time. Saving 1 second there will not be
>>> noticeable. And during lifetime of installed VM initrd will be loaded
>>> from its disk.
>> 
>> Guys, please. It shouldn't be one or the other. Let's make sure both ways of doing things are fast. That's what users want: fast.
>> 
> That what we are talking about, no? We are trying to find faster way to
> load kernel/initrd and stay architectural. Honestly I would expect much
> greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
> is hard to justify code complication just for 1 second speedup.

I agree. Hence I'd like to keep the complication as simple as possible.

Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:40               ` Alexander Graf
  2010-07-19  7:51                 ` Gleb Natapov
@ 2010-07-19  9:19                 ` Richard W.M. Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19  9:19 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel, Gleb Natapov

[-- Attachment #1: Type: text/plain, Size: 950 bytes --]

On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> Richard, what does kvm_stat tell you while loading the initrd? Are
> there a lot of PIO requests or are we simply looping inside qemu
> code?

The two attached files were made by running kvm_stat -l > /tmp/...
during a single run starting libguestfs.  This use of kvm_stat is as
described in Chris's blog entry here:
http://clalance.blogspot.com/2009/01/kvm-performance-tools.html

The first attachment ('no-patch') is without the proposed patch.

The second attachment ('with-patch') is with the proposed patch.

It seems some numbers such as #vmexits are lower with the proposed
patch, although not by very much.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

[-- Attachment #2: no-patch --]
[-- Type: text/plain, Size: 4224 bytes --]

 efer_relo      exits  fpu_reloa  halt_exit  halt_wake  host_stat  hypercall  insn_emul  insn_emul     invlpg   io_exits  irq_exits  irq_injec  irq_windo  largepage  mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  mmu_recyc  mmu_shado  mmu_unsyn  nmi_injec  nmi_windo   pf_fixed   pf_guest  remote_tl  request_i  signal_ex  tlb_flush
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0     147464      10991          0          0      75176          0      70151          0          0      75122        106          8         11          0          7         49          0          0          0      56360          0         94          0          0          0        390          0          0          0         39          0
         0      21165          0          0          0      21159          0      21139          0          0      21151         18          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         16          0
         0     108370       1024          0          0      57189          0       9936          0       8674      55697        342        108        209          0       1014       2828        610        180          0        776          0       2754          0          0          0      24981          0          0          0        286      10996
         0      78421          0          0          0      16362          0       1828          0          0      12189       2136       1073       1371          0       1827         93          0          1          0          1          0          0          2          0          0      46659          0          0          0       2041          1
         0      52784        611          8          0      13498          0       2722          0       1084      10562       1236        769       1106          0       1317        769        276       1138          0       1405          0        630         42          0          0      31694       4779          0          0       1227       2799
         0     122866       7220          0          0      11116          0      11677          0       7619       6596       1655       1137       1560          0       2035       5133       1956       7687          0       9642          0       4375        472          0          0      62778      32417          0          0       1976      19957
         0      48502      19852          9          0      22262          0       3173          0       1607      19781        802       1199       1700          0        799       1063        452       1918          0       2370          0       1125         49          0          0      14706       7666          0          0        642       4795
         0    -579572     -39698        -17          0    -216762          0    -120626          0     -18984    -201098      -6295      -4294      -5957          0      -6999      -9935      -3294     -10924          0     -70554          0      -8978       -565          0          0    -181208     -44862          0          0      -6227     -38548
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0

[-- Attachment #3: with-patch --]
[-- Type: text/plain, Size: 5632 bytes --]

 efer_relo      exits  fpu_reloa  halt_exit  halt_wake  host_stat  hypercall  insn_emul  insn_emul     invlpg   io_exits  irq_exits  irq_injec  irq_windo  largepage  mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  mmu_recyc  mmu_shado  mmu_unsyn  nmi_injec  nmi_windo   pf_fixed   pf_guest  remote_tl  request_i  signal_ex  tlb_flush
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0     153571      12053          0          0      75788          0      70663          0          0      74676        186          9         13          0       1007         67          0          0          0      56360          0         94          0          0          0       5819          0          0          0         93          0
         0     130357          1          0          0      62092          0       1558          0       8674      59699       1388        650        886          0        781       2827        610        181          0        777          0       2754          2          0          0      27246          0          0          0       1299      10997
         0      69127          0          1          0       7362          0       1983          0          0       2960       2002        993       1344          0       1983        118          0          0          0          0          0          0          1          0          0      60866          1          0          0       2065          0
         0      85260       3696          7          0      13338          0       7089          0       4826      10226       1020        847       1228          0       1344       3583       1156       4584          0       5730          0       2560        281          0          0      40829      20586          0          0       1298      13181
         0      70517       4555          7          0       6803          0       7038          0       4415       4066        979        671        913          0       1225       2499       1174       4654          0       5828          0       2786        237          0          0      35782      18756          0          0       1207      10910
         0      38824      20237          6          0      21919          0       2185          0        797      19687        870       1120       1597          0        656        713        294       1231          0       1525          0        741          9          0          0       9360       4779          0          0        543       3089
         0    -547656     -40542        -21          0    -187302          0     -90516          0     -18712    -171314      -6445      -4290      -5981          0      -6996      -9807      -3234     -10650          0     -70220          0      -8935       -530          0          0    -179902     -44122          0          0      -6505     -38177
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:13                                           ` Alexander Graf
@ 2010-07-19  9:19                                             ` Gleb Natapov
  2010-07-19  9:21                                               ` Alexander Graf
  2010-07-19  9:23                                               ` Richard W.M. Jones
  0 siblings, 2 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  9:19 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:10, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 11:00, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 19.07.2010, at 10:48, Gleb Natapov wrote:
> >>>> 
> >>>>> 
> >>>>>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
> >>>>>> 
> >>>>> I think IDE can be considered DMA capable ISA device, no? At least
> >>>>> it works by writing to PIO ports and getting result into memory, but
> >>>>> with interrupts and status bits and everything that real device should
> >>>>> have. On board DMA engine is also ISA device. 
> >>>> 
> >>>> We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?
> >>>> 
> >>> This is better, but it shouldn't be synchronous. Kernel and initrd are
> >>> on disk so why not setup aio and read them from io thread allowing vcpu
> >>> thread immediately return to guest mode to process interrupts.
> >> 
> >> That would work with the above described device model. If we're going synchronous or asynchronous would become an implementation detail.
> >> 
> > If vcpu thread will sleep for too much time without processing events we can see strange timeouts in a guest.
> 
> I don't think I understand what you mean?
> 
Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
All timers that should have been processed during this time fire at the
same moment, triggering all kinds of timeouts. Think about a watchdog that
must be written to every two seconds, otherwise it does a reset.

> > 
> >>> Or why
> >>> not use virtio-serial while we are at it? After all virtio-serial is
> >>> there to allow host and guest communication.
> >> 
> >> Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO.
> >> 
> > We already do it for virtio-blk. Read only support is very small in
> > LOC there. Don't know about virtio-serial protocol.
> 
> The virtio-blk model uses the whole pxe framework. For our in-tree option roms we're trying to be simple. And I'd like to keep it that way. I really don't want to add PCI enumeration and BAR setup to that code.
> 
The virtio-blk is entirely in seabios and does not use pxe at all!

--
			Gleb.
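
The timer argument above can be put into numbers with a small sketch. The
helper names are hypothetical and are not part of qemu or SeaBIOS; the only
figures taken from the message are the 6 second stall and the 2 second
watchdog interval.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many ticks of a periodic timer fall due at the
 * same moment when the vcpu resumes after being stalled for stall_ms. */
static uint64_t overdue_ticks(uint64_t stall_ms, uint64_t period_ms)
{
    return period_ms ? stall_ms / period_ms : 0;
}

/* Hypothetical helper: a watchdog that must be written to every timeout_ms
 * fires if the vcpu was unable to run for at least that long. */
static int watchdog_fires(uint64_t stall_ms, uint64_t timeout_ms)
{
    return stall_ms >= timeout_ms;
}
```

With a 6000 ms stall and a 2000 ms watchdog interval, watchdog_fires()
returns 1, which is the reset scenario described above. Whether a real
transfer would ever stall the vcpu that long is a separate question.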

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:19                                             ` Gleb Natapov
@ 2010-07-19  9:21                                               ` Alexander Graf
  2010-07-19  9:32                                                 ` Gleb Natapov
  2010-07-19  9:23                                               ` Richard W.M. Jones
  1 sibling, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2010-07-19  9:21 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Richard W.M. Jones, qemu-devel


On 19.07.2010, at 11:19, Gleb Natapov wrote:

> On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2010, at 11:10, Gleb Natapov wrote:
>> 
>>> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 19.07.2010, at 11:00, Gleb Natapov wrote:
>>>> 
>>>>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
>>>>>> 
>>>>>> On 19.07.2010, at 10:48, Gleb Natapov wrote:
>>>>>> 
>>>>>>> 
>>>>>>>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
>>>>>>>> 
>>>>>>> I think IDE can be considered DMA capable ISA device, no? At least
>>>>>>> it works by writing to PIO ports and getting result into memory, but
>>>>>>> with interrupts and status bits and everything that real device should
>>>>>>> have. On board DMA engine is also ISA device. 
>>>>>> 
>>>>>> We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?
>>>>>> 
>>>>> This is better, but it shouldn't be synchronous. Kernel and initrd are
>>>>> on disk so why not setup aio and read them from io thread allowing vcpu
>>>>> thread immediately return to guest mode to process interrupts.
>>>> 
>>>> That would work with the above described device model. If we're going synchronous or asynchronous would become an implementation detail.
>>>> 
>>> If vcpu thread will sleep for too much time without processing events we can see strange timeouts in a guest.
>> 
>> I don't think I understand what you mean?
>> 
> Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
> All timers that should have been processed during this time fire at the
> same moment triggering all kind of timeouts. Think about watchdog that
> should be written into every two seconds otherwise it does reset.

That's a hypervisor implementation detail! If we want to go synchronously, we do. If something breaks, we don't. Doing it synchronously simplifies things a lot. And we're talking about a device that's only used before the OS kicks in. There's no use in pretending we're running a watchdog there.

> 
>>> 
>>>>> Or why
>>>>> not use virtio-serial while we are at it? After all virtio-serial is
>>>>> there to allow host and guest communication.
>>>> 
>>>> Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO.
>>>> 
>>> We already do it for virtio-blk. Read only support is very small in
>>> LOC there. Don't know about virtio-serial protocol.
>> 
>> The virtio-blk model uses the whole pxe framework. For our in-tree option roms we're trying to be simple. And I'd like to keep it that way. I really don't want to add PCI enumeration and BAR setup to that code.
>> 
> The virtio-blk is entirely in seabios and does not use pxe at all!

So it uses even more framework :). The linuxboot stuff is completely separate in its very own option rom.

Alex
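
A minimal sketch of the polling device model discussed above, assuming a
made-up register layout. None of these names (toy_dma_dev, DMA_STATUS_BUSY,
and so on) exist in qemu, and this is not the fw_cfg interface. It only
illustrates the point that the guest-visible protocol (set BUSY, poll until
clear) stays the same whether the device completes synchronously inside the
register write or later from another thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status bits; not taken from any real device. */
#define DMA_STATUS_BUSY  0x1u  /* set by the guest, cleared by the device */
#define DMA_STATUS_ERROR 0x2u  /* set by the device on a bad request */

struct toy_dma_dev {
    const uint8_t *blob;   /* host-side data, e.g. an initrd image */
    size_t blob_len;
    uint8_t *guest_ram;    /* stand-in for guest physical memory */
    size_t ram_len;
    uint32_t dest;         /* destination address register */
    uint32_t len;          /* transfer length register */
    uint32_t status;       /* status register */
};

/* Device side: do the copy (or flag an error), then clear BUSY. */
static void toy_dma_complete(struct toy_dma_dev *d)
{
    if (d->len <= d->blob_len && d->dest <= d->ram_len &&
        d->len <= d->ram_len - d->dest) {
        memcpy(d->guest_ram + d->dest, d->blob, d->len);
    } else {
        d->status |= DMA_STATUS_ERROR;
    }
    d->status &= ~DMA_STATUS_BUSY;
}

/* Guest write to the status register. A synchronous device finishes the
 * transfer before the write returns; an asynchronous one would hand the
 * work to an I/O thread and return immediately. */
static void toy_dma_write_status(struct toy_dma_dev *d, uint32_t val,
                                 int synchronous)
{
    d->status = val;
    if ((val & DMA_STATUS_BUSY) && synchronous) {
        toy_dma_complete(d);
    }
}

/* Guest side: program the registers, start the transfer, poll for
 * completion. Returns how many times the guest had to poll. */
static unsigned toy_guest_load(struct toy_dma_dev *d, uint32_t dest,
                               uint32_t len, int synchronous)
{
    unsigned polls = 0;

    d->dest = dest;
    d->len = len;
    toy_dma_write_status(d, DMA_STATUS_BUSY, synchronous);
    while (d->status & DMA_STATUS_BUSY) {
        polls++;
        /* Simulation only: pretend the I/O thread finishes after the
         * first poll. A real guest would just keep reading the register. */
        if (!synchronous) {
            toy_dma_complete(d);
        }
    }
    return polls;
}
```

In the synchronous case the guest never observes BUSY set and
toy_guest_load() returns 0 polls; in the simulated asynchronous case it
polls once. The guest code is identical in both cases.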

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:19                                             ` Gleb Natapov
  2010-07-19  9:21                                               ` Alexander Graf
@ 2010-07-19  9:23                                               ` Richard W.M. Jones
  1 sibling, 0 replies; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19  9:23 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 12:19:22PM +0300, Gleb Natapov wrote:
> Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
> All timers that should have been processed during this time fire at the
> same moment triggering all kind of timeouts. Think about watchdog that
> should be written into every two seconds otherwise it does reset.

This particular code runs very early in boot, and the atomic copy
operation is very quick even with a 100MB initrd.

But the question I think should be: If a guest maliciously (after
boot) tried to use this mechanism, could it do harm to the host?  Or
would it just harm itself?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:21                                               ` Alexander Graf
@ 2010-07-19  9:32                                                 ` Gleb Natapov
  0 siblings, 0 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19  9:32 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Richard W.M. Jones, qemu-devel

On Mon, Jul 19, 2010 at 11:21:34AM +0200, Alexander Graf wrote:
> 
> On 19.07.2010, at 11:19, Gleb Natapov wrote:
> 
> > On Mon, Jul 19, 2010 at 11:13:38AM +0200, Alexander Graf wrote:
> >> 
> >> On 19.07.2010, at 11:10, Gleb Natapov wrote:
> >> 
> >>> On Mon, Jul 19, 2010 at 11:02:54AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 19.07.2010, at 11:00, Gleb Natapov wrote:
> >>>> 
> >>>>> On Mon, Jul 19, 2010 at 10:54:43AM +0200, Alexander Graf wrote:
> >>>>>> 
> >>>>>> On 19.07.2010, at 10:48, Gleb Natapov wrote:
> >>>>>> 
> >>>>>>> 
> >>>>>>>> Were there DMA capable devices back in ISA times? There must be. If so, we can just take a look at what they do and do it similarly. Bus mastering was a new thing for PCI, right?
> >>>>>>>> 
> >>>>>>> I think IDE can be considered DMA capable ISA device, no? At least
> >>>>>>> it works by writing to PIO ports and getting result into memory, but
> >>>>>>> with interrupts and status bits and everything that real device should
> >>>>>>> have. On board DMA engine is also ISA device. 
> >>>>>> 
> >>>>>> We could define our device to be polling. So all we need is a status bit that the guest sets when it starts the DMA and the device unsets when the DMA is done. In our case that should be immediate, because the PIO invokes the full code paths, but it would look more like a real device, no?
> >>>>>> 
> >>>>> This is better, but it shouldn't be synchronous. Kernel and initrd are
> >>>>> on disk so why not setup aio and read them from io thread allowing vcpu
> >>>>> thread immediately return to guest mode to process interrupts.
> >>>> 
> >>>> That would work with the above described device model. If we're going synchronous or asynchronous would become an implementation detail.
> >>>> 
> >>> If vcpu thread will sleep for too much time without processing events we can see strange timeouts in a guest.
> >> 
> >> I don't think I understand what you mean?
> >> 
> > Vcpu executes "in %ax". Next instruction is executed 6 seconds later.
> > All timers that should have been processed during this time fire at the
> > same moment triggering all kind of timeouts. Think about watchdog that
> > should be written into every two seconds otherwise it does reset.
> 
> That's a hypervisor implementation detail! If we want to go synchronously, we do. If something breaks, we don't.
It is. And it is a bug in the interface that we knowingly introduce. Do
we want to do that?

> Doing it synchronously simplifies things a lot. And we're talking about a device that's only used before the OS kicks in. There's no use in pretending we're running a watchdog there.
On sane (embedded) HW that uses a watchdog, the firmware tickles it too.
We do not want to get stuck in firmware. Actually, a sane watchdog can't be
stopped after it is started. I see a compelling use case for watchdog
support in seabios.

And Seabios nowadays is complicated and runs a lot of code that uses
interrupts.

> 
> > 
> >>> 
> >>>>> Or why
> >>>>> not use virtio-serial while we are at it? After all virtio-serial is
> >>>>> there to allow host and guest communication.
> >>>> 
> >>>> Because virtio-serial needs us to set up the full virtio-pci stack. That's too much to mess with in an option rom IMHO.
> >>>> 
> >>> We already do it for virtio-blk. Read only support is very small in
> >>> LOC there. Don't know about virtio-serial protocol.
> >> 
> >> The virtio-blk model uses the whole pxe framework. For our in-tree option roms we're trying to be simple. And I'd like to keep it that way. I really don't want to add PCI enumeration and BAR setup to that code.
> >> 
> > The virtio-blk is entirely in seabios and does not use pxe at all!
> 
> So it uses even more framework :). The linuxboot stuff is completely separate in its very own option rom.
> 
How is this option rom loaded?

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:15                             ` Gleb Natapov
  2010-07-19  9:16                               ` Alexander Graf
@ 2010-07-19 13:06                               ` Richard W.M. Jones
  2010-07-19 13:12                                 ` Gleb Natapov
  2010-07-19 14:52                               ` Anthony Liguori
  2 siblings, 1 reply; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19 13:06 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 12:15:43PM +0300, Gleb Natapov wrote:
> That what we are talking about, no? We are trying to find faster way to
> load kernel/initrd and stay architectural. Honestly I would expect much
> greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
> is hard to justify code complication just for 1 second speedup.

I've no idea where this "8 seconds" comes from.  Total boot time
currently is < 8 seconds even without my patch.  My patch takes it
from 7.5 seconds to 6.5 seconds.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 13:06                               ` Richard W.M. Jones
@ 2010-07-19 13:12                                 ` Gleb Natapov
  0 siblings, 0 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19 13:12 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 02:06:27PM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 12:15:43PM +0300, Gleb Natapov wrote:
> > That what we are talking about, no? We are trying to find faster way to
> > load kernel/initrd and stay architectural. Honestly I would expect much
> > greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
> > is hard to justify code complication just for 1 second speedup.
> 
> I've no idea where this "8 seconds" comes from.  Total boot time
That was a number generated by my random number generator. I was just
trying to say that I would have expected much more gain from copying
kernel/initrd directly into memory, considering how much is going
on during pio string emulation.

> currently is < 8 seconds even without my patch.  My patch takes it
> from 7.5 seconds to 6.5 seconds.
> 
It shows that we are not so bad at emulating pio string operations.

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:33             ` Gleb Natapov
  2010-07-19  7:40               ` Alexander Graf
  2010-07-19  7:44               ` Richard W.M. Jones
@ 2010-07-19 14:45               ` Anthony Liguori
  2010-07-19 14:53                 ` Gleb Natapov
  2 siblings, 1 reply; 56+ messages in thread
From: Anthony Liguori @ 2010-07-19 14:45 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel, Richard W.M. Jones, Alexander Graf

On 07/19/2010 02:33 AM, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>    
>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>      
>>> That is what I am worrying about too. If we are adding a device we have to be
>>> sure such device can actually exist on real hw too otherwise we may have
>>> problems later.
>>>        
>> I don't understand why the constraints of real h/w have anything to do
>> with this.  Can you explain?
>>
>>      
> Each time we do something not architectural it causes us trouble later.
> So the constraints of real h/w are our constraints too.
>    

Your constraints are purely artificial.

There are plenty of places that something like fw_cfg could live and 
still do DMA.  It can directly hang off of the Southbridge.  It doesn't 
necessarily need to be connected to the ISA/LPC buses.

Buses exist to multiplex I/O devices because of limited wiring space on 
motherboards.  There's no reason we need to constrain ourselves to 
minimize the number of virtual wires we emulate.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  9:15                             ` Gleb Natapov
  2010-07-19  9:16                               ` Alexander Graf
  2010-07-19 13:06                               ` Richard W.M. Jones
@ 2010-07-19 14:52                               ` Anthony Liguori
  2010-07-19 14:54                                 ` Gleb Natapov
  2 siblings, 1 reply; 56+ messages in thread
From: Anthony Liguori @ 2010-07-19 14:52 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel, Alexander Graf, Richard W.M. Jones

On 07/19/2010 04:15 AM, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
>    
>> On 19.07.2010, at 11:06, Gleb Natapov wrote:
>>
>>      
>>> On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
>>>        
>>>> virt-install is another program that uses explicit -initrd.
>>>>
>>>>          
>>> Installation takes a lot of time. Saving 1 second there will not be
>>> noticeable. And during lifetime of installed VM initrd will be loaded
>>> from its disk.
>>>        
>> Guys, please. It shouldn't be one or the other. Let's make sure both ways of doing things are fast. That's what users want: fast.
>>
>>      
> That what we are talking about, no? We are trying to find faster way to
> load kernel/initrd and stay architectural.

Modern platforms are not nearly as "architectural" as you would think.

It's not unusual to hang a custom chip off of the Southbridge that 
implements platform specific services along with an array of "legacy" 
devices that are implemented mostly in software to cut cost.

Other buses (like PS/2) are largely implemented in SMM today by the BIOS.

Regards,

Anthony Liguori

>   Honestly I would expect much
> greater speedup from Richard's approach like 2 seconds vs 8 seconds. It
> is hard to justify code complication just for 1 second speedup.
>
> --
> 			Gleb.
>
>    

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 14:45               ` Anthony Liguori
@ 2010-07-19 14:53                 ` Gleb Natapov
  2010-07-19 15:54                   ` Anthony Liguori
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19 14:53 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Richard W.M. Jones, Alexander Graf

On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
> On 07/19/2010 02:33 AM, Gleb Natapov wrote:
> >On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >>On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>>That what I am warring about too. If we are adding device we have to be
> >>>sure such device can actually exist on real hw too otherwise we may have
> >>>problems later.
> >>I don't understand why the constraints of real h/w have anything to do
> >>with this.  Can you explain?
> >>
> >Each time we do something not architectural it cause us troubles later.
> >So constraints of real h/w is our constrains to.
> 
> Your constraints are purely artificial.
> 
What is artificial about it? Each time we break them we suffer.

> There are plenty of places that something like fw_cfg could live and
> still do DMA.  It can directly hang off of the Southbridge.  It
> doesn't necessary need to be connected to the ISA/LPC buses.
Examples of real HW? And I am not against something that does DMA,
but that is not what the proposed patch does. It provides a magic io
instruction that the CPU calls, and when the instruction completes memory is
updated. This is nothing like DMA. Of course it is possible to add
a proper DMA interface to fw_cfg, but should we do it for such a small
gain?

> 
> Buses exist to multiplex I/O devices because of limited wiring space
> on motherboards.  There's no reason we need to constrain ourselves
> to minimize the number of virtual wires we emulate.
> 
> Regards,
> 
> Anthony Liguori

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 14:52                               ` Anthony Liguori
@ 2010-07-19 14:54                                 ` Gleb Natapov
  0 siblings, 0 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19 14:54 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Alexander Graf, Richard W.M. Jones

On Mon, Jul 19, 2010 at 09:52:23AM -0500, Anthony Liguori wrote:
> On 07/19/2010 04:15 AM, Gleb Natapov wrote:
> >On Mon, Jul 19, 2010 at 11:09:13AM +0200, Alexander Graf wrote:
> >>On 19.07.2010, at 11:06, Gleb Natapov wrote:
> >>
> >>>On Mon, Jul 19, 2010 at 10:00:04AM +0100, Richard W.M. Jones wrote:
> >>>>virt-install is another program that uses explicit -initrd.
> >>>>
> >>>Installation takes a lot of time. Saving 1 second there will not be
> >>>noticeable. And during lifetime of installed VM initrd will be loaded
> >>>from its disk.
> >>Guys, please. It shouldn't be one or the other. Let's make sure both ways of doing things are fast. That's what users want: fast.
> >>
> >That what we are talking about, no? We are trying to find faster way to
> >load kernel/initrd and stay architectural.
> 
> Modern platforms are not nearly as "architectural" as you would think.
> 
> It's not unusual to hang a custom chip off of the Southbridge that
> implements platform specific services along with an array of
> "legacy" devices that are implemented mostly in software to cost.
> 
> Other buses (like PS/2) are largely implemented in SMM today by the BIOS.
> 
I don't get your point. What is not "architectural" about all that?

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 14:53                 ` Gleb Natapov
@ 2010-07-19 15:54                   ` Anthony Liguori
  2010-07-19 16:11                     ` Gleb Natapov
  0 siblings, 1 reply; 56+ messages in thread
From: Anthony Liguori @ 2010-07-19 15:54 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel, Richard W.M. Jones, Alexander Graf

On 07/19/2010 09:53 AM, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
>    
>> On 07/19/2010 02:33 AM, Gleb Natapov wrote:
>>      
>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>>>        
>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>>>          
>>>>> That what I am warring about too. If we are adding device we have to be
>>>>> sure such device can actually exist on real hw too otherwise we may have
>>>>> problems later.
>>>>>            
>>>> I don't understand why the constraints of real h/w have anything to do
>>>> with this.  Can you explain?
>>>>
>>>>          
>>> Each time we do something not architectural it cause us troubles later.
>>> So constraints of real h/w is our constrains to.
>>>        
>> Your constraints are purely artificial.
>>
>>      
> What is artificial about it? Each time we break them we safer.
>    

Just because something doesn't fit as an ISA or PCI device doesn't mean 
it can't exist in real life.  There are plenty of one-off devices with 
odd interfaces.

>> There are plenty of places that something like fw_cfg could live and
>> still do DMA.  It can directly hang off of the Southbridge.  It
>> doesn't necessary need to be connected to the ISA/LPC buses.
>>      
> Examples of real HW?

The IBM IMM, HP ILO, or Intel iAMT modules.  They basically play an 
identical role to fw_cfg.

>   And I am not against something that does DMA,
> but that is not what proposed patch does. It provides magic io
> instruction that CPU calls and when instruction completes memory is
> updated. This is nothing like DMA.

Isn't this exactly what the interface for PCI DMA looks like since 
there's no standard DMA implementation?

>   Of course it is possible to add
> proper DMA interface to fw_cfg, but should we do it for such a small
> gain?
>    

I think an ad-hoc DMA interface is perfectly reasonable to do.  I agree 
that adding a more generic DMA interface is overkill.

Regards,

Anthony Liguori

>> Buses exist to multiplex I/O devices because of limited wiring space
>> on motherboards.  There's no reason we need to constrain ourselves
>> to minimize the number of virtual wires we emulate.
>>
>> Regards,
>>
>> Anthony Liguori
>>      
> --
> 			Gleb.
>    

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 15:54                   ` Anthony Liguori
@ 2010-07-19 16:11                     ` Gleb Natapov
  2010-07-19 16:47                       ` Richard W.M. Jones
  2010-07-19 19:06                       ` Anthony Liguori
  0 siblings, 2 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19 16:11 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Richard W.M. Jones, Alexander Graf

On Mon, Jul 19, 2010 at 10:54:03AM -0500, Anthony Liguori wrote:
> On 07/19/2010 09:53 AM, Gleb Natapov wrote:
> >On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
> >>On 07/19/2010 02:33 AM, Gleb Natapov wrote:
> >>>On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >>>>On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>>>>That what I am warring about too. If we are adding device we have to be
> >>>>>sure such device can actually exist on real hw too otherwise we may have
> >>>>>problems later.
> >>>>I don't understand why the constraints of real h/w have anything to do
> >>>>with this.  Can you explain?
> >>>>
> >>>Each time we do something not architectural it cause us troubles later.
> >>>So constraints of real h/w is our constrains to.
> >>Your constraints are purely artificial.
> >>
> >What is artificial about it? Each time we break them we safer.
> 
> Just because something doesn't fit as an ISA or PCI device doesn't
> mean it can't exist in real life.  There are plenty of one-off
> devices with odd interfaces.
And are there any that cause the cpu to stall for 6.5 seconds when you do
io to them? I never said that we should implement an ISA or PCI device, I
don't know why you bring them up here.

> 
> >>There are plenty of places that something like fw_cfg could live and
> >>still do DMA.  It can directly hang off of the Southbridge.  It
> >>doesn't necessary need to be connected to the ISA/LPC buses.
> >Examples of real HW?
> 
> The IBM IMM, HP ILO, or Intel iAMT modules.  They basically play an
> identical role to fw_cfg.
> 
So what are their interfaces?  Maybe we should emulate one.

> >  And I am not against something that does DMA,
> >but that is not what proposed patch does. It provides magic io
> >instruction that CPU calls and when instruction completes memory is
> >updated. This is nothing like DMA.
> 
> Isn't this exactly what the interface for PCI DMA looks like since
> there's no standard DMA implementation?
> 
Every DMA engine that I know about supports polling for completion, or can
issue an interrupt at the end of a transaction.  I am not even sure you can
design HW that will stall the cpu in an IO instruction till some operation
is completed.

> >  Of course it is possible to add
> >proper DMA interface to fw_cfg, but should we do it for such a small
> >gain?
> 
> I think an ad-hoc DMA interface is perfectly reasonable to do.  I
> agree that adding a more generic DMA interface is overkill.
> 
It should look like real DMA at least. The justification for it should
be better than "In our project we don't want to do this and we don't
want to do that so our initrd is 100M now, so why not add a hack to qemu
to load it 1 second faster so we can grow it some more".

--
			Gleb.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 16:11                     ` Gleb Natapov
@ 2010-07-19 16:47                       ` Richard W.M. Jones
  2010-07-19 17:04                         ` Gleb Natapov
  2010-07-19 19:06                       ` Anthony Liguori
  1 sibling, 1 reply; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-19 16:47 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 07:11:37PM +0300, Gleb Natapov wrote:
> And there are such that cause cpu to stall for 6.5 seconds when you do
> io to them? I never said that we should implement ISA or PCI device, I
> don't know why you bring them here.

Where is "6.5 seconds" coming from?  That is the *total boot time*
of the libguestfs appliance, and includes far far more than the
time taken to do the memcpy.

I timed the call to cpu_physical_memory_write, and it takes 115
milliseconds with my patch (for an initrd which is 113 MB).

> It should look like real DMA at least. The justification for it should
> be better than "In our project we don't want to do this and we don't
> want to do that so our initrd is 100M now, so why not add a hack to qemu
> to load it 1 second faster so we can grow it some more".

Please don't make stuff up.  We have a large initrd for perfectly good
reasons which I have outlined in a previous email.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top
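
As a rough worked example of the figures in this message: 113 MB copied in
115 ms is close to 1 GB/s. The helper below is a hypothetical illustration,
not qemu code. The PIO figure used in the usage note (about 1115 ms) is only
an estimate, derived from the roughly 1 second difference in total boot time
reported earlier in the thread (7.5 s versus 6.5 s) plus the 115 ms copy; it
was not measured directly.

```c
#include <assert.h>

/* Hypothetical helper: throughput in MB/s for size_mb moved in elapsed_ms. */
static double throughput_mb_per_s(double size_mb, double elapsed_ms)
{
    return size_mb / (elapsed_ms / 1000.0);
}

/* Hypothetical helper: how many times faster the quicker path is. */
static double speedup(double slow_ms, double fast_ms)
{
    return slow_ms / fast_ms;
}
```

throughput_mb_per_s(113, 115) comes to roughly 983 MB/s for the direct copy,
and throughput_mb_per_s(113, 1115) to roughly 101 MB/s for the estimated PIO
path, a ratio of a little under 10.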

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 16:47                       ` Richard W.M. Jones
@ 2010-07-19 17:04                         ` Gleb Natapov
  0 siblings, 0 replies; 56+ messages in thread
From: Gleb Natapov @ 2010-07-19 17:04 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Alexander Graf, qemu-devel

On Mon, Jul 19, 2010 at 05:47:40PM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 19, 2010 at 07:11:37PM +0300, Gleb Natapov wrote:
> > And there are such that cause cpu to stall for 6.5 seconds when you do
> > io to them? I never said that we should implement ISA or PCI device, I
> > don't know why you bring them here.
> 
> Where is "6.5 seconds" coming from?  That is the *total boot time*
> of the libguestfs appliance, and includes far far more than the
> time taken to do the memcpy.
> 
> I timed the call to cpu_physical_memory_write, and it takes 115
> milliseconds with my patch (for an initrd which is 113 MB).
> 
And how much time does it take to load it using string PIO? 1 second 115
milliseconds? I thought 6.5 and 7.5 were image loading times, not total
boot time. Stalling vcpu execution for 115 milliseconds may be unfortunate
but not as catastrophic as 6.5 seconds. But the interface will be there for
everyone to use, so it may eventually be abused even more.

> > It should look like real DMA at least. The justification for it should
> > be better than "In our project we don't what to do this and we don't
> > what to do that so our initrd is 100M now, so why not add hack to qemu
> > to load it 1 second faster so we can grow it some more".
> 
> Please don't make stuff up.  We have a large initrd for perfectly good
> reasons which I have outlined in a previous email.

Those reasons do not look good to me at all. Cleaning up an existing
distro is not much less work than creating a basic distro with only the
things you need, but the result is much better. When I worked on an embedded
project almost 10 years ago we tried to clean up generic Red Hat Linux and
the result was still huge. By building our own distro we were able to squeeze
two root partitions into 64M compressed. You also do not want to consider
putting things on a cdrom because of some issues that should be
solvable.

--
			Gleb.

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19 16:11                     ` Gleb Natapov
  2010-07-19 16:47                       ` Richard W.M. Jones
@ 2010-07-19 19:06                       ` Anthony Liguori
  1 sibling, 0 replies; 56+ messages in thread
From: Anthony Liguori @ 2010-07-19 19:06 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel, Richard W.M. Jones, Alexander Graf

On 07/19/2010 11:11 AM, Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 10:54:03AM -0500, Anthony Liguori wrote:
>    
>> On 07/19/2010 09:53 AM, Gleb Natapov wrote:
>>      
>>> On Mon, Jul 19, 2010 at 09:45:58AM -0500, Anthony Liguori wrote:
>>>        
>>>> On 07/19/2010 02:33 AM, Gleb Natapov wrote:
>>>>          
>>>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
>>>>>            
>>>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
>>>>>>              
>>>>>>> That what I am warring about too. If we are adding device we have to be
>>>>>>> sure such device can actually exist on real hw too otherwise we may have
>>>>>>> problems later.
>>>>>>>                
>>>>>> I don't understand why the constraints of real h/w have anything to do
>>>>>> with this.  Can you explain?
>>>>>>
>>>>>>              
>>>>> Each time we do something not architectural it cause us troubles later.
>>>>> So constraints of real h/w is our constrains to.
>>>>>            
>>>> Your constraints are purely artificial.
>>>>
>>>>          
>>> What is artificial about it? Each time we break them we safer.
>>>        
>> Just because something doesn't fit as an ISA or PCI device doesn't
>> mean it can't exist in real life.  There are plenty of one-off
>> devices with odd interfaces.
>>      
> And there are such that cause cpu to stall for 6.5 seconds when you do
> io to them?

That would certainly be a poorly designed interface.  I can appreciate 
your point and I think suggesting that we should implement an ad-hoc 
completion interface is reasonable.  For instance,

outl(FW_CFG_SET_INITRD_ADDR, addr)
while !inb(FW_CFG_INITRD_READY):
      // spin
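
The handshake above can be modelled in a few lines of Python. This is only
a toy simulation under stated assumptions, not QEMU code: the class
InitrdLoader, its methods and the 4096-byte step size are all invented
for illustration, with set_addr() standing in for the
outl(FW_CFG_SET_INITRD_ADDR, addr) write and ready() standing in for the
inb(FW_CFG_INITRD_READY) poll. The point it shows is that each port access
does a bounded amount of work, so no single access stalls the CPU for the
whole transfer.

```python
class InitrdLoader:
    """Toy device with an address register and a ready flag (hypothetical)."""

    def __init__(self, blob: bytes, guest_ram: bytearray, bytes_per_step: int = 4096):
        self.blob = blob
        self.ram = guest_ram
        self.bytes_per_step = bytes_per_step
        self.dest = None   # guest-physical target address, set by the guest
        self.copied = 0    # bytes copied so far

    def set_addr(self, addr: int) -> None:
        # Models the address write: validate the range and start a transfer.
        if addr < 0 or addr + len(self.blob) > len(self.ram):
            raise ValueError("destination range outside guest RAM")
        self.dest = addr
        self.copied = 0

    def ready(self) -> bool:
        # Models the ready-flag read. Each poll copies at most one chunk,
        # then reports whether the whole blob has arrived.
        if self.dest is None:
            return False
        if self.copied < len(self.blob):
            end = min(self.copied + self.bytes_per_step, len(self.blob))
            self.ram[self.dest + self.copied:self.dest + end] = self.blob[self.copied:end]
            self.copied = end
        return self.copied == len(self.blob)


def load_initrd(dev: InitrdLoader, addr: int) -> int:
    """Guest side: program the address, spin on ready, return the poll count."""
    dev.set_addr(addr)
    polls = 1
    while not dev.ready():
        polls += 1
    return polls
```

In a real device the copy would presumably run asynchronously and the poll
would only read a status bit; driving the copy from the poll is a
simplification that keeps the model single-threaded.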

>>>> There are plenty of places that something like fw_cfg could live and
>>>> still do DMA.  It can directly hang off of the Southbridge.  It
>>>> doesn't necessary need to be connected to the ISA/LPC buses.
>>>>          
>>> Examples of real HW?
>>>        
>> The IBM IMM, HP ILO, or Intel iAMT modules.  They basically play an
>> identical role to fw_cfg.
>>
>>      
> So what are their interfaces?  May be we should emulate one.
>    

The interface to firmware is private and changes from platform to 
platform.  The IMM exposes various interfaces to the OS as it implements 
a number of legacy devices.  It also exposes a side-channel (very 
similar to virtio-console) as a USB RNDIS driver.  I believe it 
implements IPMI over a private ethernet type although I'd have to double 
check.  It may actually use TCP/IP.

>>> Of course it is possible to add
>>> proper DMA interface to fw_cfg, but should we do it for such a small
>>> gain?
>>>        
>> I think an ad-hoc DMA interface is perfectly reasonable to do.  I
>> agree that adding a more generic DMA interface is overkill.
>>
>>      
> It should look like real DMA at least. The justification for it should
> be better than "In our project we don't what to do this and we don't
> what to do that so our initrd is 100M now, so why not add hack to qemu
> to load it 1 second faster so we can grow it some more".
>    

I certainly agree that adding a polling interface for DMA completion is 
a reasonable requirement.

Regards,

Anthony Liguori

> --
> 			Gleb.
>    

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-19  7:51                 ` Gleb Natapov
  2010-07-19  7:57                   ` Alexander Graf
@ 2010-07-20 13:15                   ` Jamie Lokier
  2010-07-20 13:40                     ` Gleb Natapov
  1 sibling, 1 reply; 56+ messages in thread
From: Jamie Lokier @ 2010-07-20 13:15 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel, Alexander Graf, Richard W.M. Jones

Gleb Natapov wrote:
> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> > 
> > On 19.07.2010, at 09:33, Gleb Natapov wrote:
> > 
> > > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> > >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > >>> That what I am warring about too. If we are adding device we have to be
> > >>> sure such device can actually exist on real hw too otherwise we may have
> > >>> problems later.
> > >> 
> > >> I don't understand why the constraints of real h/w have anything to do
> > >> with this.  Can you explain?
> > >> 
> > > Each time we do something not architectural it cause us troubles later.
> > > So constraints of real h/w is our constrains to.
> > > 
> > >>> Also 1 second on 100M file does not look like huge gain to me.
> > >> 
> > >> Every second counts.  We're trying to get libguestfs boot times down
> > >> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> > >> program.
> > >> 
> > > So what about making initrd smaller? I remember managing two
> > > distribution in 64M flash in embedded project.
> > 
> > Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
> > 
> Why not provide small disk/cdrom with all those utilities installed?
> 
> > I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
> > 
> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> read from userspace page at a time). You need to continue injecting
> interrupt into a guest during long string operation and checking
> exception condition on a page boundaries.

The first obvious change is to make that 4k bytes (page size) when the I/O
port is the firmware port.  That'll make initrd loading 4 times faster
straight away.
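
As a rough check on the "4 times" figure, the arithmetic can be written
out. This sketch counts only how many chunks a blob is split into at each
chunk size; the function name pio_round_trips is made up here, and the
113 MB initrd size is taken from earlier in the thread and treated as MiB.
It says nothing about per-byte emulation cost, so it bounds the reduction
in round trips, not the wallclock speedup.

```python
def pio_round_trips(blob_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to move blob_bytes, rounding up."""
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    return -(-blob_bytes // chunk_bytes)  # ceiling division on integers


INITRD_BYTES = 113 * 1024 * 1024  # 113 MiB, assumed from the thread

trips_1k = pio_round_trips(INITRD_BYTES, 1024)   # 115712 chunks
trips_4k = pio_round_trips(INITRD_BYTES, 4096)   # 28928 chunks
```

So the chunk count drops by exactly a factor of four; whether load time
follows depends on how much of the cost is per chunk rather than per byte.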

If that's not enough saving, it strikes me a cleaner approach than
inventing new kinds of DMA and/or new PCI devices, is to just detect
when the rep insb instruction is used for loading a firmware blob and
treat that as a different trap.

Is guest SeaBIOS in real mode at that point?  If yes, then it would be
best to trap this combination:

  rep insb is fetching a blob + CPU is in real mode

Because then it's safe to skip the exception check on page boundaries.

If no, the trap will need to be a bit smarter.

Advantages of this approach:

  - No need for new BIOS
  - Will work with older BIOSes using current method, and accelerate them
  - No need for distinct -initrd BIOS implementations for isapc and pc,
    (compared with the PCI proposal)
  - Doesn't add any new "extra-architectural" behaviour
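
The trap condition described above could be expressed as a simple
predicate. This is a sketch only: PioTrap and can_use_blob_fast_path are
hypothetical names, not part of KVM or QEMU, and it assumes the fw_cfg
data port is 0x511 (the thread only gives the range 0x510-0x511).

```python
from dataclasses import dataclass

FW_CFG_DATA_PORT = 0x511  # assumed data port; selector assumed at 0x510


@dataclass
class PioTrap:
    """Hypothetical description of one trapped port I/O instruction."""
    port: int
    is_read: bool
    is_rep_string: bool   # True for rep ins/outs
    access_size: int      # bytes per element; 1 for insb
    real_mode: bool       # CPU is in real mode at the time of the trap


def can_use_blob_fast_path(trap: PioTrap) -> bool:
    # Only "rep insb from the firmware data port while in real mode"
    # qualifies; anything else takes the normal emulation path.
    return (
        trap.port == FW_CFG_DATA_PORT
        and trap.is_read
        and trap.is_rep_string
        and trap.access_size == 1
        and trap.real_mode
    )
```

If SeaBIOS turns out not to be in real mode at that point, the real_mode
test is the part that would have to become smarter.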

-- Jamie

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-20 13:15                   ` Jamie Lokier
@ 2010-07-20 13:40                     ` Gleb Natapov
  2010-07-20 13:59                       ` Richard W.M. Jones
  0 siblings, 1 reply; 56+ messages in thread
From: Gleb Natapov @ 2010-07-20 13:40 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: qemu-devel, Alexander Graf, Richard W.M. Jones

On Tue, Jul 20, 2010 at 02:15:16PM +0100, Jamie Lokier wrote:
> Gleb Natapov wrote:
> > On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> > > 
> > > On 19.07.2010, at 09:33, Gleb Natapov wrote:
> > > 
> > > > On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> > > >> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> > > >>> That what I am warring about too. If we are adding device we have to be
> > > >>> sure such device can actually exist on real hw too otherwise we may have
> > > >>> problems later.
> > > >> 
> > > >> I don't understand why the constraints of real h/w have anything to do
> > > >> with this.  Can you explain?
> > > >> 
> > > > Each time we do something not architectural it cause us troubles later.
> > > > So constraints of real h/w is our constrains to.
> > > > 
> > > >>> Also 1 second on 100M file does not look like huge gain to me.
> > > >> 
> > > >> Every second counts.  We're trying to get libguestfs boot times down
> > > >> from 8-12 seconds to 4-5 seconds.  For many cases it's an interactive
> > > >> program.
> > > >> 
> > > > So what about making initrd smaller? I remember managing two
> > > > distribution in 64M flash in embedded project.
> > > 
> > > Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
> > > 
> > Why not provide small disk/cdrom with all those utilities installed?
> > 
> > > I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
> > > 
> > It is coalesced to a certain extent (reenter guest every 1024 bytes,
> > read from userspace page at a time). You need to continue injecting
> > interrupt into a guest during long string operation and checking
> > exception condition on a page boundaries.
> 
> First obvious change is to make that 4k bytes (page size) when the I/O
> port is the firmware port.  That'll make initrd 4 times faster straight away.
> 
Actually my description above is incorrect. We read 1024 bytes at a time
from userspace, not a page. I doubt changing this to 4096 will make it 4
times faster. Many other things are going on during emulation. I will try
to measure the benefit though.

> If that's not enough saving, it strikes me a cleaner approach than
> inventing new kinds of DMA and/or new PCI devices, is to just detect
> when the rep insb instruction is used for loading a firmware blob and
> treat that as a different trap.
We are not going to put hacks into the kernel for that. The kernel knows
nothing about firmware blobs.

> 
> Is guest SeaBIOS in real mode at that point?  If yes, then it would be
> best to trap this combination:
> 
>   rep insb is fetching a blob + CPU is in real mode
> 
> Because then it's safe to skip the exception check on page boundaries.
The interface between kernel and userspace allows for reading/writing at
most 4K of pio at a time.

> 
> If no, the trap will need to be a bit smarter.
> 
> Advantages of this approach:
> 
>   - No need for new BIOS
Which reminds me that the ad-hoc DMA interface should be discoverable by
the guest.

>   - Will work with older BIOSes using current method, and accelerate them
>   - No need for distinct -initrd BIOS implementations for isapc and pc,
>     (compared with the PCI proposal)
>   - Doesn't add any new "extra-architectural" behaviour
> 
> -- Jamie

--
			Gleb.

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-20 13:40                     ` Gleb Natapov
@ 2010-07-20 13:59                       ` Richard W.M. Jones
  0 siblings, 0 replies; 56+ messages in thread
From: Richard W.M. Jones @ 2010-07-20 13:59 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, qemu-devel

On Tue, Jul 20, 2010 at 04:40:28PM +0300, Gleb Natapov wrote:
> Which remind me that ad-hoc DMA interface should be discoverable by a
> guest.

Judging by 'git annotate' this interface has already been extended 4
times without requiring this to be discoverable.  However I will add
an extra config bitmask which guests can fetch in my next version.
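
A minimal sketch of how a guest might use such a bitmask follows. The key
number, the bit assignment and the function names are all hypothetical,
chosen only for illustration, and the model assumes that a key the host
does not provide reads back as zero, so that a newer BIOS on an older qemu
falls back to the existing method.

```python
FW_CFG_FEATURES_KEY = 0x8000   # hypothetical selector for the feature bitmask
FEATURE_INITRD_DMA = 1 << 0    # hypothetical bit: ad-hoc DMA load is available


def read_features(fw_cfg_entries: dict) -> int:
    # Assumption of this model: an absent key reads as 0 (no features).
    return fw_cfg_entries.get(FW_CFG_FEATURES_KEY, 0)


def choose_initrd_loader(fw_cfg_entries: dict) -> str:
    """Pick the load method the host advertises, defaulting to port I/O."""
    if read_features(fw_cfg_entries) & FEATURE_INITRD_DMA:
        return "dma"
    return "rep insb"
```

An older BIOS that never reads the bitmask keeps using rep insb, so both
old and new combinations continue to boot.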

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-17  9:50 [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device Richard W.M. Jones
  2010-07-17  9:53 ` Richard W.M. Jones
  2010-07-19  6:14 ` [Qemu-devel] " Gleb Natapov
@ 2010-07-20 22:22 ` Blue Swirl
  2010-07-21  7:27   ` Alexander Graf
  2 siblings, 1 reply; 56+ messages in thread
From: Blue Swirl @ 2010-07-20 22:22 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, agraf

On Sat, Jul 17, 2010 at 9:50 AM, Richard W.M. Jones <rjones@redhat.com> wrote:
> I'm trying to speed up the process of loading kernel and initrd.
>
> I found that the main loop which loads these into qemu memory does it
> via executing in the guest:
>
>  rep insb (%dx),%es:(%edi)
>
> In other words, reading it byte-at-a-time from an emulated IO port.
> This is very slow[1] when your initrd is > 100MB like mine is.
>
> Questions:
>
> Is fw_cfg a purely qemu concept?  Does this BIOS firmware port
> 0x510-0x511 exist in real hardware?
>
> I understand from the git logs that fw_cfg was added because the old
> way was to load kernel & initrd into RAM directly, but this didn't
> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
> Could we change to loading these directly into RAM, and instead
> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
> quite prepared to do the work, just wondering if there's something
> else I'm not getting about this.

The entire discussion after this very first message seems to focus on
the DMA method. But is it so hard to stop SeaBIOS from clobbering RAM?

>
> Rich.
>
> [1] Several seconds of wallclock time, and according to gprof, the
> function 'fw_cfg_io_readb' accounts for > 50% of the time taken in
> qemu between qemu starting and us entering the Linux kernel.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming blog: http://rwmj.wordpress.com
> Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
> http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora
>
>

* Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
  2010-07-20 22:22 ` [Qemu-devel] " Blue Swirl
@ 2010-07-21  7:27   ` Alexander Graf
  0 siblings, 0 replies; 56+ messages in thread
From: Alexander Graf @ 2010-07-21  7:27 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Richard W.M. Jones, Gleb Natapov, qemu-devel


On 21.07.2010, at 00:22, Blue Swirl wrote:

> On Sat, Jul 17, 2010 at 9:50 AM, Richard W.M. Jones <rjones@redhat.com> wrote:
>> I'm trying to speed up the process of loading kernel and initrd.
>> 
>> I found that the main loop which loads these into qemu memory does it
>> via executing in the guest:
>> 
>>  rep insb (%dx),%es:(%edi)
>> 
>> In other words, reading it byte-at-a-time from an emulated IO port.
>> This is very slow[1] when your initrd is > 100MB like mine is.
>> 
>> Questions:
>> 
>> Is fw_cfg a purely qemu concept?  Does this BIOS firmware port
>> 0x510-0x511 exist in real hardware?
>> 
>> I understand from the git logs that fw_cfg was added because the old
>> way was to load kernel & initrd into RAM directly, but this didn't
>> work because SeaBIOS would clear the RAM, clobbering kernel & initrd.
>> Could we change to loading these directly into RAM, and instead
>> provide some indication to SeaBIOS so it doesn't clobber the RAM?  I'm
>> quite prepared to do the work, just wondering if there's something
>> else I'm not getting about this.
> 
> The entire discussion after this very first message seems to focus on
> the DMA method. But is it so hard to fix SeaBIOS from clobbering RAM?

It was basically introduced to have a clean way of actually loading the blobs. This is a lot more flexible than trying to make sure every firmware out there doesn't accidentally overwrite random RAM regions.

The conclusion on the phone call was basically to try and look into optimizing the general rep ins case for now. That should also benefit others and isn't tightly coupled with this exact problem.
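
To illustrate what optimizing the general rep ins case means, here is a
small model comparing a per-byte device callback with a coalesced block
read. It is a sketch, not the qemu or kvm code path: BlobReader and the
two rep_insb_* helpers are invented names, the 1024-byte default chunk is
only an example value, and reads past the end of the blob are assumed to
return zero bytes.

```python
class BlobReader:
    """Toy device holding a blob and a read offset; counts callbacks."""

    def __init__(self, data: bytes):
        self.data = data
        self.offset = 0
        self.calls = 0

    def read_byte(self) -> int:
        # One callback per byte, like a byte-at-a-time port read handler.
        self.calls += 1
        if self.offset >= len(self.data):
            return 0
        value = self.data[self.offset]
        self.offset += 1
        return value

    def read_block(self, count: int) -> bytes:
        # One callback per block; pads with zeros past the end of the blob.
        self.calls += 1
        chunk = self.data[self.offset:self.offset + count]
        self.offset += len(chunk)
        return chunk + bytes(count - len(chunk))


def rep_insb_bytewise(reader: BlobReader, count: int) -> bytes:
    return bytes(reader.read_byte() for _ in range(count))


def rep_insb_coalesced(reader: BlobReader, count: int, chunk: int = 1024) -> bytes:
    out = bytearray()
    while count > 0:
        n = min(chunk, count)
        out += reader.read_block(n)
        count -= n
    return bytes(out)
```

Both helpers return the same bytes; the difference is in reader.calls,
which is the quantity a coalescing optimization would aim to reduce.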


Alex

end of thread, other threads:[~2010-07-21  7:27 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-17  9:50 [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device Richard W.M. Jones
2010-07-17  9:53 ` Richard W.M. Jones
2010-07-18 17:26   ` Alexander Graf
2010-07-18 20:09     ` Richard W.M. Jones
2010-07-18 20:32       ` Alexander Graf
2010-07-19  6:23         ` Gleb Natapov
2010-07-19  7:28           ` Richard W.M. Jones
2010-07-19  7:33             ` Gleb Natapov
2010-07-19  7:40               ` Alexander Graf
2010-07-19  7:51                 ` Gleb Natapov
2010-07-19  7:57                   ` Alexander Graf
2010-07-19  8:01                     ` Gleb Natapov
2010-07-19  8:08                       ` Alexander Graf
2010-07-19  8:19                         ` Gleb Natapov
2010-07-19  8:24                           ` Alexander Graf
2010-07-19  8:30                             ` Gleb Natapov
2010-07-19  8:41                               ` Alexander Graf
2010-07-19  8:48                                 ` Gleb Natapov
2010-07-19  8:54                                   ` Alexander Graf
2010-07-19  9:00                                     ` Gleb Natapov
2010-07-19  9:02                                       ` Alexander Graf
2010-07-19  9:10                                         ` Gleb Natapov
2010-07-19  9:13                                           ` Alexander Graf
2010-07-19  9:19                                             ` Gleb Natapov
2010-07-19  9:21                                               ` Alexander Graf
2010-07-19  9:32                                                 ` Gleb Natapov
2010-07-19  9:23                                               ` Richard W.M. Jones
2010-07-20 13:15                   ` Jamie Lokier
2010-07-20 13:40                     ` Gleb Natapov
2010-07-20 13:59                       ` Richard W.M. Jones
2010-07-19  9:19                 ` Richard W.M. Jones
2010-07-19  7:44               ` Richard W.M. Jones
2010-07-19  7:55                 ` Gleb Natapov
2010-07-19  8:34                   ` Richard W.M. Jones
2010-07-19  8:40                     ` Gleb Natapov
2010-07-19  9:00                       ` Richard W.M. Jones
2010-07-19  9:04                         ` Richard W.M. Jones
2010-07-19  9:06                         ` Gleb Natapov
2010-07-19  9:09                           ` Alexander Graf
2010-07-19  9:15                             ` Gleb Natapov
2010-07-19  9:16                               ` Alexander Graf
2010-07-19 13:06                               ` Richard W.M. Jones
2010-07-19 13:12                                 ` Gleb Natapov
2010-07-19 14:52                               ` Anthony Liguori
2010-07-19 14:54                                 ` Gleb Natapov
2010-07-19 14:45               ` Anthony Liguori
2010-07-19 14:53                 ` Gleb Natapov
2010-07-19 15:54                   ` Anthony Liguori
2010-07-19 16:11                     ` Gleb Natapov
2010-07-19 16:47                       ` Richard W.M. Jones
2010-07-19 17:04                         ` Gleb Natapov
2010-07-19 19:06                       ` Anthony Liguori
2010-07-19  6:12     ` Gleb Natapov
2010-07-19  6:14 ` [Qemu-devel] " Gleb Natapov
2010-07-20 22:22 ` [Qemu-devel] " Blue Swirl
2010-07-21  7:27   ` Alexander Graf
