* [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
@ 2010-08-03 11:13 Richard W.M. Jones
  2010-08-03 11:33 ` Gleb Natapov
  0 siblings, 1 reply; 151+ messages in thread

From: Richard W.M. Jones @ 2010-08-03 11:13 UTC (permalink / raw)
To: qemu-devel

qemu compiled from today's git. Using the following command line:

$qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
    -drive file=/dev/null,if=virtio \
    -enable-kvm \
    -nodefaults \
    -nographic \
    -serial stdio \
    -m 500 \
    -no-reboot \
    -no-hpet \
    -net user,vlan=0,net=169.254.0.0/16 \
    -net nic,model=ne2k_pci,vlan=0 \
    -kernel /tmp/libguestfsEyAMut/kernel \
    -initrd /tmp/libguestfsEyAMut/initrd \
    -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '

With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
starts.

If I revert back to kernel 2.6.34, it's pretty quick as usual.

strace is not very informative. It's in a loop doing select and
reading/writing from some file descriptors, including the signalfd and
two pipe fds.

Anyone seen anything like this?

Rich.

[*] This Fedora kernel:
http://koji.fedoraproject.org/koji/buildinfo?buildID=187085

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

^ permalink raw reply	[flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Gleb Natapov @ 2010-08-03 11:33 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel

On Tue, Aug 03, 2010 at 12:13:06PM +0100, Richard W.M. Jones wrote:
> 
> qemu compiled from today's git. Using the following command line:
> 
> $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
>     -drive file=/dev/null,if=virtio \
>     -enable-kvm \
>     -nodefaults \
>     -nographic \
>     -serial stdio \
>     -m 500 \
>     -no-reboot \
>     -no-hpet \
>     -net user,vlan=0,net=169.254.0.0/16 \
>     -net nic,model=ne2k_pci,vlan=0 \
>     -kernel /tmp/libguestfsEyAMut/kernel \
>     -initrd /tmp/libguestfsEyAMut/initrd \
>     -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '
> 
> With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
> starts.
> 
> If I revert back to kernel 2.6.34, it's pretty quick as usual.
> 
> strace is not very informative. It's in a loop doing select and
> reading/writing from some file descriptors, including the signalfd and
> two pipe fds.
> 
> Anyone seen anything like this?
> 
I assume your initrd is huge. In newer kernels ins/outs are much slower
than they were. They are much more correct too. It shouldn't be 1 min
20 sec for a 100M initrd though, but it can take 20-30 sec. This belongs
on the kvm list BTW.

--
			Gleb.
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 12:10 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, kvm

On Tue, Aug 03, 2010 at 02:33:02PM +0300, Gleb Natapov wrote:
> On Tue, Aug 03, 2010 at 12:13:06PM +0100, Richard W.M. Jones wrote:
> > 
> > qemu compiled from today's git. Using the following command line:
> > 
> > $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
> >     -drive file=/dev/null,if=virtio \
> >     -enable-kvm \
> >     -nodefaults \
> >     -nographic \
> >     -serial stdio \
> >     -m 500 \
> >     -no-reboot \
> >     -no-hpet \
> >     -net user,vlan=0,net=169.254.0.0/16 \
> >     -net nic,model=ne2k_pci,vlan=0 \
> >     -kernel /tmp/libguestfsEyAMut/kernel \
> >     -initrd /tmp/libguestfsEyAMut/initrd \
> >     -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '
> > 
> > With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
> > starts.
> > 
> > If I revert back to kernel 2.6.34, it's pretty quick as usual.
> > 
> > strace is not very informative. It's in a loop doing select and
> > reading/writing from some file descriptors, including the signalfd and
> > two pipe fds.
> > 
> > Anyone seen anything like this?
> > 
> I assume your initrd is huge.

It's ~110MB, yes.

> In newer kernels ins/outs are much slower than they were. They are
> much more correct too. It shouldn't be 1 min 20 sec for a 100M initrd
> though, but it can take 20-30 sec. This belongs on the kvm list BTW.

I can't see anything about this in the kernel changelog. Can you
point me to the commit or the key phrase to look for?

Also, what's the point of making in/out "more correct" when we know
we're talking to qemu (eg. from the CPUID) and we know it already
worked fine before with qemu?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Gleb Natapov @ 2010-08-03 12:37 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, kvm

On Tue, Aug 03, 2010 at 01:10:00PM +0100, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 02:33:02PM +0300, Gleb Natapov wrote:
> > On Tue, Aug 03, 2010 at 12:13:06PM +0100, Richard W.M. Jones wrote:
> > > 
> > > qemu compiled from today's git. Using the following command line:
> > > 
> > > $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
> > >     -drive file=/dev/null,if=virtio \
> > >     -enable-kvm \
> > >     -nodefaults \
> > >     -nographic \
> > >     -serial stdio \
> > >     -m 500 \
> > >     -no-reboot \
> > >     -no-hpet \
> > >     -net user,vlan=0,net=169.254.0.0/16 \
> > >     -net nic,model=ne2k_pci,vlan=0 \
> > >     -kernel /tmp/libguestfsEyAMut/kernel \
> > >     -initrd /tmp/libguestfsEyAMut/initrd \
> > >     -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '
> > > 
> > > With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
> > > starts.
> > > 
> > > If I revert back to kernel 2.6.34, it's pretty quick as usual.
> > > 
> > > strace is not very informative. It's in a loop doing select and
> > > reading/writing from some file descriptors, including the signalfd and
> > > two pipe fds.
> > > 
> > > Anyone seen anything like this?
> > > 
> > I assume your initrd is huge.
> 
> It's ~110MB, yes.
> 
> > In newer kernels ins/outs are much slower than they were. They are
> > much more correct too. It shouldn't be 1 min 20 sec for a 100M initrd
> > though, but it can take 20-30 sec. This belongs on the kvm list BTW.
> 
> I can't see anything about this in the kernel changelog. Can you
> point me to the commit or the key phrase to look for?
> 
7972995b0c346de76

> Also, what's the point of making in/out "more correct" when we know
> we're talking to qemu (eg. from the CPUID) and we know it already
> worked fine before with qemu?
> 
Qemu has nothing to do with that. ins/outs didn't work correctly in
some situations. They didn't work at all if the destination/source
memory was MMIO (didn't work as in hung the vcpu IIRC, and this is a
security risk). The direction flag wasn't handled at all (if it was set
the instruction injected #GP into the guest). It didn't check that the
memory it writes to is shadowed, in which case special action should be
taken. It didn't deliver events during long string operations. Maybe
more. Unfortunately adding all that makes emulation much slower. I
already implemented some speedups, and more are possible, but we will
not be able to get back to the previous string io speed, which was our
upper limit.

--
			Gleb.
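To put rough numbers on why per-iteration emulation hurts, here is an illustrative back-of-the-envelope model. It is not QEMU or KVM code, and the per-exit cost and batch size below are invented assumptions for the sketch, not measurements from this thread:

```python
# Illustrative cost model (hypothetical numbers): loading an initrd over
# port I/O when every `rep ins` iteration causes a separate VM exit,
# versus a batched fast path that moves a whole buffer per exit.

INITRD_BYTES = 110 * 1024 * 1024   # ~110MB initrd, as in the thread
EXIT_COST_US = 5                   # assumed round-trip cost of one VM exit
BATCH_BYTES = 4096                 # assumed buffer moved per exit (fast path)

def seconds(exits, cost_us=EXIT_COST_US):
    """Total time spent in exits, in seconds."""
    return exits * cost_us / 1e6

# Batched fast path: one exit per BATCH_BYTES.
fast_exits = INITRD_BYTES // BATCH_BYTES
# Fully emulated path: one exit per 4-byte `insl` iteration.
slow_exits = INITRD_BYTES // 4

print(f"fast path: {fast_exits} exits, ~{seconds(fast_exits):.2f}s")
print(f"emulated:  {slow_exits} exits, ~{seconds(slow_exits):.1f}s")
```

Under these assumed constants the emulated path lands in the minutes range while the batched path stays well under a second, which is the right order of magnitude for the regression described above.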
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 12:48 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, kvm

On Tue, Aug 03, 2010 at 03:37:14PM +0300, Gleb Natapov wrote:
> On Tue, Aug 03, 2010 at 01:10:00PM +0100, Richard W.M. Jones wrote:
> > I can't see anything about this in the kernel changelog. Can you
> > point me to the commit or the key phrase to look for?
> > 
> 7972995b0c346de76

Thanks - I see.

> > Also, what's the point of making in/out "more correct" when we know
> > we're talking to qemu (eg. from the CPUID) and we know it already
> > worked fine before with qemu?
> > 
> Qemu has nothing to do with that. ins/outs didn't work correctly in
> some situations. They didn't work at all if the destination/source
> memory was MMIO (didn't work as in hung the vcpu IIRC, and this is a
> security risk). The direction flag wasn't handled at all (if it was set
> the instruction injected #GP into the guest). It didn't check that the
> memory it writes to is shadowed, in which case special action should be
> taken. It didn't deliver events during long string operations. Maybe
> more. Unfortunately adding all that makes emulation much slower. I
> already implemented some speedups, and more are possible, but we will
> not be able to get back to the previous string io speed, which was our
> upper limit.

Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
once I've rebased it and done some more testing. This huge regression
for a common operation (implementing -initrd) needs to be solved
without using inb/rep ins.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora
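For readers unfamiliar with the fw-cfg interface discussed above: on x86 it is a selector/data port pair (0x510/0x511, per QEMU's fw_cfg documentation), and the guest pulls each configuration blob serially through the data port. The toy model below is hypothetical code, not QEMU's implementation; the `FW_CFG_INITRD_DATA` key value is taken from QEMU's headers but should be treated as an assumption here. It illustrates why a ~110MB initrd means one emulated device access per byte transferred:

```python
# Toy model (NOT QEMU code) of the fw_cfg selector/data protocol: the
# guest writes a 16-bit selector to the control port, then reads the
# selected blob one byte at a time from the data port. Each byte read
# is a separate device access that must be emulated.

FW_CFG_INITRD_DATA = 0x12  # selector key for the initrd blob (assumed)

class ToyFwCfg:
    def __init__(self, blobs):
        self.blobs = blobs      # selector key -> bytes
        self.cur = b""
        self.pos = 0
        self.accesses = 0       # count of emulated data-port reads

    def write_selector(self, key):
        # Selecting a key rewinds the read cursor to the blob's start.
        self.cur, self.pos = self.blobs.get(key, b""), 0

    def read_data_byte(self):
        self.accesses += 1
        b = self.cur[self.pos]
        self.pos += 1
        return b

# Fake 8-byte "initrd" (a gzip magic number plus padding, for show).
dev = ToyFwCfg({FW_CFG_INITRD_DATA: b"\x1f\x8b" + b"\x00" * 6})
dev.write_selector(FW_CFG_INITRD_DATA)
blob = bytes(dev.read_data_byte() for _ in range(8))
assert dev.accesses == len(blob)  # one device access per byte transferred
```

The "DMA"-like patch mentioned above would, in effect, replace this byte-at-a-time pull with a bulk transfer the device performs itself.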
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 13:19 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 03:48 PM, Richard W.M. Jones wrote:
>
> Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
> once I've rebased it and done some more testing. This huge regression
> for a common operation (implementing -initrd) needs to be solved
> without using inb/rep ins.

Adding more interfaces is easy but a problem in the long term. We'll
optimize it as much as we can. Meanwhile, why are you loading huge
initrds? Use a cdrom instead (it will also be faster since the guest
doesn't need to unpack it).

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 14:05 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 04:19:39PM +0300, Avi Kivity wrote:
> On 08/03/2010 03:48 PM, Richard W.M. Jones wrote:
> >
> > Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
> > once I've rebased it and done some more testing. This huge regression
> > for a common operation (implementing -initrd) needs to be solved
> > without using inb/rep ins.
> 
> Adding more interfaces is easy but a problem in the long term.
> We'll optimize it as much as we can. Meanwhile, why are you loading
> huge initrds? Use a cdrom instead (it will also be faster since the
> guest doesn't need to unpack it).

Because it involves rewriting the entire appliance building process,
and we don't necessarily know if it'll be faster after we've done
that.

Look: currently we create the initrd on the fly in 700ms. We've no
reason to believe that creating a CD-ROM on the fly wouldn't take
around the same time. After all, both processes involve reading all
the host files from disk and writing a temporary file.

You have to create these things on the fly, because we don't actually
ship an appliance to end users, just a tiny (< 1 MB) skeleton. You
can't ship a massive statically linked appliance to end users because
it's just unmanageable (think: security; updates; bandwidth).

Loading the initrd currently takes 115ms (or could do, if a sensible
50 line patch was permitted).

So the only possible saving would be the 115ms load time of the
initrd. In theory the CD-ROM device could be detected in 0 time.

Total saving: 115ms.

But will it be any faster, since after spending 115ms, everything runs
from memory, versus being loaded from the CD?

Let's face the fact that qemu has suffered from an enormous
regression. From some hundreds of milliseconds up to over a minute,
in the space of 6 months of development. For a very simple operation:
loading a file into memory.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 14:38 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 05:05 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 04:19:39PM +0300, Avi Kivity wrote:
>> On 08/03/2010 03:48 PM, Richard W.M. Jones wrote:
>>> Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
>>> once I've rebased it and done some more testing. This huge regression
>>> for a common operation (implementing -initrd) needs to be solved
>>> without using inb/rep ins.
>> Adding more interfaces is easy but a problem in the long term.
>> We'll optimize it as much as we can. Meanwhile, why are you loading
>> huge initrds? Use a cdrom instead (it will also be faster since the
>> guest doesn't need to unpack it).
> Because it involves rewriting the entire appliance building process,
> and we don't necessarily know if it'll be faster after we've done
> that.
>
> Look: currently we create the initrd on the fly in 700ms. We've no
> reason to believe that creating a CD-ROM on the fly wouldn't take
> around the same time. After all, both processes involve reading all
> the host files from disk and writing a temporary file.

The time will only continue to grow as you add features and as the
distro bloats naturally.

Much better to create it once and only update it if some dependent
file changes (basically the current on-the-fly code + save a list of
file timestamps).

Alternatively, pass through the host filesystem.

> You have to create these things on the fly, because we don't actually
> ship an appliance to end users, just a tiny (< 1 MB) skeleton. You
> can't ship a massive statically linked appliance to end users because
> it's just unmanageable (think: security; updates; bandwidth).

Shipping it is indeed out of the question. But on-the-fly creation is
not the only alternative.

> Loading the initrd currently takes 115ms (or could do, if a sensible
> 50 line patch was permitted).
>
> So the only possible saving would be the 115ms load time of the
> initrd. In theory the CD-ROM device could be detected in 0 time.
>
> Total saving: 115ms.

815 ms by my arithmetic.

You also save 3*N-2*P memory where N is the size of your initrd and P
is the actual amount used by the guest.

> But will it be any faster, since after spending 115ms, everything runs
> from memory, versus being loaded from the CD?
>
> Let's face the fact that qemu has suffered from an enormous
> regression. From some hundreds of milliseconds up to over a minute,
> in the space of 6 months of development.

It wasn't qemu, but kvm. And it didn't take six months, just a few
commits. Those aren't going back, they're a lot more important than
some libguestfs problem which should have been coded differently in
the first place.

> For a very simple operation:
> loading a file into memory.

Loading a file into memory is plenty fast if you use the standard
interfaces. -kernel -initrd is a specialized interface.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 14:53 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 05:38:25PM +0300, Avi Kivity wrote:
> The time will only continue to grow as you add features and as the
> distro bloats naturally.
>
> Much better to create it once and only update it if some dependent
> file changes (basically the current on-the-fly code + save a list of
> file timestamps).

This applies to both cases, the initrd could also be saved, so:

> > Total saving: 115ms.
>
> 815 ms by my arithmetic.

no, not true, 115ms.

> You also save 3*N-2*P memory where N is the size of your initrd and
> P is the actual amount used by the guest.

Can you explain this?

> Loading a file into memory is plenty fast if you use the standard
> interfaces. -kernel -initrd is a specialized interface.

Why bother with any command line options at all? After all, they keep
changing and causing problems for qemu's users ... Apparently we're
all doing stuff "wrong", in ways that are never explained by the
developers.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 16:10 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 05:53 PM, Richard W.M. Jones wrote:
>
>>> Total saving: 115ms.
>> 815 ms by my arithmetic.
> no, not true, 115ms.

If you bypass creating the initrd/cdrom (700 ms) and loading it
(115ms) you save 815ms.

>> You also save 3*N-2*P memory where N is the size of your initrd and
>> P is the actual amount used by the guest.
> Can you explain this?

(assuming ahead-of-time image generation)

initrd:
  qemu reads image (host pagecache): N
  qemu stores image in RAM: N
  guest copies image to its RAM: N
  guest faults working set (no XIP): P
  total: 3N+P

initramfs:
  qemu reads image (host pagecache): N
  qemu stores image: N
  guest copies image: N
  guest extracts image (XIP): N
  total: 4N

cdrom:
  guest faults working set: P
  kernel faults working set: P
  total: 2P

difference: 3N-P or 4N-2P depending on model

>> Loading a file into memory is plenty fast if you use the standard
>> interfaces. -kernel -initrd is a specialized interface.
> Why bother with any command line options at all? After all, they keep
> changing and causing problems for qemu's users ... Apparently we're
> all doing stuff "wrong", in ways that are never explained by the
> developers.

That's a real problem. It's hard to explain the intent behind
something, especially when it's obvious to the author and not so
obvious to the user. However making everything do everything under all
circumstances has its costs.

-kernel and -initrd is a developer's interface intended to make life
easier for users that use qemu to develop kernels. It was not intended
as a high performance DMA engine. Neither was the firmware
_configuration_ interface. That is what virtio and to a lesser extent
IDE was written to perform. You'll get much better results from them.

-- 
error compiling committee.c: too many arguments to function
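Avi's memory accounting above can be checked numerically. A minimal sketch, assuming N = 110 (MB, the initrd size from this thread) and an arbitrary illustrative working set P = 40 MB (P is not stated anywhere in the thread):

```python
# Numeric check of the initrd-vs-cdrom memory accounting above.
# N = image size, P = working set the guest actually faults in.
# N is from the thread; P = 40 is an invented illustration.

N = 110  # MB, image size
P = 40   # MB, working set

initrd_total    = 3 * N + P  # read + qemu copy + guest copy + faulted set
initramfs_total = 4 * N      # as above, but XIP extraction touches all of N
cdrom_total     = 2 * P      # guest + kernel fault only what they use

print(f"initrd:    {initrd_total} MB")
print(f"initramfs: {initramfs_total} MB")
print(f"cdrom:     {cdrom_total} MB")
print(f"savings:   {initrd_total - cdrom_total} MB (3N-P) "
      f"or {initramfs_total - cdrom_total} MB (4N-2P)")
```

Note the savings come out as 3N-P or 4N-2P, matching the "difference" line in the accounting; the 3*N-2*P figure quoted earlier in the thread is a rougher version of the same bound.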
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 16:28 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 07:10:18PM +0300, Avi Kivity wrote:
> -kernel and -initrd is a developer's interface intended to make life
> easier for users that use qemu to develop kernels. It was not
> intended as a high performance DMA engine. Neither was the firmware
> _configuration_ interface. That is what virtio and to a lesser
> extent IDE was written to perform. You'll get much better results
> from them.

Firmware configuration replaced something which was already working
really fast -- preloading the images into memory -- with something
which worked slower, and has just recently got _way_ more slow.

This is a regression. Plain and simple.

I have posted a small patch which makes this 650x faster without
appreciable complication.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 16:44 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 07:10:18PM +0300, Avi Kivity wrote:
>> -kernel and -initrd is a developer's interface intended to make life
>> easier for users that use qemu to develop kernels. It was not
>> intended as a high performance DMA engine. Neither was the firmware
>> _configuration_ interface. That is what virtio and to a lesser
>> extent IDE was written to perform. You'll get much better results
>> from them.
> Firmware configuration replaced something which was already working
> really fast -- preloading the images into memory -- with something
> which worked slower, and has just recently got _way_ more slow.
>
> This is a regression. Plain and simple.

It's only a regression if there was any intent at making this a
performant interface. Otherwise any change can be interpreted as a
regression. Even "binary doesn't hash to exact same signature" is a
regression.

> I have posted a small patch which makes this 650x faster without
> appreciable complication.

It doesn't appear to support live migration, or hiding the feature for
-M older.

It's not a good path to follow. Tomorrow we'll need to load 300MB
initrds and we'll have to rework this yet again. Meanwhile the kernel
and virtio support demand loading of any image size you'd want to use.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Anthony Liguori @ 2010-08-03 16:46 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 11:44 AM, Avi Kivity wrote:
> On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
>> On Tue, Aug 03, 2010 at 07:10:18PM +0300, Avi Kivity wrote:
>>> -kernel and -initrd is a developer's interface intended to make life
>>> easier for users that use qemu to develop kernels. It was not
>>> intended as a high performance DMA engine. Neither was the firmware
>>> _configuration_ interface. That is what virtio and to a lesser
>>> extent IDE was written to perform. You'll get much better results
>>> from them.
>> Firmware configuration replaced something which was already working
>> really fast -- preloading the images into memory -- with something
>> which worked slower, and has just recently got _way_ more slow.
>>
>> This is a regression. Plain and simple.
>
> It's only a regression if there was any intent at making this a
> performant interface. Otherwise any change can be interpreted as a
> regression. Even "binary doesn't hash to exact same signature" is a
> regression.
>
>> I have posted a small patch which makes this 650x faster without
>> appreciable complication.
>
> It doesn't appear to support live migration, or hiding the feature for
> -M older.
>
> It's not a good path to follow. Tomorrow we'll need to load 300MB
> initrds and we'll have to rework this yet again. Meanwhile the kernel
> and virtio support demand loading of any image size you'd want to use.

firmware is totally broken with respect to -M older FWIW.

Regards,

Anthony Liguori
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 16:50 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 07:46 PM, Anthony Liguori wrote:
>> It doesn't appear to support live migration, or hiding the feature
>> for -M older.
>>
>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>> initrds and we'll have to rework this yet again. Meanwhile the
>> kernel and virtio support demand loading of any image size you'd want
>> to use.
>
> firmware is totally broken with respect to -M older FWIW.

Well, then this is adding to the brokenness.

fwcfg dma is going to have exactly one user, libguestfs. Much better
to have libguestfs move to some other interface and improve our
users-to-interfaces ratio.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Anthony Liguori @ 2010-08-03 16:53 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 11:50 AM, Avi Kivity wrote:
> On 08/03/2010 07:46 PM, Anthony Liguori wrote:
>>> It doesn't appear to support live migration, or hiding the feature
>>> for -M older.
>>>
>>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>>> initrds and we'll have to rework this yet again. Meanwhile the
>>> kernel and virtio support demand loading of any image size you'd
>>> want to use.
>>
>> firmware is totally broken with respect to -M older FWIW.
>
> Well, then this is adding to the brokenness.
>
> fwcfg dma is going to have exactly one user, libguestfs. Much better
> to have libguestfs move to some other interface and improve our
> users-to-interfaces ratio.

You mean, only one class of users cares about the performance of
loading an initrd. However, you've also argued in other threads how
important it is not to break libvirt even if it means we have to do
silly things (like change help text).

So... why is it that libguestfs has to change itself and yet we should
bend over backwards so libvirt doesn't have to change itself?

Regards,

Anthony Liguori
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 17:01 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 07:53 PM, Anthony Liguori wrote:
> On 08/03/2010 11:50 AM, Avi Kivity wrote:
>> On 08/03/2010 07:46 PM, Anthony Liguori wrote:
>>>> It doesn't appear to support live migration, or hiding the feature
>>>> for -M older.
>>>>
>>>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>>>> initrds and we'll have to rework this yet again. Meanwhile the
>>>> kernel and virtio support demand loading of any image size you'd
>>>> want to use.
>>>
>>> firmware is totally broken with respect to -M older FWIW.
>>
>> Well, then this is adding to the brokenness.
>>
>> fwcfg dma is going to have exactly one user, libguestfs. Much better
>> to have libguestfs move to some other interface and improve our
>> users-to-interfaces ratio.
>
> You mean, only one class of users cares about the performance of
> loading an initrd. However, you've also argued in other threads how
> important it is not to break libvirt even if it means we have to do
> silly things (like change help text).
>
> So... why is it that libguestfs has to change itself and yet we should
> bend over backwards so libvirt doesn't have to change itself?

libvirt is a major user that is widely deployed, and would be
completely broken if we change -help. Changing -help is of no
consequence to us.

libguestfs is a (pardon me) minor user that is not widely used, and
would suffer a performance regression, not total breakage, unless we
add a fw-dma interface. Adding the interface is of consequence to us:
we have to implement live migration and backwards compatibility, and
support this new interface for a long while.

In an ideal world we wouldn't tolerate any regression. The world is
not ideal, so we prioritize.

the -help change scores very high on benefit/cost. fw-dma, much lower.

Note in both cases the long term solution is for the user to move to
another interface (cap reporting, virtio), so adding an interface
which would only be abandoned later by its only user drops the
benefit/cost ratio even further.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Anthony Liguori @ 2010-08-03 17:42 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 12:01 PM, Avi Kivity wrote:
>> You mean, only one class of users cares about the performance of
>> loading an initrd. However, you've also argued in other threads how
>> important it is not to break libvirt even if it means we have to do
>> silly things (like change help text).
>>
>> So... why is it that libguestfs has to change itself and yet we
>> should bend over backwards so libvirt doesn't have to change itself?
>
> libvirt is a major user that is widely deployed, and would be
> completely broken if we change -help. Changing -help is of no
> consequence to us.
>
> libguestfs is a (pardon me) minor user that is not widely used, and
> would suffer a performance regression, not total breakage, unless we
> add a fw-dma interface. Adding the interface is of consequence to us:
> we have to implement live migration and backwards compatibility, and
> support this new interface for a long while.

I certainly buy the argument about making changes of little
consequence to us vs. ones that we have to be concerned about long
term.

However, I don't think we can objectively differentiate between a
"major" and "minor" user. Generally speaking, I would rather that we
not take the position of "you are a minor user therefore we're not
going to accommodate you".

Regards,

Anthony Liguori

> In an ideal world we wouldn't tolerate any regression. The world is
> not ideal, so we prioritize.
>
> the -help change scores very high on benefit/cost. fw-dma, much lower.
>
> Note in both cases the long term solution is for the user to move to
> another interface (cap reporting, virtio), so adding an interface
> which would only be abandoned later by its only user drops the
> benefit/cost ratio even further.
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 17:42 ` Anthony Liguori @ 2010-08-03 17:58 ` Avi Kivity 2010-08-03 18:11 ` Richard W.M. Jones 2010-08-03 18:26 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 17:58 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 08:42 PM, Anthony Liguori wrote: > However, I don't think we can objectively differentiate between a > "major" and "minor" user. Generally speaking, I would rather that we > not take the position of "you are a minor user therefore we're not > going to accommodate you". Again it's a matter of practicalities. We have written virtio drivers for Windows and Linux, but not for FreeDOS or NetWare. To speed up Windows XP we have (in qemu-kvm) kvm-tpr-opt.c, which is a gross breach of decency; would we go to the same lengths to speed up Haiku? I suggest that we would not. libvirt and Windows XP did not win "major user" status by making large anonymous donations to qemu developers. They did so by having lots of users. Those users are our end users, and we should be focusing our efforts in a way that maximizes the gain for as large a number of those end users as we can. Not breaking libvirt will be unknowingly appreciated by a large number of users, every day. Not slowing down libguestfs, by a much smaller number for a much shorter time. If it were just a matter of changing the help text I wouldn't mind at all, but introducing an undocumented migration-unsafe broken-dma interface isn't something I'm happy to do. btw, gaining back some of the speed that we lost _is_ something I want to do, since it doesn't break or add any interfaces, and would be a gain not just for libguestfs, but also for Windows installs (which use string pio extensively). Richard, can you test kvm.git master? It already contains one fix and we plan to add more.
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 17:58 ` Avi Kivity @ 2010-08-03 18:11 ` Richard W.M. Jones 2010-08-03 18:26 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 18:11 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, qemu-devel, kvm On Tue, Aug 03, 2010 at 08:58:10PM +0300, Avi Kivity wrote: > Richard, can you test kvm.git > master? it already contains one fix and we plan to add more. Yup, I will ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming blog: http://rwmj.wordpress.com Fedora now supports 80 OCaml packages (the OPEN alternative to F#) http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 17:58 ` Avi Kivity 2010-08-03 18:11 ` Richard W.M. Jones @ 2010-08-03 18:26 ` Anthony Liguori 2010-08-03 18:43 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 18:26 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 12:58 PM, Avi Kivity wrote: > On 08/03/2010 08:42 PM, Anthony Liguori wrote: >> However, I don't think we can objectively differentiate between a >> "major" and "minor" user. Generally speaking, I would rather that we >> not take the position of "you are a minor user therefore we're not >> going to accommodate you". > > Again it's a matter of practicalities. With have written virtio > drivers for Windows and Linux, but not for FreeDOS or NetWare. To > speed up Windows XP we have (in qemu-kvm) kvm-tpr-opt.c that is a > gross breach of decency, would we go to the same lengths to speed up > Haiku? I suggest that we would not. tpr-opt optimizes a legitimate dependence on the x86 architecture that Windows has. While the implementation may be grossly indecent, it certainly fits the overall mission of what we're trying to do in qemu and kvm, which is to emulate an architecture. You've invested a lot of time and effort into it because it's important to you (or more specifically, your employer). That's because Windows is important to you. If someone as adept and committed as you was heavily invested in Haiku and was willing to implement something equivalent to tpr-opt and also willing to do all of the work of maintaining it, then rejecting such a patch would be a mistake. If Richard is willing to do the work to make -kernel perform faster in such a way that it fits into the overall mission of what we're building, then I see no reason to reject it. The criteria for evaluating a patch should only depend on how it affects other areas of qemu and whether it impacts overall usability.
As a side note, we ought to do a better job of removing features that have created a burden on other areas of qemu that aren't actively being maintained. That's a different discussion though. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:26 ` Anthony Liguori @ 2010-08-03 18:43 ` Avi Kivity 2010-08-03 18:47 ` Avi Kivity ` (4 more replies) 0 siblings, 5 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 18:43 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 09:26 PM, Anthony Liguori wrote: > On 08/03/2010 12:58 PM, Avi Kivity wrote: >> On 08/03/2010 08:42 PM, Anthony Liguori wrote: >>> However, I don't think we can objectively differentiate between a >>> "major" and "minor" user. Generally speaking, I would rather that >>> we not take the position of "you are a minor user therefore we're >>> not going to accommodate you". >> >> Again it's a matter of practicalities. With have written virtio >> drivers for Windows and Linux, but not for FreeDOS or NetWare. To >> speed up Windows XP we have (in qemu-kvm) kvm-tpr-opt.c that is a >> gross breach of decency, would we go to the same lengths to speed up >> Haiku? I suggest that we would not. > > tpr-opt optimizes a legitimate dependence on the x86 architecture that > Windows has. While the implementation may be grossly indecent, it > certainly fits the overall mission of what we're trying to do in qemu > and kvm which is emulate an architecture. > > You've invested a lot of time and effort into it because it's > important to you (or more specifically, your employer). That's > because Windows is important to you. Correct. > > If someone as adept and commit as you was heavily invested in Haiku > and was willing to implement something equivalent to tpr-opt and also > willing to do all of the work of maintaining it, then reject such a > patch would be a mistake. libguestfs does not depend on an x86 architectural feature. qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should discourage people from depending on this interface for production use. 
> > If Richard is willing to do the work to make -kernel perform faster in > such a way that it fits into the overall mission of what we're > building, then I see no reason to reject it. The criteria for > evaluating a patch should only depend on how it affects other areas of > qemu and whether it impacts overall usability. That's true, but extending fwcfg doesn't fit into the overall picture well. We have well defined interfaces for pushing data into a guest: virtio-serial (dma upload), virtio-blk (adds demand paging), and virtio-p9fs (no image needed). Adapting libguestfs to use one of these is a better move than adding yet another interface. A better (though still inaccurate) analogy would be if the developers of a guest OS came up with a virtual bus for devices and were willing to do the work to make this bus perform better. Would we accept this new work or would we point them at our existing bus (pci) instead? Really, the bar on new interfaces (both to guest and host) should be high, much higher than it is now. Interfaces should be well documented, future proof, migration safe, and orthogonal to existing interfaces. While the first three points could be improved with some effort, adding a new dma interface is not going to be orthogonal to virtio. And frankly, libguestfs is better off switching to one of the other interfaces. Slurping huge initrds isn't the right way to do this. > As a side note, we ought to do a better job of removing features that > have created a burden on other areas of qemu that aren't actively > being maintained. That's a different discussion though. Sure, we need something like Linux' Documentation/feature-removal-schedule.txt for people to ignore. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity @ 2010-08-03 18:47 ` Avi Kivity 2010-08-03 18:55 ` Anthony Liguori ` (3 subsequent siblings) 4 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 18:47 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 09:43 PM, Avi Kivity wrote: > Really, the bar on new interfaces (both to guest and host) should be > high, much higher than it is now. Interfaces should be well > documented, future proof, migration safe, and orthogonal to existing > interfaces. While the first three points could be improved with some > effort, adding a new dma interface is not going to be orthogonal to > virtio. And frankly, libguestfs is better off switching to one of the > other interfaces. Slurping huge initrds isn't the right way to do this. btw, precedent should play no role here. Just because an older interface wasn't documented or migration safe or unit-tested doesn't mean new ones get off the hook. It does help to have a framework in place that we can point people at; for example, I added a skeleton Documentation/kvm/api.txt and some unit tests and then made contributors fill them in for new features. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity 2010-08-03 18:47 ` Avi Kivity @ 2010-08-03 18:55 ` Anthony Liguori 2010-08-03 19:00 ` Avi Kivity 2010-08-03 19:05 ` Gleb Natapov ` (2 subsequent siblings) 4 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 18:55 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 01:43 PM, Avi Kivity wrote: >> >> If Richard is willing to do the work to make -kernel perform faster >> in such a way that it fits into the overall mission of what we're >> building, then I see no reason to reject it. The criteria for >> evaluating a patch should only depend on how it affects other areas >> of qemu and whether it impacts overall usability. > > That's true, but extending fwcfg doesn't fit into the overall picture > well. We have well defined interfaces for pushing data into a guest: > virtio-serial (dma upload), virtio-blk (adds demand paging), and > virtio-p9fs (no image needed). Adapting libguestfs to use one of > these is a better move than adding yet another interface. On real hardware, there's an awful lot of interaction between the firmware and the platform. It's a pretty rich interface. On IBM systems, we actually extend that all the way down to userspace via a virtual USB RNDIS driver that you can use IPMI over. > A better (though still inaccurate) analogy is would be if the > developers of a guest OS came up with a virtual bus for devices and > were willing to do the work to make this bus perform better. Would we > accept this new work or would we point them at our existing bus (pci) > instead? Doesn't this precisely describe virtio-s390? > > Really, the bar on new interfaces (both to guest and host) should be > high, much higher than it is now. Interfaces should be well > documented, future proof, migration safe, and orthogonal to existing > interfaces. 
Okay, but this is a bigger discussion that I'm very eager to have. But we shouldn't explicitly apply new policies to random patches without clearly stating the policy up front. Regards, Anthony Liguori > While the first three points could be improved with some effort, > adding a new dma interface is not going to be orthogonal to virtio. > And frankly, libguestfs is better off switching to one of the other > interfaces. Slurping huge initrds isn't the right way to do this. > >> As a side note, we ought to do a better job of removing features that >> have created a burden on other areas of qemu that aren't actively >> being maintained. That's a different discussion though. > > Sure, we need something like Linux' > Documentation/feature-removal-schedule.txt for people to ignore. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:55 ` Anthony Liguori @ 2010-08-03 19:00 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:00 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 09:55 PM, Anthony Liguori wrote: > On 08/03/2010 01:43 PM, Avi Kivity wrote: >>> >>> If Richard is willing to do the work to make -kernel perform faster >>> in such a way that it fits into the overall mission of what we're >>> building, then I see no reason to reject it. The criteria for >>> evaluating a patch should only depend on how it affects other areas >>> of qemu and whether it impacts overall usability. >> >> That's true, but extending fwcfg doesn't fit into the overall picture >> well. We have well defined interfaces for pushing data into a guest: >> virtio-serial (dma upload), virtio-blk (adds demand paging), and >> virtio-p9fs (no image needed). Adapting libguestfs to use one of >> these is a better move than adding yet another interface. > > On real hardware, there's an awful lot of interaction between the > firmware and the platform. It's a pretty rich interface. On IBM > systems, we actually extend that all the way down to userspace via a > virtual USB RNDIS driver that you can use IPMI over. That is fine and we'll do pv interfaces when we have to. That's fwcfg, that's virtio. But let's not do more than we have to. > >> A better (though still inaccurate) analogy is would be if the >> developers of a guest OS came up with a virtual bus for devices and >> were willing to do the work to make this bus perform better. Would >> we accept this new work or would we point them at our existing bus >> (pci) instead? > > Doesn't this precisely describe virtio-s390? As I understood it, s390 had good reasons not to use their native interfaces.
On x86 we have no good reason not to use pci and no good reason not to use virtio for dma. >> >> Really, the bar on new interfaces (both to guest and host) should be >> high, much higher than it is now. Interfaces should be well >> documented, future proof, migration safe, and orthogonal to existing >> interfaces. > > Okay, but this is a bigger discussion that I'm very eager to have. > But we shouldn't explicitly apply new policies to random patches > without clearly stating the policy up front. > Migration safety has been part of the criteria for a while. Future proofness less so. Documentation was usually completely missing but I see no reason not to insist on it now, better late than never. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity 2010-08-03 18:47 ` Avi Kivity 2010-08-03 18:55 ` Anthony Liguori @ 2010-08-03 19:05 ` Gleb Natapov 2010-08-03 19:09 ` Avi Kivity 2010-08-03 19:15 ` Anthony Liguori 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-04 14:51 ` David S. Ahern 4 siblings, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-03 19:05 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, qemu-devel On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > > > >If Richard is willing to do the work to make -kernel perform > >faster in such a way that it fits into the overall mission of what > >we're building, then I see no reason to reject it. The criteria > >for evaluating a patch should only depend on how it affects other > >areas of qemu and whether it impacts overall usability. > > That's true, but extending fwcfg doesn't fit into the overall > picture well. We have well defined interfaces for pushing data into > a guest: virtio-serial (dma upload), virtio-blk (adds demand > paging), and virtio-p9fs (no image needed). Adapting libguestfs to > use one of these is a better move than adding yet another interface. > +1. I already proposed that. Nobody objects to a fast communication channel between guest and host. In fact we have one: virtio-serial. Of course it is much easier to hack dma semantics into the fw_cfg interface than to add virtio-serial to seabios, but that doesn't make it right. Does virtio-serial have to be exposed as PCI to a guest, or can we expose it as an ISA device too, in case someone wants to use the -kernel option but does not want an additional PCI device in the guest? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:05 ` Gleb Natapov @ 2010-08-03 19:09 ` Avi Kivity 2010-08-03 19:15 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:09 UTC (permalink / raw) To: Gleb Natapov; +Cc: kvm, Richard W.M. Jones, qemu-devel On 08/03/2010 10:05 PM, Gleb Natapov wrote: > >> That's true, but extending fwcfg doesn't fit into the overall >> picture well. We have well defined interfaces for pushing data into >> a guest: virtio-serial (dma upload), virtio-blk (adds demand >> paging), and virtio-p9fs (no image needed). Adapting libguestfs to >> use one of these is a better move than adding yet another interface. >> > +1. I already proposed that. Nobody objects against fast fast > communication channel between guest and host. In fact we have one: > virtio-serial. Of course it is much easier to hack dma semantic into > fw_cfg interface than add virtio-serial to seabios, but it doesn't make > it right. Does virtio-serial has to be exposed as PCI to a guest or can > we expose it as ISA device too in case someone want to use -kernel option > but do not see additional PCI device in a guest? No need for virtio-serial in firmware. We can have a small initrd slurp a larger filesystem via virtio-serial, or mount a virtio-blk or virtio-p9fs, or boot the whole thing from a virtio-blk image and avoid -kernel -initrd completely. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
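As a concrete illustration of the bootstrap Avi sketches, a tiny /init along these lines could do the job. Every path, device name, and image name here is an assumption for the sketch, not anything libguestfs actually ships; /dev/vda and /dev/vport0p1 are the standard Linux names for virtio-blk disks and virtio-serial ports.

```shell
#!/bin/sh
# Illustrative /init for a tiny bootstrap initrd (all names are assumptions).
mkdir -p /sysroot

# Option 1: the full appliance was attached as a virtio-blk disk;
# nothing to copy, demand paging does the rest.
mount -o ro /dev/vda /sysroot

# Option 2 (alternative): slurp a payload the host streams over a
# virtio-serial port, instead of one trapped I/O exit per byte:
# dd if=/dev/vport0p1 of=/appliance.img bs=1M

exec switch_root /sysroot /sbin/init
```

Either way the large blob never touches fw_cfg, which is the point of the suggestion.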
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:05 ` Gleb Natapov 2010-08-03 19:09 ` Avi Kivity @ 2010-08-03 19:15 ` Anthony Liguori 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:26 ` Gleb Natapov 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:15 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, Avi Kivity, kvm, Richard W.M. Jones On 08/03/2010 02:05 PM, Gleb Natapov wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >>> If Richard is willing to do the work to make -kernel perform >>> faster in such a way that it fits into the overall mission of what >>> we're building, then I see no reason to reject it. The criteria >>> for evaluating a patch should only depend on how it affects other >>> areas of qemu and whether it impacts overall usability. >>> >> That's true, but extending fwcfg doesn't fit into the overall >> picture well. We have well defined interfaces for pushing data into >> a guest: virtio-serial (dma upload), virtio-blk (adds demand >> paging), and virtio-p9fs (no image needed). Adapting libguestfs to >> use one of these is a better move than adding yet another interface. >> >> > +1. I already proposed that. Nobody objects against fast fast > communication channel between guest and host. In fact we have one: > virtio-serial. Of course it is much easier to hack dma semantic into > fw_cfg interface than add virtio-serial to seabios, but it doesn't make > it right. Does virtio-serial has to be exposed as PCI to a guest or can > we expose it as ISA device too in case someone want to use -kernel option > but do not see additional PCI device in a guest? > fw_cfg has to be available pretty early on so relying on a PCI device isn't reasonable. Having dual interfaces seems wasteful. We're already doing bulk data transfer over fw_cfg as we need to do it to transfer roms and potentially a boot splash. 
Even outside of loading an initrd, the performance is going to start to matter with a large number of devices. Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:15 ` Anthony Liguori @ 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:38 ` Anthony Liguori ` (2 more replies) 2010-08-03 19:26 ` Gleb Natapov 1 sibling, 3 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:24 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 10:15 PM, Anthony Liguori wrote: > > fw_cfg has to be available pretty early on so relying on a PCI device > isn't reasonable. Having dual interfaces seems wasteful. Agree. > > We're already doing bulk data transfer over fw_cfg as we need to do it > to transfer roms and potentially a boot splash. Why do we need to transfer roms? These are devices on the memory bus or pci bus, it just needs to be there at the right address. Boot splash should just be another rom as it would be on a real system. > Even outside of loading an initrd, the performance is going to start > to matter with a large number of devices. I don't really see why. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:24 ` Avi Kivity @ 2010-08-03 19:38 ` Anthony Liguori 2010-08-03 19:41 ` Avi Kivity 2010-08-03 21:20 ` Gerd Hoffmann 2010-08-03 22:06 ` Richard W.M. Jones 2 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:38 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 02:24 PM, Avi Kivity wrote: > On 08/03/2010 10:15 PM, Anthony Liguori wrote: >> >> fw_cfg has to be available pretty early on so relying on a PCI device >> isn't reasonable. Having dual interfaces seems wasteful. > > Agree. > >> >> We're already doing bulk data transfer over fw_cfg as we need to do >> it to transfer roms and potentially a boot splash. > > Why do we need to transfer roms? These are devices on the memory bus > or pci bus, it just needs to be there at the right address. Not quite. The BIOS owns the option ROM space. The way it works on bare metal is that the PCI ROM BAR gets mapped to some location in physical memory by the BIOS, the BIOS executes the initialization vector, and after initialization, the ROM will reorganize itself into something smaller. It's nice and clean. But ISA is not nearly as clean. Ultimately, to make this mix work in a reasonable way, we have to provide a side channel interface to SeaBIOS such that we can deliver ROMs outside of PCI and still let SeaBIOS decide how ROMs get organized. It's additionally complicated by the fact that we didn't support PCI ROM BAR until recently so to maintain compatibility with -M older, we have to use a side channel to lay out option roms. Regards, Anthony Liguori > Boot splash should just be another rom as it would be on a real system. > >> Even outside of loading an initrd, the performance is going to start >> to matter with a large number of devices. > > I don't really see why. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:38 ` Anthony Liguori @ 2010-08-03 19:41 ` Avi Kivity 2010-08-03 19:47 ` Anthony Liguori 2010-08-03 21:24 ` Gerd Hoffmann 0 siblings, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:41 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 10:38 PM, Anthony Liguori wrote: >> Why do we need to transfer roms? These are devices on the memory bus >> or pci bus, it just needs to be there at the right address. > > > Not quite. The BIOS owns the option ROM space. The way it works on > bare metal is that the PCI ROM BAR gets mapped to some location in > physical memory by the BIOS, the BIOS executes the initialization > vector, and after initialization, the ROM will reorganize itself into > something smaller. It's nice and clean. > > But ISA is not nearly as clean. So far so good. > Ultimately, to make this mix work in a reasonable way, we have to > provide a side channel interface to SeaBIOS such that we can deliver > ROMs outside of PCI and still let SeaBIOS decide how ROMs get organized. I don't follow. Why do we need this side channel? What would a real ISA machine do? Are there actually enough ISA devices for there to be a problem? > > It's additionally complicated by the fact that we didn't support PCI > ROM BAR until recently so to maintain compatibility with -M older, we > have to use a side channel to lay out option roms. Again I don't follow. We can just lay out the ROMs in memory like we did in the past? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:41 ` Avi Kivity @ 2010-08-03 19:47 ` Anthony Liguori 2010-08-04 5:47 ` Avi Kivity 2010-08-03 21:24 ` Gerd Hoffmann 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:47 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 02:41 PM, Avi Kivity wrote: > On 08/03/2010 10:38 PM, Anthony Liguori wrote: >>> Why do we need to transfer roms? These are devices on the memory >>> bus or pci bus, it just needs to be there at the right address. >> >> >> Not quite. The BIOS owns the option ROM space. The way it works on >> bare metal is that the PCI ROM BAR gets mapped to some location in >> physical memory by the BIOS, the BIOS executes the initialization >> vector, and after initialization, the ROM will reorganize itself into >> something smaller. It's nice and clean. >> >> But ISA is not nearly as clean. > > So far so good. > >> Ultimately, to make this mix work in a reasonable way, we have to >> provide a side channel interface to SeaBIOS such that we can deliver >> ROMs outside of PCI and still let SeaBIOS decide how ROMs get organized. > > I don't follow. Why do we need this side channel? What would a real > ISA machine do? It depends on the ISA machine. In the worst case, there's a DIP switch on the card and if you've got a conflict between two cards, you start flipping DIP switches. It's pure awesomeness. No, I don't want to emulate DIP switches :-) > Are there actually enough ISA devices for there to be a problem? No, but -M older has the same problem. >> >> It's additionally complicated by the fact that we didn't support PCI >> ROM BAR until recently so to maintain compatibility with -M older, we >> have to use a side channel to lay out option roms. > > Again I don't follow. We can just lay out the ROMs in memory like we > did in the past? 
Because only one component can own the option ROM space. Either that's SeaBIOS and we need a side channel or it's QEMU and we can't use PMM. I guess that's the real issue here. Previously we used etherboot which was well under 32k. We only loaded roms we needed. Now we use gPXE which is much bigger and if you don't use PMM, then you run out of option rom space very quickly. Previously, we loaded option ROMs on demand when a user used -boot n but that was a giant hack and wasn't like bare metal at all. It involved x86-isms in vl.c. Now we always load ROMs so PMM is very important. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:47 ` Anthony Liguori @ 2010-08-04 5:47 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:47 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 10:47 PM, Anthony Liguori wrote: > On 08/03/2010 02:41 PM, Avi Kivity wrote: >> On 08/03/2010 10:38 PM, Anthony Liguori wrote: >>>> Why do we need to transfer roms? These are devices on the memory >>>> bus or pci bus, it just needs to be there at the right address. >>> >>> >>> Not quite. The BIOS owns the option ROM space. The way it works on >>> bare metal is that the PCI ROM BAR gets mapped to some location in >>> physical memory by the BIOS, the BIOS executes the initialization >>> vector, and after initialization, the ROM will reorganize itself >>> into something smaller. It's nice and clean. >>> >>> But ISA is not nearly as clean. >> >> So far so good. >> >>> Ultimately, to make this mix work in a reasonable way, we have to >>> provide a side channel interface to SeaBIOS such that we can deliver >>> ROMs outside of PCI and still let SeaBIOS decide how ROMs get >>> organized. >> >> I don't follow. Why do we need this side channel? What would a real >> ISA machine do? > > It depends on the ISA machine. In the worst case, there's a DIP > switch on the card and if you've got a conflict between two cards, you > start flipping DIP switches. It's pure awesomeness. No, I don't want > to emulate DIP switches :-) How else do you set the IRQ line and I/O port base address? 
static ISADeviceInfo ne2000_isa_info = {
    .qdev.name = "ne2k_isa",
    .qdev.size = sizeof(ISANE2000State),
    .init = isa_ne2000_initfn,
    .qdev.props = (Property[]) {
        DEFINE_PROP_HEX32("iobase", ISANE2000State, iobase, 0x300),
        DEFINE_PROP_UINT32("irq", ISANE2000State, isairq, 9),
+       DEFINE_PROP_HEX32("rombase", ISANE2000State, isarombase, 0xe8000),
        DEFINE_NIC_PROPERTIES(ISANE2000State, ne2000.c),
        DEFINE_PROP_END_OF_LIST(),
    },
};

we already are emulating DIP switches... > >> Are there actually enough ISA devices for there to be a problem? > > No, but -M older has the same problem. So we do the same solution we did in older. We didn't have fwcfg dma back then. > >>> >>> It's additionally complicated by the fact that we didn't support PCI >>> ROM BAR until recently so to maintain compatibility with -M older, >>> we have to use a side channel to lay out option roms. >> >> Again I don't follow. We can just lay out the ROMs in memory like we >> did in the past? > > Because only one component can own the option ROM space. Either > that's SeaBIOS and we need a side channel or it's QEMU and we can't > use PMM. > > I guess that's the real issue here. Previously we used etherboot > which was well under 32k. We only loaded roms we needed. Now we use > gPXE which is much bigger and if you don't use PMM, then you run out > of option rom space very quickly. A true -M older would use the older ROMs for full compatibility. > > Previously, we loaded option ROMs on demand when a user used -boot n > but that was a giant hack and wasn't like bare metal at all. It > involved x86-isms in vl.c. Now we always load ROMs so PMM is very > important. Though it's a hack, we can load ROMs via the existing fwcfg interface; no need for an extension. Richard is seeing problems loading 100MB initrds, not 64KB ROMs. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
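Those "DIP switches" are already flippable from the command line via qdev. A hedged sketch: iobase and irq are the real ne2k_isa properties shown in the snippet above, while rombase would exist only with the hypothetical + line.

```shell
# Override the ISA NE2000 "jumpers" from the qemu command line
# (rombase is hypothetical; it exists only with the patch sketched above):
qemu-system-x86_64 -M isapc \
    -device ne2k_isa,iobase=0x320,irq=10
```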
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:41 ` Avi Kivity 2010-08-03 19:47 ` Anthony Liguori @ 2010-08-03 21:24 ` Gerd Hoffmann 1 sibling, 0 replies; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-03 21:24 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones Hi, > Again I don't follow. We can just lay out the ROMs in memory like we did > in the past? Well. We have some size issues then. PCI ROMS are loaded by the BIOS in a way that only a small fraction is actually resident in the small 0xd0000 -> 0xe0000 area. That doesn't work if qemu tries to simply copy the whole thing there like old versions did. With the size of the gPXE roms this matters in real life. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:38 ` Anthony Liguori @ 2010-08-03 21:20 ` Gerd Hoffmann 2010-08-04 5:53 ` Avi Kivity 2010-08-03 22:06 ` Richard W.M. Jones 2 siblings, 1 reply; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-03 21:20 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones Hi, >> We're already doing bulk data transfer over fw_cfg as we need to do it >> to transfer roms and potentially a boot splash. > Why do we need to transfer roms? These are devices on the memory bus or > pci bus, it just needs to be there at the right address. Indeed. We do that in most cases. The exceptions are: (1) -M somethingold. PCI devices don't have a pci rom bar then by default because they didn't have one in older qemu versions, so we need some other way to pass the option rom to seabios. (2) vgabios.bin. vgabios needs patches to make loading via pci rom bar work (vgabios-cirrus.bin works fine already). I have patches in the queue to do that. (3) roms not associated with a PCI device: multiboot, extboot, -option-rom command line switch, vgabios for -M isapc. The default configuration (qemu $diskimage) loads two roms: vgabios-cirrus.bin and e1000.bin. Both are loaded via pci rom bar and not via fw_cfg. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 21:20 ` Gerd Hoffmann @ 2010-08-04 5:53 ` Avi Kivity 2010-08-04 7:56 ` Gerd Hoffmann 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:53 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones On 08/04/2010 12:20 AM, Gerd Hoffmann wrote: > Hi, > >>> We're already doing bulk data transfer over fw_cfg as we need to do it >>> to transfer roms and potentially a boot splash. >> >> Why do we need to transfer roms? These are devices on the memory bus or >> pci bus, it just needs to be there at the right address. > > Indeed. We do that in most cases. The exceptions are: > > (1) -M somethingold. PCI devices don't have a pci rom bar then by > default because they didn't not have one in older qemu versions, > so we need some other way to pass the option rom to seabios. What did we do back then? before we had the fwcfg interface? > (2) vgabios.bin. vgabios needs patches to make loading via pci rom > bar work (vgabios-cirrus.bin works fine already). I have patches > in the queue to do that. So not an issue. > (3) roms not associated with a PCI device: multiboot, extboot, > -option-rom command line switch, vgabios for -M isapc. We could lay those out in high memory (4GB-512MB) and have the bios copy them from there. I believe that's what real hardware does - the flash chip is mapped there (the reset vector is at 4GB-16) and shadowed at the end of the 1MB 8086 range. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 5:53 ` Avi Kivity @ 2010-08-04 7:56 ` Gerd Hoffmann 2010-08-04 8:17 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-04 7:56 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones Hi, >> (1) -M somethingold. PCI devices don't have a pci rom bar then by >> default because they didn't not have one in older qemu versions, >> so we need some other way to pass the option rom to seabios. > > What did we do back then? before we had the fwcfg interface? Have qemu instead of bochs/seabios manage the vgabios/optionrom area (0xc8000 -> 0xe0000) and copy the roms to memory. Which implies the whole rom has to sit there as PMM can't be used then. >> (3) roms not associated with a PCI device: multiboot, extboot, >> -option-rom command line switch, vgabios for -M isapc. > > We could lay those out in high memory (4GB-512MB) and have the bios copy > them from there. Yea, we could. But it is pointless IMHO. $ ls -l *.bin -rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* -rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* That are the ones we can't load via pci rom bar. Look how small they are. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 7:56 ` Gerd Hoffmann @ 2010-08-04 8:17 ` Avi Kivity 2010-08-04 8:43 ` Gleb Natapov ` (2 more replies) 0 siblings, 3 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 8:17 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones On 08/04/2010 10:56 AM, Gerd Hoffmann wrote: > Hi, > >>> (1) -M somethingold. PCI devices don't have a pci rom bar then by >>> default because they didn't not have one in older qemu versions, >>> so we need some other way to pass the option rom to seabios. >> >> What did we do back then? before we had the fwcfg interface? > > Have qemu instead of bochs/seabios manage the vgabios/optionrom area > (0xc8000 -> 0xe0000) and copy the roms to memory. Which implies the > whole rom has to sit there as PMM can't be used then. Do we actually need PMM for isapc? Did PMM exist before pci? > >>> (3) roms not associated with a PCI device: multiboot, extboot, >>> -option-rom command line switch, vgabios for -M isapc. >> >> We could lay those out in high memory (4GB-512MB) and have the bios copy >> them from there. > > Yea, we could. But it is pointless IMHO. > > $ ls -l *.bin > -rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* > -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* > -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* > -rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* > > That are the ones we can't load via pci rom bar. Look how small they > are. So they can just sit there? I'm confused, either there is enough address space and we don't need to play games, or there isn't and we do. For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. 
High memory flash is the most hardware-like solution, pretty easy from a qemu point of view but requires more work. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 8:17 ` Avi Kivity @ 2010-08-04 8:43 ` Gleb Natapov 2010-08-04 9:22 ` Gerd Hoffmann 2010-08-04 13:04 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 8:43 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On Wed, Aug 04, 2010 at 11:17:28AM +0300, Avi Kivity wrote: > On 08/04/2010 10:56 AM, Gerd Hoffmann wrote: > > Hi, > > > >>>(1) -M somethingold. PCI devices don't have a pci rom bar then by > >>>default because they didn't not have one in older qemu versions, > >>>so we need some other way to pass the option rom to seabios. > >> > >>What did we do back then? before we had the fwcfg interface? > > > >Have qemu instead of bochs/seabios manage the vgabios/optionrom > >area (0xc8000 -> 0xe0000) and copy the roms to memory. Which > >implies the whole rom has to sit there as PMM can't be used then. > > Do we actually need PMM for isapc? Did PMM exist before pci? > > > > >>>(3) roms not associated with a PCI device: multiboot, extboot, > >>>-option-rom command line switch, vgabios for -M isapc. > >> > >>We could lay those out in high memory (4GB-512MB) and have the bios copy > >>them from there. > > > >Yea, we could. But it is pointless IMHO. > > > >$ ls -l *.bin > >-rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* > >-rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* > >-rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* > >-rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* > > > >That are the ones we can't load via pci rom bar. Look how small > >they are. > > So they can just sit there? I'm confused, either there is enough > address space and we don't need to play games, or there isn't and we > do. 
> > For playing games, there are three options: > - existing fwcfg > - fwcfg+dma > - put roms in 4GB-2MB (or whatever we decide the flash size is) and > have the BIOS copy them > > Existing fwcfg is the least amount of work and probably satisfactory > for isapc. fwcfg+dma is IMO going off a tangent. High memory flash > is the most hardware-like solution, pretty easy from a qemu point of > view but requires more work. > We can do an interface like this: the guest enumerates available roms using fwcfg. The guest tells the host to map a rom into a guest-specified IOMEM region. The guest copies the rom from the IOMEM region and tells the host to unmap it. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 8:17 ` Avi Kivity 2010-08-04 8:43 ` Gleb Natapov @ 2010-08-04 9:22 ` Gerd Hoffmann 2010-08-04 13:04 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-04 9:22 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones On 08/04/10 10:17, Avi Kivity wrote: > On 08/04/2010 10:56 AM, Gerd Hoffmann wrote: >> Hi, >> >>>> (1) -M somethingold. PCI devices don't have a pci rom bar then by >>>> default because they didn't not have one in older qemu versions, >>>> so we need some other way to pass the option rom to seabios. >>> >>> What did we do back then? before we had the fwcfg interface? >> >> Have qemu instead of bochs/seabios manage the vgabios/optionrom area >> (0xc8000 -> 0xe0000) and copy the roms to memory. Which implies the >> whole rom has to sit there as PMM can't be used then. > > Do we actually need PMM for isapc? Did PMM exist before pci? I don't know. >>>> (3) roms not associated with a PCI device: multiboot, extboot, >>>> -option-rom command line switch, vgabios for -M isapc. >>> >>> We could lay those out in high memory (4GB-512MB) and have the bios copy >>> them from there. >> >> Yea, we could. But it is pointless IMHO. >> >> $ ls -l *.bin >> -rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* >> -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* >> -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* >> -rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* >> >> That are the ones we can't load via pci rom bar. Look how small they are. > > So they can just sit there? I'm confused, either there is enough address > space and we don't need to play games, or there isn't and we do. Well. Looks like I should be a bit more verbose. 
The old (qemu 0.11) way was to have qemu load roms to memory and bochsbios/seabios scan the memory area for option rom signatures to find them. All option roms have to fit in there then, completely:

  vgabios (~40k)
  etherboot rom (~32k)
  extboot rom (~1k)

The new way is to have seabios load roms to memory:

  vgabios (~40k)
  gPXE rom header (~2k IIRC)
  extboot rom (~1k)

Thanks to SeaBIOS loading the roms, only a small part of the gPXE rom has to live in the option rom area; everything else is stored somewhere else in high memory (using PMM, don't ask me how this works in detail). gPXE roms are ~56k in size (e1000 even 72k), so they would fill up the option rom area pretty quickly if we would load them the old way without PMM. Another advantage of seabios loading the roms is that parts of the 0xe0000 segment can be used then. Seabios size is just a bit more than 64k, so most of the 0xe0000 -> 0xf0000 area isn't actually used by seabios. seabios has two ways to get the roms: (1) fw_cfg and (2) pci rom bar. The ones listed above are the ones which have to go through fw_cfg. There are more roms which have to fit into the option rom space (vgabios, one gPXE per nic), but these don't depend on fw_cfg. > For playing games, there are three options: > - existing fwcfg Given the size, that is good and fast enough for the roms IMO. Kernel+initrd is another story though. We are talking about megabytes not kilobytes then. Standard fedora initramfs is ~14M on x86_64. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 8:17 ` Avi Kivity 2010-08-04 8:43 ` Gleb Natapov 2010-08-04 9:22 ` Gerd Hoffmann @ 2010-08-04 13:04 ` Anthony Liguori 2010-08-04 13:07 ` Gleb Natapov 2010-08-04 16:25 ` Avi Kivity 2 siblings, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 13:04 UTC (permalink / raw) To: Avi Kivity Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 03:17 AM, Avi Kivity wrote: > For playing games, there are three options: > - existing fwcfg > - fwcfg+dma > - put roms in 4GB-2MB (or whatever we decide the flash size is) and > have the BIOS copy them > > Existing fwcfg is the least amount of work and probably satisfactory > for isapc. fwcfg+dma is IMO going off a tangent. High memory flash > is the most hardware-like solution, pretty easy from a qemu point of > view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:04 ` Anthony Liguori @ 2010-08-04 13:07 ` Gleb Natapov 2010-08-04 13:15 ` Anthony Liguori 2010-08-04 13:22 ` Richard W.M. Jones 1 sibling, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:07 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > On 08/04/2010 03:17 AM, Avi Kivity wrote: > >For playing games, there are three options: > >- existing fwcfg > >- fwcfg+dma > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > >and have the BIOS copy them > > > >Existing fwcfg is the least amount of work and probably > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >High memory flash is the most hardware-like solution, pretty easy > >from a qemu point of view but requires more work. > > The only trouble I see is that high memory isn't always available. > If it's a 32-bit PC and you've exhausted RAM space, then you're only > left with the PCI hole and it's not clear to me if you can really > pull out 100mb of space there as an option ROM without breaking > something. > We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better than a DMA interface IMHO. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:07 ` Gleb Natapov @ 2010-08-04 13:15 ` Anthony Liguori 2010-08-04 13:24 ` Richard W.M. Jones 2010-08-04 13:34 ` Gleb Natapov 2010-08-04 13:22 ` Richard W.M. Jones 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 13:15 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On 08/04/2010 08:07 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 03:17 AM, Avi Kivity wrote: >> >>> For playing games, there are three options: >>> - existing fwcfg >>> - fwcfg+dma >>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>> and have the BIOS copy them >>> >>> Existing fwcfg is the least amount of work and probably >>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>> High memory flash is the most hardware-like solution, pretty easy >>> >> >from a qemu point of view but requires more work. >> >> The only trouble I see is that high memory isn't always available. >> If it's a 32-bit PC and you've exhausted RAM space, then you're only >> left with the PCI hole and it's not clear to me if you can really >> pull out 100mb of space there as an option ROM without breaking >> something. >> >> > We can map it on demand. Guest tells qemu to map rom "A" to address X by > writing into some io port. Guest copies rom. Guest tells qemu to unmap > it. Better then DMA interface IMHO. > That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. And what's the upper limit on ROM size that we impose? 100MB is already at the ridiculously large size. Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:15 ` Anthony Liguori @ 2010-08-04 13:24 ` Richard W.M. Jones 2010-08-04 13:26 ` Gleb Natapov 2010-08-04 16:26 ` Avi Kivity 2010-08-04 13:34 ` Gleb Natapov 1 sibling, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 13:24 UTC (permalink / raw) To: Anthony Liguori; +Cc: qemu-devel, kvm, Avi Kivity, Gleb Natapov, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>For playing games, there are three options: > >>>- existing fwcfg > >>>- fwcfg+dma > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>and have the BIOS copy them > >>> > >>>Existing fwcfg is the least amount of work and probably > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>High memory flash is the most hardware-like solution, pretty easy > >>>from a qemu point of view but requires more work. > >> > >>The only trouble I see is that high memory isn't always available. > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>left with the PCI hole and it's not clear to me if you can really > >>pull out 100mb of space there as an option ROM without breaking > >>something. > >> > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > >it. Better then DMA interface IMHO. > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > RAM, where can you safely get 100MB of memory to full map the ROM? > If you're going to map chunks at a time, you are basically doing > DMA. It's boot time, so you can just map it over some existing RAM surely? 
Linuxboot.bin can work out where to map it so it won't be in any memory either being used or the target for the copy. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:24 ` Richard W.M. Jones @ 2010-08-04 13:26 ` Gleb Natapov 2010-08-04 14:22 ` Anthony Liguori 2010-08-04 16:26 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:26 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Avi Kivity, Gerd Hoffmann On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > > >>>For playing games, there are three options: > > >>>- existing fwcfg > > >>>- fwcfg+dma > > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > > >>>and have the BIOS copy them > > >>> > > >>>Existing fwcfg is the least amount of work and probably > > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > > >>>High memory flash is the most hardware-like solution, pretty easy > > >>>from a qemu point of view but requires more work. > > >> > > >>The only trouble I see is that high memory isn't always available. > > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > > >>left with the PCI hole and it's not clear to me if you can really > > >>pull out 100mb of space there as an option ROM without breaking > > >>something. > > >> > > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > > >it. Better then DMA interface IMHO. > > > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > > RAM, where can you safely get 100MB of memory to full map the ROM? > > If you're going to map chunks at a time, you are basically doing > > DMA. > > It's boot time, so you can just map it over some existing RAM surely? Not with current qemu. 
This is broken now. > Linuxboot.bin can work out where to map it so it won't be in any > memory either being used or the target for the copy. > -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:26 ` Gleb Natapov @ 2010-08-04 14:22 ` Anthony Liguori 2010-08-04 14:38 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:22 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On 08/04/2010 08:26 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > >> On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: >> >>> On 08/04/2010 08:07 AM, Gleb Natapov wrote: >>> >>>> On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: >>>> >>>>> On 08/04/2010 03:17 AM, Avi Kivity wrote: >>>>> >>>>>> For playing games, there are three options: >>>>>> - existing fwcfg >>>>>> - fwcfg+dma >>>>>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>>>>> and have the BIOS copy them >>>>>> >>>>>> Existing fwcfg is the least amount of work and probably >>>>>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>>>>> High memory flash is the most hardware-like solution, pretty easy >>>>>> >>>>> >from a qemu point of view but requires more work. >>>>> >>>>> The only trouble I see is that high memory isn't always available. >>>>> If it's a 32-bit PC and you've exhausted RAM space, then you're only >>>>> left with the PCI hole and it's not clear to me if you can really >>>>> pull out 100mb of space there as an option ROM without breaking >>>>> something. >>>>> >>>>> >>>> We can map it on demand. Guest tells qemu to map rom "A" to address X by >>>> writing into some io port. Guest copies rom. Guest tells qemu to unmap >>>> it. Better then DMA interface IMHO. >>>> >>> That's what I thought too, but in a 32-bit guest using ~3.5GB of >>> RAM, where can you safely get 100MB of memory to full map the ROM? >>> If you're going to map chunks at a time, you are basically doing >>> DMA. 
>>> >> It's boot time, so you can just map it over some existing RAM surely? >> > Not with current qemu. This is broken now. > But even if it wasn't it can potentially create havoc. I think we currently believe that the northbridge likely never forwards RAM access to a device so this doesn't fit how hardware would work. More importantly, BIOSes and ROMs do very funny things with RAM. It's not unusual for a ROM to muck with the e820 map to allocate RAM for itself which means there's always the chance that we're going to walk over RAM being used for something else. Regards, Anthony Liguori >> Linuxboot.bin can work out where to map it so it won't be in any >> memory either being used or the target for the copy. >> >> > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:22 ` Anthony Liguori @ 2010-08-04 14:38 ` Gleb Natapov 2010-08-04 14:50 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 14:38 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On Wed, Aug 04, 2010 at 09:22:22AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:26 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > >>On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >>>>On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>>>>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>>>>For playing games, there are three options: > >>>>>>- existing fwcfg > >>>>>>- fwcfg+dma > >>>>>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>>>>and have the BIOS copy them > >>>>>> > >>>>>>Existing fwcfg is the least amount of work and probably > >>>>>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>>>>High memory flash is the most hardware-like solution, pretty easy > >>>>>>from a qemu point of view but requires more work. > >>>>> > >>>>>The only trouble I see is that high memory isn't always available. > >>>>>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>>>>left with the PCI hole and it's not clear to me if you can really > >>>>>pull out 100mb of space there as an option ROM without breaking > >>>>>something. > >>>>> > >>>>We can map it on demand. Guest tells qemu to map rom "A" to address X by > >>>>writing into some io port. Guest copies rom. Guest tells qemu to unmap > >>>>it. Better then DMA interface IMHO. > >>>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>>RAM, where can you safely get 100MB of memory to full map the ROM? 
> >>>If you're going to map chunks at a time, you are basically doing > >>>DMA. > >>It's boot time, so you can just map it over some existing RAM surely? > >Not with current qemu. This is broken now. > > But even if it wasn't it can potentially create havoc. I think we > currently believe that the northbridge likely never forwards RAM > access to a device so this doesn't fit how hardware would work. > Good point. > More importantly, BIOSes and ROMs do very funny things with RAM. > It's not unusual for a ROM to muck with the e820 map to allocate RAM > for itself which means there's always the chance that we're going to > walk over RAM being used for something else. > ROM does not muck with the e820. It uses PMM to allocate memory and the memory it gets is marked as reserved in e820 map. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:38 ` Gleb Natapov @ 2010-08-04 14:50 ` Anthony Liguori 2010-08-04 15:01 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:50 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On 08/04/2010 09:38 AM, Gleb Natapov wrote: >> >> But even if it wasn't it can potentially create havoc. I think we >> currently believe that the northbridge likely never forwards RAM >> access to a device so this doesn't fit how hardware would work. >> >> > Good point. > > >> More importantly, BIOSes and ROMs do very funny things with RAM. >> It's not unusual for a ROM to muck with the e820 map to allocate RAM >> for itself which means there's always the chance that we're going to >> walk over RAM being used for something else. >> >> > ROM does not muck with the e820. It uses PMM to allocate memory and the > memory it gets is marked as reserved in e820 map. > PMM allocations are only valid during the init function's execution. Its intention is to enable the use of scratch memory to decompress or otherwise modify the ROM to shrink its size. If a ROM needs memory after the init function, it needs to use the traditional tricks to allocate long term memory and the most popular one is modifying the e820 tables. See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:50 ` Anthony Liguori @ 2010-08-04 15:01 ` Gleb Natapov 2010-08-04 15:07 ` Anthony Liguori 2010-08-04 22:41 ` Kevin O'Connor 0 siblings, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:01 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > >> > >>But even if it wasn't it can potentially create havoc. I think we > >>currently believe that the northbridge likely never forwards RAM > >>access to a device so this doesn't fit how hardware would work. > >> > >Good point. > > > >>More importantly, BIOSes and ROMs do very funny things with RAM. > >>It's not unusual for a ROM to muck with the e820 map to allocate RAM > >>for itself which means there's always the chance that we're going to > >>walk over RAM being used for something else. > >> > >ROM does not muck with the e820. It uses PMM to allocate memory and the > >memory it gets is marked as reserved in e820 map. > > PMM allocations are only valid during the init function's execution. > It's intention is to enable the use of scratch memory to decompress > or otherwise modify the ROM to shrink its size. > Hm, maybe. I read the seabios code differently, but maybe I misread it. > If a ROM needs memory after the init function, it needs to use the > traditional tricks to allocate long term memory and the most popular > one is modifying the e820 tables. > e820 has no in-memory format, > See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. so this ugly code intercepts int15 and mangles the result. OMG. How can this even work if more than two ROMs want to do that? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:01 ` Gleb Natapov @ 2010-08-04 15:07 ` Anthony Liguori 2010-08-04 15:15 ` Gleb Natapov 2010-08-04 22:41 ` Kevin O'Connor 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 15:07 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On 08/04/2010 10:01 AM, Gleb Natapov wrote: > > Hm, may be. I read seabios code differently, but may be I misread it. > The BIOS Boot Specification spells it all out pretty clearly. >> If a ROM needs memory after the init function, it needs to use the >> traditional tricks to allocate long term memory and the most popular >> one is modifying the e820 tables. >> >> > e820 has no in memory format, > Indeed. >> See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. >> > so this ugly code intercepts int15 and mangle result. OMG. How this can > even work if more then two ROMs want to do that? > You have to save the old handlers and invoke them. Where do you save the old handlers? There's tricks you can do by trying to use some unused vectors and also temporarily using the stack. But basically, yeah, I'm amazed every time I see a PC boot that it all actually works :-) Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:07 ` Anthony Liguori @ 2010-08-04 15:15 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:15 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On Wed, Aug 04, 2010 at 10:07:24AM -0500, Anthony Liguori wrote: > On 08/04/2010 10:01 AM, Gleb Natapov wrote: > > > >Hm, may be. I read seabios code differently, but may be I misread it. > > The BIOS Boot Specification spells it all out pretty clearly. > I have the spec. Isn't this enough to be an expert? Or do you mean I should read it too? > >>If a ROM needs memory after the init function, it needs to use the > >>traditional tricks to allocate long term memory and the most popular > >>one is modifying the e820 tables. > >> > >e820 has no in memory format, > > Indeed. > > >>See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. > >so this ugly code intercepts int15 and mangle result. OMG. How this can > >even work if more then two ROMs want to do that? > > You have to save the old handlers and invoke them. Where do you > save the old handlers? There's tricks you can do by trying to use > some unused vectors and also temporarily using the stack. > > But basically, yeah, I'm amazed every time I see a PC boot that it > all actually works :-) > Heh. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:01 ` Gleb Natapov 2010-08-04 15:07 ` Anthony Liguori @ 2010-08-04 22:41 ` Kevin O'Connor 1 sibling, 0 replies; 151+ messages in thread From: Kevin O'Connor @ 2010-08-04 22:41 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann, Avi Kivity On Wed, Aug 04, 2010 at 06:01:54PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > > >ROM does not muck with the e820. It uses PMM to allocate memory and the > > >memory it gets is marked as reserved in e820 map. Every ROM is implemented differently - there's no way to really know what they'll do. > > PMM allocations are only valid during the init function's execution. > > It's intention is to enable the use of scratch memory to decompress > > or otherwise modify the ROM to shrink its size. > > > Hm, may be. I read seabios code differently, but may be I misread it. There is a PCIv3 extension to PMM which supports long term memory allocations. SeaBIOS does implement this. The base PMM spec though only supports memory allocations during the POST phase. -Kevin ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:24 ` Richard W.M. Jones 2010-08-04 13:26 ` Gleb Natapov @ 2010-08-04 16:26 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:26 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Gleb Natapov, Gerd Hoffmann On 08/04/2010 04:24 PM, Richard W.M. Jones wrote: > > It's boot time, so you can just map it over some existing RAM surely? > Linuxboot.bin can work out where to map it so it won't be in any > memory either being used or the target for the copy. There's no such thing as boot time from the host's point of view. There are interfaces and they should work whatever the guest is doing right now. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:15 ` Anthony Liguori 2010-08-04 13:24 ` Richard W.M. Jones @ 2010-08-04 13:34 ` Gleb Natapov 2010-08-04 13:52 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:34 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>For playing games, there are three options: > >>>- existing fwcfg > >>>- fwcfg+dma > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>and have the BIOS copy them > >>> > >>>Existing fwcfg is the least amount of work and probably > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>High memory flash is the most hardware-like solution, pretty easy > >>>from a qemu point of view but requires more work. > >> > >>The only trouble I see is that high memory isn't always available. > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>left with the PCI hole and it's not clear to me if you can really > >>pull out 100mb of space there as an option ROM without breaking > >>something. > >> > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > >it. Better then DMA interface IMHO. > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > RAM, where can you safely get 100MB of memory to full map the ROM? > If you're going to map chunks at a time, you are basically doing > DMA. > This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some PCI devices to have more space there. 
> And what's the upper limit on ROM size that we impose? 100MB is > already at the ridiculously large size. > Agree. We have two solutions: 1. Avoid the problem 2. Fix the problem. Both are fine with me and I prefer 1, but if we are going with 2 I prefer something sane. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
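The map/copy/unmap scheme Gleb describes can be sketched as follows. This is an illustrative Python model, not qemu code: the window size and the idea of an I/O-port handshake are assumptions for illustration, standing in for the guest-visible window into the ROM.

```python
# Sketch of the on-demand mapping scheme: qemu exposes successive chunks of
# a large ROM through one fixed guest-physical window; the guest copies each
# chunk into its own RAM, then asks for the window to advance. Hypothetical
# names; no real qemu interface is implied.

WINDOW_SIZE = 16 * 1024 * 1024  # e.g. a 16MB hole borrowed from PCI space

def copy_rom_via_window(rom, window_size=WINDOW_SIZE):
    dest = bytearray()
    offset = 0
    while offset < len(rom):
        # "map": qemu exposes rom[offset:offset+window_size] at the window
        chunk = rom[offset:offset + window_size]
        # guest copies the mapped chunk into its own RAM
        dest += chunk
        # "unmap": window released before the next chunk is mapped
        offset += window_size
    return bytes(dest)
```

The point of contention in the thread is visible here: with a window much smaller than the ROM, the loop degenerates into a chunk-at-a-time transfer, which is why Anthony argues it is "basically DMA", while Gleb's counter is that the chunks can be large enough (by temporarily unmapping PCI devices) that only a few iterations are needed.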
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:34 ` Gleb Natapov @ 2010-08-04 13:52 ` Anthony Liguori 2010-08-04 14:00 ` Gleb Natapov 2010-08-04 16:30 ` Avi Kivity 0 siblings, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 13:52 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On 08/04/2010 08:34 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 08:07 AM, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: >>> >>>> On 08/04/2010 03:17 AM, Avi Kivity wrote: >>>> >>>>> For playing games, there are three options: >>>>> - existing fwcfg >>>>> - fwcfg+dma >>>>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>>>> and have the BIOS copy them >>>>> >>>>> Existing fwcfg is the least amount of work and probably >>>>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>>>> High memory flash is the most hardware-like solution, pretty easy >>>>> >>>> >from a qemu point of view but requires more work. >>>> >>>> The only trouble I see is that high memory isn't always available. >>>> If it's a 32-bit PC and you've exhausted RAM space, then you're only >>>> left with the PCI hole and it's not clear to me if you can really >>>> pull out 100mb of space there as an option ROM without breaking >>>> something. >>>> >>>> >>> We can map it on demand. Guest tells qemu to map rom "A" to address X by >>> writing into some io port. Guest copies rom. Guest tells qemu to unmap >>> it. Better then DMA interface IMHO. >>> >> That's what I thought too, but in a 32-bit guest using ~3.5GB of >> RAM, where can you safely get 100MB of memory to full map the ROM? >> If you're going to map chunks at a time, you are basically doing >> DMA. >> >> > This is not like DMA event if done in chunks and chunks can be pretty > big. 
The code that dials with copying may temporary unmap some pci > devices to have more space there. > That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. Regards, Anthony Liguori > >> And what's the upper limit on ROM size that we impose? 100MB is >> already at the ridiculously large size. >> >> > Agree. We have two solutions: > 1. Avoid the problem > 2. Fix the problem. > > Both are fine with me and I prefer 1, but if we are going with 2 I > prefer something sane. > > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:52 ` Anthony Liguori @ 2010-08-04 14:00 ` Gleb Natapov 2010-08-04 14:14 ` Anthony Liguori 2010-08-04 14:22 ` Paolo Bonzini 2010-08-04 16:30 ` Avi Kivity 1 sibling, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 14:00 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:34 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >>>On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>>>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>>>For playing games, there are three options: > >>>>>- existing fwcfg > >>>>>- fwcfg+dma > >>>>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>>>and have the BIOS copy them > >>>>> > >>>>>Existing fwcfg is the least amount of work and probably > >>>>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>>>High memory flash is the most hardware-like solution, pretty easy > >>>>>from a qemu point of view but requires more work. > >>>> > >>>>The only trouble I see is that high memory isn't always available. > >>>>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>>>left with the PCI hole and it's not clear to me if you can really > >>>>pull out 100mb of space there as an option ROM without breaking > >>>>something. > >>>> > >>>We can map it on demand. Guest tells qemu to map rom "A" to address X by > >>>writing into some io port. Guest copies rom. Guest tells qemu to unmap > >>>it. Better then DMA interface IMHO. > >>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>RAM, where can you safely get 100MB of memory to full map the ROM? > >>If you're going to map chunks at a time, you are basically doing > >>DMA. 
> >> > >This is not like DMA event if done in chunks and chunks can be pretty > >big. The code that dials with copying may temporary unmap some pci > >devices to have more space there. > > That's a bit complicated because SeaBIOS is managing the PCI devices > whereas the kernel code is running as an option rom. I don't know > the BIOS PCI interfaces well so I don't know how doable this is. > Unmapping a device and mapping it at the same place is easy. Enumerating PCI devices from multiboot.bin looks like unneeded churn though. > Maybe we're just being too fancy here. > > We could rewrite -kernel/-append/-initrd to just generate a floppy > image in RAM, and just boot from floppy. > Maybe. Can a floppy be 100M? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:00 ` Gleb Natapov @ 2010-08-04 14:14 ` Anthony Liguori 2010-08-04 14:36 ` Gleb Natapov 2010-08-04 14:22 ` Paolo Bonzini 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:14 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On 08/04/2010 09:00 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 08:34 AM, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: >>> >>>> On 08/04/2010 08:07 AM, Gleb Natapov wrote: >>>> >>>>> On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: >>>>> >>>>>> On 08/04/2010 03:17 AM, Avi Kivity wrote: >>>>>> >>>>>>> For playing games, there are three options: >>>>>>> - existing fwcfg >>>>>>> - fwcfg+dma >>>>>>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>>>>>> and have the BIOS copy them >>>>>>> >>>>>>> Existing fwcfg is the least amount of work and probably >>>>>>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>>>>>> High memory flash is the most hardware-like solution, pretty easy >>>>>>> >>>>>> >from a qemu point of view but requires more work. >>>>>> >>>>>> The only trouble I see is that high memory isn't always available. >>>>>> If it's a 32-bit PC and you've exhausted RAM space, then you're only >>>>>> left with the PCI hole and it's not clear to me if you can really >>>>>> pull out 100mb of space there as an option ROM without breaking >>>>>> something. >>>>>> >>>>>> >>>>> We can map it on demand. Guest tells qemu to map rom "A" to address X by >>>>> writing into some io port. Guest copies rom. Guest tells qemu to unmap >>>>> it. Better then DMA interface IMHO. >>>>> >>>> That's what I thought too, but in a 32-bit guest using ~3.5GB of >>>> RAM, where can you safely get 100MB of memory to full map the ROM? 
>>>> If you're going to map chunks at a time, you are basically doing >>>> DMA. >>>> >>>> >>> This is not like DMA event if done in chunks and chunks can be pretty >>> big. The code that dials with copying may temporary unmap some pci >>> devices to have more space there. >>> >> That's a bit complicated because SeaBIOS is managing the PCI devices >> whereas the kernel code is running as an option rom. I don't know >> the BIOS PCI interfaces well so I don't know how doable this is. >> >> > Unmapping device and mapping it at the same place is easy. Enumerating > pci devices from multiboot.bin looks like unneeded churn though. > > >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. >> >> > May be. Can floppy be 100M? > No, I forgot just how small they are. R/O usb mass storage device? CDROM? I'm beginning to think that loading such a large initrd through fwcfg is simply a dead end. Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:14 ` Anthony Liguori @ 2010-08-04 14:36 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 14:36 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 09:14:01AM -0500, Anthony Liguori wrote: > >Unmapping device and mapping it at the same place is easy. Enumerating > >pci devices from multiboot.bin looks like unneeded churn though. > > > >>Maybe we're just being too fancy here. > >> > >>We could rewrite -kernel/-append/-initrd to just generate a floppy > >>image in RAM, and just boot from floppy. > >> > >May be. Can floppy be 100M? > > No, I forgot just how small they are. R/O usb mass storage device? > CDROM? I'm beginning thing that loading such a large initrd through > fwcfg is simply a dead end. > Well, libguestfs can use CDROM by itself to begin with. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:00 ` Gleb Natapov 2010-08-04 14:14 ` Anthony Liguori @ 2010-08-04 14:22 ` Paolo Bonzini 2010-08-04 14:39 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Paolo Bonzini @ 2010-08-04 14:22 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 04:00 PM, Gleb Natapov wrote: >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. >> > May be. Can floppy be 100M? Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the BIOS would read such a beast, or SYSLINUX. By the way, if libguestfs insists for an initrd rather than a CDROM image, it could do something in between and make an ISO image with ISOLINUX and the required kernel/initrd pair. (By the way, a network installation image for a typical distribution has a 120M initrd, so it's not just libguestfs. It is very useful to pass the network installation images directly to qemu via -kernel/-initrd). Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
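Paolo's back-of-the-envelope figure can be checked directly. A small sketch using the conventional int 13h CHS limits he cites; note that 255 sectors/track is not quite 2^8, so the exact total falls slightly short of the 2^(14+8+8+1) = 2 GB he quotes:

```python
# Theoretical ceiling for CHS floppy geometry with the limits from the
# thread: 16384-byte sectors, 256 tracks, 255 sectors/track, 2 heads.

BYTES_PER_SECTOR = 16384   # 2^14
TRACKS = 256               # 2^8 (8-bit cylinder number, 0-255)
SECTORS_PER_TRACK = 255    # sector numbers are 1-based, so 255, not 256
HEADS = 2

capacity = BYTES_PER_SECTOR * TRACKS * SECTORS_PER_TRACK * HEADS
print(capacity)            # 2139095040 bytes, just under 2 GiB
```

So a 100M "floppy" fits the addressing scheme with room to spare; whether any BIOS or SYSLINUX would actually accept such a geometry is, as Paolo says, another matter.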
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:22 ` Paolo Bonzini @ 2010-08-04 14:39 ` Anthony Liguori 2010-08-04 16:33 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:39 UTC (permalink / raw) To: Paolo Bonzini Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 09:22 AM, Paolo Bonzini wrote: > On 08/04/2010 04:00 PM, Gleb Natapov wrote: >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy >>> image in RAM, and just boot from floppy. >>> >> May be. Can floppy be 100M? > > Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 > sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the > BIOS would read such a beast, or SYSLINUX. > > By the way, if libguestfs insists for an initrd rather than a CDROM > image, it could do something in between and make an ISO image with > ISOLINUX and the required kernel/initrd pair. > > (By the way, a network installation image for a typical distribution > has a 120M initrd, so it's not just libguestfs. It is very useful to > pass the network installation images directly to qemu via > -kernel/-initrd). We could make -kernel an awful lot smarter, but unless we've got someone just itching to write 16-bit option rom code, I think our best bet is to try to leverage a standard bootloader and expose a disk containing the kernel/initrd. Otherwise, we just stick with what we have and deal with the performance as is. Regards, Anthony Liguori > > Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:39 ` Anthony Liguori @ 2010-08-04 16:33 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:33 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann, Paolo Bonzini On 08/04/2010 05:39 PM, Anthony Liguori wrote: > > We could make kernel an awful lot smarter but unless we've got someone > just itching to write 16-bit option rom code, I think our best bet is > to try to leverage a standard bootloader and expose a disk containing > the kernel/initrd. > A problem with that is that the booted kernel would see that disk and try to do something with it. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:52 ` Anthony Liguori 2010-08-04 14:00 ` Gleb Natapov @ 2010-08-04 16:30 ` Avi Kivity 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:42 ` Anthony Liguori 1 sibling, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:30 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>> >> This is not like DMA event if done in chunks and chunks can be pretty >> big. The code that dials with copying may temporary unmap some pci >> devices to have more space there. > > > That's a bit complicated because SeaBIOS is managing the PCI devices > whereas the kernel code is running as an option rom. I don't know the > BIOS PCI interfaces well so I don't know how doable this is. > > Maybe we're just being too fancy here. > > We could rewrite -kernel/-append/-initrd to just generate a floppy > image in RAM, and just boot from floppy. How could this work? the RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:30 ` Avi Kivity @ 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:44 ` Anthony Liguori ` (2 more replies) 2010-08-04 16:42 ` Anthony Liguori 1 sibling, 3 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:36 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 07:30 PM, Avi Kivity wrote: > On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>> >>> This is not like DMA event if done in chunks and chunks can be pretty >>> big. The code that dials with copying may temporary unmap some pci >>> devices to have more space there. >> >> >> That's a bit complicated because SeaBIOS is managing the PCI devices >> whereas the kernel code is running as an option rom. I don't know >> the BIOS PCI interfaces well so I don't know how doable this is. >> >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. > > How could this work? the RAM belongs to SeaBIOS immediately after > reset, it would just scribble over it. Or worse, not scribble on it > until some date in the future. > > -kernel data has to find its way to memory after the bios gives > control to some optionrom. An alternative would be to embed knowledge > of -kernel in seabios, but I don't think it's a good one. > Oh, you meant host RAM, not guest RAM. Disregard. This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:36 ` Avi Kivity @ 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:52 ` Avi Kivity ` (2 more replies) 2010-08-04 16:45 ` Alexander Graf 2010-08-04 17:46 ` Richard W.M. Jones 2 siblings, 3 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:44 UTC (permalink / raw) To: Avi Kivity Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 11:36 AM, Avi Kivity wrote: > On 08/04/2010 07:30 PM, Avi Kivity wrote: >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>>> >>>> This is not like DMA event if done in chunks and chunks can be pretty >>>> big. The code that dials with copying may temporary unmap some pci >>>> devices to have more space there. >>> >>> >>> That's a bit complicated because SeaBIOS is managing the PCI devices >>> whereas the kernel code is running as an option rom. I don't know >>> the BIOS PCI interfaces well so I don't know how doable this is. >>> >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy >>> image in RAM, and just boot from floppy. >> >> How could this work? the RAM belongs to SeaBIOS immediately after >> reset, it would just scribble over it. Or worse, not scribble on it >> until some date in the future. >> >> -kernel data has to find its way to memory after the bios gives >> control to some optionrom. An alternative would be to embed >> knowledge of -kernel in seabios, but I don't think it's a good one. >> > > Oh, you meant host RAM, not guest RAM. Disregard. > > This is basically my suggestion to libguestfs: instead of generating > an initrd, generate a bootable cdrom, and boot from that. The result > is faster and has a smaller memory footprint. Everyone wins. Yeah, but we could also do that entirely in QEMU. If that's what we suggest doing, there's no reason not to do it instead of the option rom trickery that we do today. 
The option rom stuff has a number of shortcomings. Because we hijack int19, extboot doesn't get to run. That means that if you use -kernel to load a grub (as the Ubuntu guys do, for their own absurd reasons) then grub does not see extboot-backed disks. The solution for them is the same: generate a proper disk and boot from that disk. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Anthony Liguori @ 2010-08-04 16:52 ` Avi Kivity 2010-08-04 17:37 ` Gleb Natapov 2010-08-05 7:28 ` Gerd Hoffmann 2 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:52 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 07:44 PM, Anthony Liguori wrote: > > The option rom stuff has a number of short comings. Because we hijack > int19, extboot doesn't get to run. That means that if you use -kernel > to load a grub (the Ubuntu guys for their own absurd reasons) then > grub does not see extboot backed disks. The solution for them is the > same, generate a proper disk and boot from that disk. > Let's print it out and hand out leaflets at the upcoming kvm forum. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:52 ` Avi Kivity @ 2010-08-04 17:37 ` Gleb Natapov 2010-08-05 7:28 ` Gerd Hoffmann 2 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 17:37 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 11:44:33AM -0500, Anthony Liguori wrote: > On 08/04/2010 11:36 AM, Avi Kivity wrote: > > On 08/04/2010 07:30 PM, Avi Kivity wrote: > >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: > >>>>> > >>>>This is not like DMA event if done in chunks and chunks can be pretty > >>>>big. The code that dials with copying may temporary unmap some pci > >>>>devices to have more space there. > >>> > >>> > >>>That's a bit complicated because SeaBIOS is managing the PCI > >>>devices whereas the kernel code is running as an option rom. > >>>I don't know the BIOS PCI interfaces well so I don't know how > >>>doable this is. > >>> > >>>Maybe we're just being too fancy here. > >>> > >>>We could rewrite -kernel/-append/-initrd to just generate a > >>>floppy image in RAM, and just boot from floppy. > >> > >>How could this work? the RAM belongs to SeaBIOS immediately > >>after reset, it would just scribble over it. Or worse, not > >>scribble on it until some date in the future. > >> > >>-kernel data has to find its way to memory after the bios gives > >>control to some optionrom. An alternative would be to embed > >>knowledge of -kernel in seabios, but I don't think it's a good > >>one. > >> > > > >Oh, you meant host RAM, not guest RAM. Disregard. > > > >This is basically my suggestion to libguestfs: instead of > >generating an initrd, generate a bootable cdrom, and boot from > >that. The result is faster and has a smaller memory footprint. > >Everyone wins. > > Yeah, but we could also do that entirely in QEMU. 
If that's what we > suggest doing, there's no reason not to do it instead of the option > rom trickery that we do today. > > The option rom stuff has a number of short comings. Because we > hijack int19, extboot doesn't get to run. That means that if you > use -kernel to load a grub (the Ubuntu guys for their own absurd > reasons) then grub does not see extboot backed disks. The solution > for them is the same, generate a proper disk and boot from that > disk. > Extboot is not so relevant any more. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:52 ` Avi Kivity 2010-08-04 17:37 ` Gleb Natapov @ 2010-08-05 7:28 ` Gerd Hoffmann 2010-08-05 7:34 ` Gleb Natapov 2010-08-05 13:43 ` Anthony Liguori 2 siblings, 2 replies; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-05 7:28 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Avi Kivity, Gleb Natapov, Richard W.M. Jones Hi, > The option rom stuff has a number of short comings. Because we hijack > int19, extboot doesn't get to run. That means that if you use -kernel to > load a grub (the Ubuntu guys for their own absurd reasons) then grub > does not see extboot backed disks. The solution for them is the same, > generate a proper disk and boot from that disk. Oh, having extboot + linuxboot + multiboot register a BEV (correct acronym?) entry instead of hijacking int19 would fix that too. Additional bonus will be that they are selectable in the boot menu. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:28 ` Gerd Hoffmann @ 2010-08-05 7:34 ` Gleb Natapov 2010-08-05 7:56 ` Avi Kivity 2010-08-05 13:43 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 7:34 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, kvm, Avi Kivity, Richard W.M. Jones On Thu, Aug 05, 2010 at 09:28:57AM +0200, Gerd Hoffmann wrote: > Hi, > > >The option rom stuff has a number of short comings. Because we hijack > >int19, extboot doesn't get to run. That means that if you use -kernel to > >load a grub (the Ubuntu guys for their own absurd reasons) then grub > >does not see extboot backed disks. The solution for them is the same, > >generate a proper disk and boot from that disk. > > Oh, having extboot + linuxboot + multiboot register a BEV (correct > acronym?) entry instead of hijacking int19 would fix that too. > Additional bonus will be that they are selectable in the boot menu. > Good idea, except that we are not good at communicating to seabios where we want to boot from by default. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:34 ` Gleb Natapov @ 2010-08-05 7:56 ` Avi Kivity 2010-08-05 7:59 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-05 7:56 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On 08/05/2010 10:34 AM, Gleb Natapov wrote: > On Thu, Aug 05, 2010 at 09:28:57AM +0200, Gerd Hoffmann wrote: >> Hi, >> >>> The option rom stuff has a number of short comings. Because we hijack >>> int19, extboot doesn't get to run. That means that if you use -kernel to >>> load a grub (the Ubuntu guys for their own absurd reasons) then grub >>> does not see extboot backed disks. The solution for them is the same, >>> generate a proper disk and boot from that disk. >> Oh, having extboot + linuxboot + multiboot register a BEV (correct >> acronym?) entry instead of hijacking int19 would fix that too. >> Additional bonus will be that they are selectable in the boot menu. >> > Good idea except that we are not good at communicating to seabios where > do we want to boot from by default. We have the firmware configuration interface for that, if we can tolerate its speed. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 151+ messages in thread
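The firmware configuration interface mentioned here is fw_cfg: on x86 it is a selector/data pair of I/O ports (traditionally 0x510 and 0x511) where the guest writes a 16-bit item key and then pulls the item out one byte per data-port read. Below is a toy userspace model of that protocol, a sketch only: the key value follows my reading of QEMU's fw_cfg.h and the item contents are invented.

```python
class FwCfg:
    """Toy model of QEMU's fw_cfg selector/data protocol.

    A real guest does outw(0x510, key) and then repeated inb(0x511);
    the one-byte-at-a-time data port is why pulling a large item
    (say a 100 MB initrd) through fw_cfg is slow.
    """

    def __init__(self, items):
        self._items = items        # key (int) -> bytes
        self._cur = b""
        self._pos = 0

    def select(self, key):         # models outw to the selector port
        self._cur = self._items.get(key, b"")
        self._pos = 0

    def read_byte(self):           # models inb from the data port
        if self._pos >= len(self._cur):
            return 0               # reads past the end return zeros
        b = self._cur[self._pos]
        self._pos += 1
        return b

    def read(self, n):             # n port reads to fetch n bytes
        return bytes(self.read_byte() for _ in range(n))


FW_CFG_INITRD_DATA = 0x12          # key per my reading of fw_cfg.h
cfg = FwCfg({FW_CFG_INITRD_DATA: b"initrd-blob"})
cfg.select(FW_CFG_INITRD_DATA)
assert cfg.read(11) == b"initrd-blob"
```

Every byte in this scheme corresponds to a trapped port access, which is where the speed concern raised above comes from.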
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:56 ` Avi Kivity @ 2010-08-05 7:59 ` Gleb Natapov 2010-08-05 8:45 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 7:59 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On Thu, Aug 05, 2010 at 10:56:52AM +0300, Avi Kivity wrote: > On 08/05/2010 10:34 AM, Gleb Natapov wrote: > >On Thu, Aug 05, 2010 at 09:28:57AM +0200, Gerd Hoffmann wrote: > >> Hi, > >> > >>>The option rom stuff has a number of short comings. Because we hijack > >>>int19, extboot doesn't get to run. That means that if you use -kernel to > >>>load a grub (the Ubuntu guys for their own absurd reasons) then grub > >>>does not see extboot backed disks. The solution for them is the same, > >>>generate a proper disk and boot from that disk. > >>Oh, having extboot + linuxboot + multiboot register a BEV (correct > >>acronym?) entry instead of hijacking int19 would fix that too. > >>Additional bonus will be that they are selectable in the boot menu. > >> > >Good idea except that we are not good at communicating to seabios where > >do we want to boot from by default. > > We have the firmware configuration interface for that, if we can > tolerate its speed. > To pass default boot device, sure :) The question is what to pass so that seabios will be able to unambiguously determine what device to boot from. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:59 ` Gleb Natapov @ 2010-08-05 8:45 ` Avi Kivity 2010-08-05 8:48 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-05 8:45 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On 08/05/2010 10:59 AM, Gleb Natapov wrote: > >> We have the firmware configuration interface for that, if we can >> tolerate its speed. >> > To pass default boot device, sure :) The question is what to pass so > that seabios will be able to unambiguously determine what device to > boot from. IMO seabios should (if it doesn't already) store this information in CMOS non-volatile memory (which can be backed by a small disk image). This allows the user to play with the configuration at boot time, and if we document the format, management tools can read and write it as well. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 151+ messages in thread
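For context, QEMU already hands the BIOS a minimal boot-order hint through RTC CMOS bytes: on the PC machine, nibble codes packed into registers 0x3d and 0x38 encode up to three boot devices, which is what `-boot order=...` sets. Here is a hedged sketch of that nibble packing; the register numbers and the floppy/disk/cdrom/network codes follow my reading of QEMU's pc.c and the SeaBIOS boot code, and may differ between versions.

```python
# Nibble codes as used by the Bochs/QEMU BIOS boot-order CMOS hint
# (assumed values: 1 = floppy, 2 = hard disk, 3 = cdrom, 4 = network).
BOOT_CODES = {"a": 1, "c": 2, "d": 3, "n": 4}

def encode_boot_order(order):
    """Pack up to three of 'a','c','d','n' into the two CMOS bytes.

    Returns (value for CMOS 0x3d, value for CMOS 0x38)."""
    codes = [BOOT_CODES[ch] for ch in order[:3]]
    codes += [0] * (3 - len(codes))
    reg_3d = codes[0] | (codes[1] << 4)   # first two devices
    reg_38 = codes[2] << 4                # third device, high nibble
    return reg_3d, reg_38

def decode_boot_order(reg_3d, reg_38):
    rev = {v: k for k, v in BOOT_CODES.items()}
    nibbles = [reg_3d & 0xF, (reg_3d >> 4) & 0xF, (reg_38 >> 4) & 0xF]
    return "".join(rev[n] for n in nibbles if n in rev)

assert encode_boot_order("dc") == (0x23, 0x00)   # cdrom, then disk
assert decode_boot_order(0x23, 0x00) == "dc"
```

Avi's suggestion goes further than this hint: have the BIOS persist a boot choice itself, so the user can change it interactively and tools can edit it.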
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 8:45 ` Avi Kivity @ 2010-08-05 8:48 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 8:48 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On Thu, Aug 05, 2010 at 11:45:33AM +0300, Avi Kivity wrote: > On 08/05/2010 10:59 AM, Gleb Natapov wrote: > > > >>We have the firmware configuration interface for that, if we can > >>tolerate its speed. > >> > >To pass default boot device, sure :) The question is what to pass so > >that seabios will be able to unambiguously determine what device to > >boot from. > > IMO seabios should (if it doesn't already) store this information in > CMOS non-volatile memory (which can be backed by a small disk > image). This allows the user to play with the configuration at boot > time, and if we document the format, management tools can read and > write it as well. > The important part is to find a way to unambiguously pass the default boot device between seabios/qemu/management. Afterward we can do many things with it: pass it on the command line, save it in an external disk image, etc. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread

* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:28 ` Gerd Hoffmann 2010-08-05 7:34 ` Gleb Natapov @ 2010-08-05 13:43 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-05 13:43 UTC (permalink / raw) To: Gerd Hoffmann Cc: qemu-devel, kvm, Avi Kivity, Gleb Natapov, Richard W.M. Jones On 08/05/2010 02:28 AM, Gerd Hoffmann wrote: > Hi, > >> The option rom stuff has a number of short comings. Because we hijack >> int19, extboot doesn't get to run. That means that if you use -kernel to >> load a grub (the Ubuntu guys for their own absurd reasons) then grub >> does not see extboot backed disks. The solution for them is the same, >> generate a proper disk and boot from that disk. > > Oh, having extboot + linuxboot + multiboot register a BEV (correct > acronym?) entry instead of hijacking int19 would fix that too. > Additional bonus will be that they are selectable in the boot menu. Well, extboot doesn't hijack int19, it hijacks int13. It'll appear as the disk 0x80. It would be better though to do a BCV rom such that the extboot disk appeared as an independent disk instead of hijacking disk 0x80. linuxboot/multiboot should be BEV roms, no doubt. Regards, Anthony Liguori > cheers, > Gerd > ^ permalink raw reply [flat|nested] 151+ messages in thread
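For readers without the BIOS Boot Specification at hand: an option ROM advertises boot capabilities through a '$PnP' expansion header, located via the 16-bit offset stored at ROM offset 0x1A. A BCV (Boot Connection Vector) hooks in additional disks, while a BEV (Bootstrap Entry Vector), the mechanism proposed above, makes the ROM itself a selectable boot entry. The sketch below builds and parses such a header; the intra-header offsets used (BCV at +0x16, BEV at +0x1A) follow my reading of the BBS and should be checked against the spec before relying on them.

```python
import struct

def build_rom(pnp_offset=0x20, bev=0x100, bcv=0x0):
    """Assemble a minimal option-ROM image with a $PnP expansion header.

    Checksum fixups and real 16-bit entry code are omitted; this only
    models the header layout a BIOS walks to find BEV/BCV entries."""
    rom = bytearray(512)
    rom[0:2] = b"\x55\xAA"                          # option ROM signature
    rom[2] = 1                                      # size in 512-byte units
    struct.pack_into("<H", rom, 0x1A, pnp_offset)   # pointer to $PnP header
    pnp = bytearray(0x20)
    pnp[0:4] = b"$PnP"
    struct.pack_into("<H", pnp, 0x16, bcv)          # Boot Connection Vector
    struct.pack_into("<H", pnp, 0x1A, bev)          # Bootstrap Entry Vector
    rom[pnp_offset:pnp_offset + 0x20] = pnp
    return bytes(rom)

def bev_entry(rom):
    """Return the BEV offset a BIOS would call for a boot-menu entry."""
    assert rom[0:2] == b"\x55\xAA"
    (pnp_off,) = struct.unpack_from("<H", rom, 0x1A)
    assert rom[pnp_off:pnp_off + 4] == b"$PnP"
    (bev,) = struct.unpack_from("<H", rom, pnp_off + 0x1A)
    return bev

rom = build_rom(bev=0x100)
assert bev_entry(rom) == 0x100
```

A ROM registering a BEV shows up as its own boot-menu entry; one registering a BCV just adds a disk behind int13, which is the distinction drawn in the message above.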
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:44 ` Anthony Liguori @ 2010-08-04 16:45 ` Alexander Graf 2010-08-04 16:54 ` Avi Kivity 2010-08-04 17:26 ` Anthony Liguori 2010-08-04 17:46 ` Richard W.M. Jones 2 siblings, 2 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 16:45 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 04.08.2010, at 18:36, Avi Kivity wrote: > On 08/04/2010 07:30 PM, Avi Kivity wrote: >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>>> >>>> This is not like DMA event if done in chunks and chunks can be pretty >>>> big. The code that dials with copying may temporary unmap some pci >>>> devices to have more space there. >>> >>> >>> That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. >>> >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. >> >> How could this work? the RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. >> >> -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. >> > > Oh, you meant host RAM, not guest RAM. Disregard. > > This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. 
We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
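The gap between options 1 and 2 above can be quantified in VM exits: string PIO still traps once per rep-ins burst, while a DMA-style handoff traps roughly once (a doorbell write) for the whole buffer. A back-of-the-envelope model; the burst size is a made-up parameter, since real bursts depend on how much the PIO emulation completes per exit.

```python
def pio_exits(size, chunk=1024):
    """String PIO: roughly one VM exit per rep-ins burst of `chunk` bytes."""
    return -(-size // chunk)   # ceiling division

def dma_exits(size):
    """DMA-style transfer: one doorbell write; the host copies the rest."""
    return 1

initrd = 100 * 1024 * 1024     # the 100 MB initrd from this thread
print(pio_exits(initrd), dma_exits(initrd))   # 102400 exits vs 1
```

Even a cheap exit costs microseconds, so the per-burst trap count, multiplied out over a 100 MB transfer, is where the seconds go.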
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:45 ` Alexander Graf @ 2010-08-04 16:54 ` Avi Kivity 2010-08-04 17:01 ` Alexander Graf 2010-08-04 17:26 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:54 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 07:45 PM, Alexander Graf wrote: > > I see two alternatives out of this mess: > > 1) Speed up string PIO so we're actually fast again. Certainly, the best option given that it needs no new interfaces, and improves the most workloads. > 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) A guest/host interface is not private. > Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. 3) don't use -kernel for 100MB or more. It's not the right tool. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:54 ` Avi Kivity @ 2010-08-04 17:01 ` Alexander Graf 2010-08-04 17:14 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:01 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 04.08.2010, at 18:54, Avi Kivity wrote: > On 08/04/2010 07:45 PM, Alexander Graf wrote: >> >> I see two alternatives out of this mess: >> >> 1) Speed up string PIO so we're actually fast again. > > Certainly, the best option given that it needs no new interfaces, and improves the most workloads. > >> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) > > A guest/host interface is not private. fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. > >> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. > > 3) don't use -kernel for 100MB or more. It's not the right tool. Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:01 ` Alexander Graf @ 2010-08-04 17:14 ` Avi Kivity 2010-08-04 17:27 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:14 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 08:01 PM, Alexander Graf wrote: > >> >>> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) >> A guest/host interface is not private. > fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. > Well, it isn't. Two external projects already use it. You can't change it due to the need to live migrate from older versions. >>> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. >> 3) don't use -kernel for 100MB or more. It's not the right tool. > Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. > It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. If you want to pull large amounts of data into the guest efficiently, use virtio-blk. That's what it's for. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:14 ` Avi Kivity @ 2010-08-04 17:27 ` Alexander Graf 2010-08-04 17:34 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:27 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 04.08.2010, at 19:14, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Alexander Graf wrote: >> >>> >>>> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) >>> A guest/host interface is not private. >> fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. >> > > Well, it isn't. Two external projects already use it. You can't change it due to the needs to live migrate from older versions. You can always extend it. You can even break it with a new -M. > >>>> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. >>> 3) don't use -kernel for 100MB or more. It's not the right tool. >> Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. >> > > It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. Migration also doesn't make sense for most -kernel/-initrd use cases. And it's awesome for fast prototyping. 
Of course, once that fast becomes dog slow, it's not useful anymore. I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:27 ` Alexander Graf @ 2010-08-04 17:34 ` Avi Kivity 2010-08-04 20:06 ` David S. Ahern 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:34 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 08:27 PM, Alexander Graf wrote: >> >> Well, it isn't. Two external projects already use it. You can't change it due to the needs to live migrate from older versions. > You can always extend it. You can even break it with a new -M. Yes. But it's a pain to make sure it all works out. We're already suffering from this where we have no choice, why do it where we have a choice? >> It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. > It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. It has still evicted some other pagecache. Footprint is footprint. 300MB to cat some file in a guest. > Migration also doesn't make sense for most -kernel/-initrd use cases. You're just inviting a bug report here. If we add a feature, let's make it work. > And it's awesome for fast prototyping. Of course, once that fast becomes dog slow, it's not useful anymore. For the Nth time, it's only slow with 100MB initrds. > I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. The time would have been better spent improving kvm's pio or porting libguestfs to use a cdrom. I'm also hoping to get the point across that adding pv interfaces like crazy is not sustainable. 
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
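The "300MB to cat some file" figure above is just the three copies of the roughly 100 MB initrd added up. A small worked comparison against the disk-image path; the fraction of the image a guest actually touches is a made-up workload parameter.

```python
initrd_mb = 100   # the libguestfs-scale initrd discussed in this thread

# -kernel/-initrd path: qemu keeps a copy (for reboot/migration),
# the host pagecache holds one, and the guest unpacks a third.
initrd_footprint = initrd_mb * 3

# Disk-image path: pages are faulted in on demand, and with cache=none
# the host pagecache copy disappears too.  Assume the guest touches 20%.
touched_fraction = 0.20   # assumption, not a measured number
disk_footprint = initrd_mb * touched_fraction

print(initrd_footprint, disk_footprint)   # 300 vs 20.0 (MB)
```

The footprint argument is independent of the PIO slowdown: even with fast transfers, the initrd path pins the full image three times over.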
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:34 ` Avi Kivity @ 2010-08-04 20:06 ` David S. Ahern 2010-08-04 20:16 ` Richard W.M. Jones 2010-08-05 2:38 ` Avi Kivity 0 siblings, 2 replies; 151+ messages in thread From: David S. Ahern @ 2010-08-04 20:06 UTC (permalink / raw) To: Avi Kivity Cc: Alexander Graf, kvm, Gleb Natapov, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/10 11:34, Avi Kivity wrote: >> And it's awesome for fast prototyping. Of course, once that fast >> becomes dog slow, it's not useful anymore. > > For the Nth time, it's only slow with 100MB initrds. 100MB is really not that large for an initrd. Consider the deployment of stateless nodes - something that virtualization allows the rapid deployment of. 1 kernel, 1 initrd with the various binaries to be run. Create nodes as needed by launching a shell command - be it for more capacity, isolation, etc. Why require an iso or disk wrapper for a binary blob that is all to be run out of memory? The -append argument allows boot parameters to be specified at launch. That is a very powerful and simple design option. David ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 20:06 ` David S. Ahern @ 2010-08-04 20:16 ` Richard W.M. Jones 2010-08-05 2:38 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 20:16 UTC (permalink / raw) To: David S. Ahern Cc: kvm, Gleb Natapov, qemu-devel, Alexander Graf, Gerd Hoffmann, Avi Kivity On Wed, Aug 04, 2010 at 02:06:58PM -0600, David S. Ahern wrote: > > > On 08/04/10 11:34, Avi Kivity wrote: > > >> And it's awesome for fast prototyping. Of course, once that fast > >> becomes dog slow, it's not useful anymore. > > > > For the Nth time, it's only slow with 100MB initrds. > > 100MB is really not that large for an initrd. <note> I'd just like to note that the libguestfs initrd is uncompressed. The reason for this is I found that the decompression code in Linux is really slow. I have to admit I didn't look into why this is. By not compressing it on the host and decompressing it on the guest, we saved a bunch of boot time (3-5 seconds IIRC). Anyway, comparing 115MB libguestfs initrd and other initrd sizes may not be a fair comparison, since almost every other initrd you will see will be compressed. </note> Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v ^ permalink raw reply [flat|nested] 151+ messages in thread
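The tradeoff described above (skip gzip on the host and gunzip in the guest, pay with a larger initrd to transfer) is easy to measure on any payload. The repetitive 4 MiB blob below is only a stand-in; real appliance contents compress far less well, so treat the numbers as illustrative.

```python
import gzip
import time

payload = b"libguestfs-appliance-" * 200_000   # ~4 MiB synthetic stand-in

t0 = time.perf_counter()
packed = gzip.compress(payload, compresslevel=6)
t1 = time.perf_counter()
unpacked = gzip.decompress(packed)
t2 = time.perf_counter()

assert unpacked == payload
print(f"raw {len(payload)} B, gzipped {len(packed)} B")
print(f"compress {t1 - t0:.3f}s, decompress {t2 - t1:.3f}s")
# An uncompressed cpio skips both steps, at the cost of moving the
# size delta over the (currently slow) kernel/initrd transfer path.
```

Whether uncompressed wins thus depends on the relative speed of the transfer path and the guest's decompressor, which is exactly the balance that shifted when string PIO got slower.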
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 20:06 ` David S. Ahern 2010-08-04 20:16 ` Richard W.M. Jones @ 2010-08-05 2:38 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-05 2:38 UTC (permalink / raw) To: David S. Ahern Cc: Alexander Graf, kvm, Gleb Natapov, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 11:06 PM, David S. Ahern wrote: > > On 08/04/10 11:34, Avi Kivity wrote: > >>> And it's awesome for fast prototyping. Of course, once that fast >>> becomes dog slow, it's not useful anymore. >> For the Nth time, it's only slow with 100MB initrds. > 100MB is really not that large for an initrd. > > Consider the deployment of stateless nodes - something that > virtualization allows the rapid deployment of. 1 kernel, 1 initrd with > the various binaries to be run. Create nodes as needed by launching a > shell command - be it for more capacity, isolation, etc. Why require an > iso or disk wrapper for a binary blob that is all to be run out of > memory? It's inefficient. First qemu reads the initrd and stores it in memory (where it is kept while the guest runs in case you migrate or reboot). Then the guest copies it into temporary storage (where we currently have the slowdown). Then the guest decompresses and extracts it to tmpfs (initramfs model). Finally the guest runs init out of initrd, typically using just a part of the 100MB+. Whereas with a disk image, individual pages are copied to the guest on demand without taking space in qemu. With cache=none, they don't even affect host pagecache. > The -append argument allows boot parameters to be specified at > launch. That is a very powerful and simple design option. Good point. You still have it with a small initrd that bootstraps a larger image. Note -append probably works even without -kernel, it's just that the guest isn't tooled to look at it. 
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:45 ` Alexander Graf 2010-08-04 16:54 ` Avi Kivity @ 2010-08-04 17:26 ` Anthony Liguori 2010-08-04 17:31 ` Alexander Graf 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:26 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 11:45 AM, Alexander Graf wrote: > Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. > > Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Problem is now solved and everyone's happy. Now we can all go back to making slides for next week :-) Regards, Anthony Liguori > By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. > > I see two alternatives out of this mess: > > 1) Speed up string PIO so we're actually fast again. > 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) > > Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. > > > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:26 ` Anthony Liguori @ 2010-08-04 17:31 ` Alexander Graf 2010-08-04 17:35 ` Avi Kivity 2010-08-04 17:36 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:31 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 04.08.2010, at 19:26, Anthony Liguori wrote: > On 08/04/2010 11:45 AM, Alexander Graf wrote: >> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. >> >> Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. > > > Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Why not make id 6 be a fw_cfg virtio interface? That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:31 ` Alexander Graf @ 2010-08-04 17:35 ` Avi Kivity 2010-08-04 17:36 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:35 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 08:31 PM, Alexander Graf wrote: > > Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). > libguestfs could use 9pfs directly. That will be way faster and reduce the footprint dramatically (the guest will demand load only the pages it needs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:31 ` Alexander Graf 2010-08-04 17:35 ` Avi Kivity @ 2010-08-04 17:36 ` Anthony Liguori 2010-08-04 17:36 ` Alexander Graf 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:36 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 12:31 PM, Alexander Graf wrote: > On 04.08.2010, at 19:26, Anthony Liguori wrote: > > >> On 08/04/2010 11:45 AM, Alexander Graf wrote: >> >>> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. >>> >>> Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. >>> >> >> Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. >> > Why not make id 6 be a fw_cfg virtio interface? Because that's a ton more work and we need fw_cfg to be available before PCI is. IOW, fw_cfg cannot be a PCI interface. Regards, Anthony Liguori > That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. > > Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). > > > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:36 ` Anthony Liguori @ 2010-08-04 17:36 ` Alexander Graf 0 siblings, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:36 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 04.08.2010, at 19:36, Anthony Liguori wrote: > On 08/04/2010 12:31 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:26, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:45 AM, Alexander Graf wrote: >>> >>>> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. >>>> >>>> Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. >>>> >>> >>> Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. >>> >> Why not make id 6 be a fw_cfg virtio interface? > > Because that's a ton more work and we need fw_cfg to be available before PCI is. IOW, fw_cfg cannot be a PCI interface. in addition to fw_cfg. So you'd have the same contents be exposed using both interfaces. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:45 ` Alexander Graf @ 2010-08-04 17:46 ` Richard W.M. Jones 2010-08-04 17:50 ` Avi Kivity 2010-08-04 18:13 ` Alexander Graf 2 siblings, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 17:46 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Gerd Hoffmann On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: > This is basically my suggestion to libguestfs: instead of generating > an initrd, generate a bootable cdrom, and boot from that. The > result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream & decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. I will note that virt-install uses the -initrd interface for installing guests (large initrds too). And I've talked with a sysadmin who was using -kernel and -initrd for deploying VM hosting. In his case he did it so he could centralize kernel distribution / updates, and have the guests use /dev/vda == filesystem which made provisioning easy [for him -- I would have used libguestfs ...]. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:46 ` Richard W.M. Jones @ 2010-08-04 17:50 ` Avi Kivity 2010-08-04 18:13 ` Alexander Graf 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:50 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Gleb Natapov, Gerd Hoffmann On 08/04/2010 08:46 PM, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> This is basically my suggestion to libguestfs: instead of generating >> an initrd, generate a bootable cdrom, and boot from that. The >> result is faster and has a smaller memory footprint. Everyone wins. > We had some discussion of this upstream& decided to do this. It > should save the time it takes for the guest kernel to unpack the > initrd, so maybe another second off boot time, which could bring us > ever closer to the "golden" 5 second boot target. > Great. IMO it's the right thing even if initrd took zero time. > It's not trivial mind you, and won't happen straightaway. Part of it > is that it requires reworking the appliance builder (a matter of just > coding really). The less trivial part is that we have to 'hide' the > CD device throughout the publically available interfaces. Then of > course, a lot of testing. > > I will note that virt-install uses the -initrd interface for > installing guests (large initrds too). And I've talked with a > sysadmin who was using -kernel and -initrd for deploying VM hosting. > In his case he did it so he could centralize kernel distribution / > updates, and have the guests use /dev/vda == filesystem which made > provisioning easy [for him -- I would have used libguestfs ...]. We still plan to improve pio speed. (note a few added seconds to guest install or bootup is not such a drag compared to the hit on an interactive tool like libguestfs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. 
^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:46 ` Richard W.M. Jones 2010-08-04 17:50 ` Avi Kivity @ 2010-08-04 18:13 ` Alexander Graf 2010-08-04 18:16 ` Anthony Liguori 2010-08-04 18:18 ` Avi Kivity 1 sibling, 2 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 18:13 UTC (permalink / raw) To: Richard W.M.Jones Cc: Gleb Natapov, kvm, qemu-devel, Avi Kivity, Gerd Hoffmann On 04.08.2010, at 19:46, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> This is basically my suggestion to libguestfs: instead of generating >> an initrd, generate a bootable cdrom, and boot from that. The >> result is faster and has a smaller memory footprint. Everyone wins. > > We had some discussion of this upstream & decided to do this. It > should save the time it takes for the guest kernel to unpack the > initrd, so maybe another second off boot time, which could bring us > ever closer to the "golden" 5 second boot target. > > It's not trivial mind you, and won't happen straightaway. Part of it > is that it requires reworking the appliance builder (a matter of just > coding really). The less trivial part is that we have to 'hide' the > CD device throughout the publically available interfaces. Then of > course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:13 ` Alexander Graf @ 2010-08-04 18:16 ` Anthony Liguori 2010-08-04 18:18 ` Alexander Graf 2010-08-04 18:19 ` Avi Kivity 2010-08-04 18:18 ` Avi Kivity 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 18:16 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M.Jones, Gerd Hoffmann, Avi Kivity On 08/04/2010 01:13 PM, Alexander Graf wrote: > On 04.08.2010, at 19:46, Richard W.M. Jones wrote: > > >> On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> >>> This is basically my suggestion to libguestfs: instead of generating >>> an initrd, generate a bootable cdrom, and boot from that. The >>> result is faster and has a smaller memory footprint. Everyone wins. >>> >> We had some discussion of this upstream& decided to do this. It >> should save the time it takes for the guest kernel to unpack the >> initrd, so maybe another second off boot time, which could bring us >> ever closer to the "golden" 5 second boot target. >> >> It's not trivial mind you, and won't happen straightaway. Part of it >> is that it requires reworking the appliance builder (a matter of just >> coding really). The less trivial part is that we have to 'hide' the >> CD device throughout the publically available interfaces. Then of >> course, a lot of testing. >> > Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. > You can't boot from 9p. Regards, Anthony Liguori > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:16 ` Anthony Liguori @ 2010-08-04 18:18 ` Alexander Graf 2010-08-04 18:19 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 18:18 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M.Jones, Gerd Hoffmann, Avi Kivity On 04.08.2010, at 20:16, Anthony Liguori wrote: > On 08/04/2010 01:13 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:46, Richard W.M. Jones wrote: >> >> >>> On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >>> >>>> This is basically my suggestion to libguestfs: instead of generating >>>> an initrd, generate a bootable cdrom, and boot from that. The >>>> result is faster and has a smaller memory footprint. Everyone wins. >>>> >>> We had some discussion of this upstream& decided to do this. It >>> should save the time it takes for the guest kernel to unpack the >>> initrd, so maybe another second off boot time, which could bring us >>> ever closer to the "golden" 5 second boot target. >>> >>> It's not trivial mind you, and won't happen straightaway. Part of it >>> is that it requires reworking the appliance builder (a matter of just >>> coding really). The less trivial part is that we have to 'hide' the >>> CD device throughout the publically available interfaces. Then of >>> course, a lot of testing. >>> >> Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. >> > > You can't boot from 9p. But you could still use -kernel and -initrd for that, no? Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:16 ` Anthony Liguori 2010-08-04 18:18 ` Alexander Graf @ 2010-08-04 18:19 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 18:19 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, Richard W.M.Jones, qemu-devel, Alexander Graf, Gerd Hoffmann On 08/04/2010 09:16 PM, Anthony Liguori wrote: >> Why not go with 9p? That would save off even more time, as you don't >> have to generate an iso. You could just copy all the relevant >> executables into tmpfs and boot from there using your kernel and a >> very small (pre-built) initrd. > > You can't boot from 9p. > As Alex said, you boot from a non-100MB initrd (or cdrom) and mount the 9pfs. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:13 ` Alexander Graf 2010-08-04 18:16 ` Anthony Liguori @ 2010-08-04 18:18 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 18:18 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M.Jones, Gerd Hoffmann On 08/04/2010 09:13 PM, Alexander Graf wrote: > >> It's not trivial mind you, and won't happen straightaway. Part of it >> is that it requires reworking the appliance builder (a matter of just >> coding really). The less trivial part is that we have to 'hide' the >> CD device throughout the publically available interfaces. Then of >> course, a lot of testing. > Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Yes - and you don't need to copy, just hardlink if your /tmp and /usr are on the same filesystem. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
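Alex's tmpfs-plus-9p variant, with Avi's hardlink refinement, might look something like the sketch below. The qemu -virtfs syntax shown is from later qemu releases than the ones discussed here, and all paths are hypothetical.

```shell
# Stage the appliance without copying: a hardlink moves no data when
# source and destination live on the same filesystem (Avi's point).
staging=$(mktemp -d /tmp/appliance.XXXXXX)
src=$(mktemp /tmp/guestfsd.XXXXXX)          # stand-in for a real binary
ln "$src" "$staging/guestfsd" 2>/dev/null || cp "$src" "$staging/guestfsd"

# Both names refer to the same inode when the hardlink succeeded:
[ "$src" -ef "$staging/guestfsd" ] && echo "hardlinked, nothing copied"

# The directory would then be exported over virtio-9p, e.g. (flags per
# later qemu versions; not executed here):
#   qemu-system-x86_64 ... \
#     -virtfs local,path="$staging",mount_tag=appliance,security_model=none
# and mounted from a tiny pre-built initrd inside the guest:
#   mount -t 9p -o trans=virtio appliance /sysroot
```

This avoids both the large initrd transfer and the ISO generation step, at the cost of requiring 9p support in the guest kernel and in qemu.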
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:30 ` Avi Kivity 2010-08-04 16:36 ` Avi Kivity @ 2010-08-04 16:42 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:42 UTC (permalink / raw) To: Avi Kivity Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 11:30 AM, Avi Kivity wrote: > On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>> >>> This is not like DMA event if done in chunks and chunks can be pretty >>> big. The code that dials with copying may temporary unmap some pci >>> devices to have more space there. >> >> >> That's a bit complicated because SeaBIOS is managing the PCI devices >> whereas the kernel code is running as an option rom. I don't know >> the BIOS PCI interfaces well so I don't know how doable this is. >> >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. > > How could this work? the RAM belongs to SeaBIOS immediately after > reset, it would just scribble over it. Or worse, not scribble on it > until some date in the future. I mean host RAM, not guest RAM. Regards, Anthony Liguori > > -kernel data has to find its way to memory after the bios gives > control to some optionrom. An alternative would be to embed knowledge > of -kernel in seabios, but I don't think it's a good one. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:07 ` Gleb Natapov 2010-08-04 13:15 ` Anthony Liguori @ 2010-08-04 13:22 ` Richard W.M. Jones 2010-08-04 13:29 ` Gleb Natapov 1 sibling, 1 reply; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 13:22 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, kvm, Avi Kivity, Gerd Hoffmann On Wed, Aug 04, 2010 at 04:07:09PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > > On 08/04/2010 03:17 AM, Avi Kivity wrote: > > >For playing games, there are three options: > > >- existing fwcfg > > >- fwcfg+dma > > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > > >and have the BIOS copy them > > > > > >Existing fwcfg is the least amount of work and probably > > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > > >High memory flash is the most hardware-like solution, pretty easy > > >from a qemu point of view but requires more work. > > > > The only trouble I see is that high memory isn't always available. > > If it's a 32-bit PC and you've exhausted RAM space, then you're only > > left with the PCI hole and it's not clear to me if you can really > > pull out 100mb of space there as an option ROM without breaking > > something. > > > We can map it on demand. Guest tells qemu to map rom "A" to address X by > writing into some io port. Guest copies rom. Guest tells qemu to unmap > it. Better then DMA interface IMHO. I think this is a fine idea. Do you want me to try to implement something like this? (I'm on holiday this week and next week at the KVM Forum, so it won't be for a while ...) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. 
http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:22 ` Richard W.M. Jones @ 2010-08-04 13:29 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:29 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Avi Kivity, Gerd Hoffmann On Wed, Aug 04, 2010 at 02:22:29PM +0100, Richard W.M. Jones wrote: > > On Wed, Aug 04, 2010 at 04:07:09PM +0300, Gleb Natapov wrote: > > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > > > On 08/04/2010 03:17 AM, Avi Kivity wrote: > > > >For playing games, there are three options: > > > >- existing fwcfg > > > >- fwcfg+dma > > > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > > > >and have the BIOS copy them > > > > > > > >Existing fwcfg is the least amount of work and probably > > > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > > > >High memory flash is the most hardware-like solution, pretty easy > > > >from a qemu point of view but requires more work. > > > > > > The only trouble I see is that high memory isn't always available. > > > If it's a 32-bit PC and you've exhausted RAM space, then you're only > > > left with the PCI hole and it's not clear to me if you can really > > > pull out 100mb of space there as an option ROM without breaking > > > something. > > > > > We can map it on demand. Guest tells qemu to map rom "A" to address X by > > writing into some io port. Guest copies rom. Guest tells qemu to unmap > > it. Better then DMA interface IMHO. > > I think this is a fine idea. Do you want me to try to implement > something like this? (I'm on holiday this week and next week at > the KVM Forum, so it won't be for a while ...) > I wouldn't do that without principal agreement from Avi and Anthony :) -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:04 ` Anthony Liguori 2010-08-04 13:07 ` Gleb Natapov @ 2010-08-04 16:25 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:25 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 04:04 PM, Anthony Liguori wrote: > On 08/04/2010 03:17 AM, Avi Kivity wrote: >> For playing games, there are three options: >> - existing fwcfg >> - fwcfg+dma >> - put roms in 4GB-2MB (or whatever we decide the flash size is) and >> have the BIOS copy them >> >> Existing fwcfg is the least amount of work and probably satisfactory >> for isapc. fwcfg+dma is IMO going off a tangent. High memory flash >> is the most hardware-like solution, pretty easy from a qemu point of >> view but requires more work. > > The only trouble I see is that high memory isn't always available. If > it's a 32-bit PC and you've exhausted RAM space, then you're only left > with the PCI hole and it's not clear to me if you can really pull out > 100mb of space there as an option ROM without breaking something. > 100MB is out of the question, certainly. I'm talking about your isapc problem, not about a cdrom replacement. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:38 ` Anthony Liguori 2010-08-03 21:20 ` Gerd Hoffmann @ 2010-08-03 22:06 ` Richard W.M. Jones 2010-08-04 5:54 ` Avi Kivity 2 siblings, 1 reply; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 22:06 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Gleb Natapov, qemu-devel On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: > Why do we need to transfer roms? These are devices on the memory > bus or pci bus, it just needs to be there at the right address. > Boot splash should just be another rom as it would be on a real > system. Just like the initrd? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 22:06 ` Richard W.M. Jones @ 2010-08-04 5:54 ` Avi Kivity 2010-08-04 9:24 ` Richard W.M. Jones 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:54 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Gleb Natapov, qemu-devel On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: >> Why do we need to transfer roms? These are devices on the memory >> bus or pci bus, it just needs to be there at the right address. >> Boot splash should just be another rom as it would be on a real >> system. > Just like the initrd? There isn't enough address space for a 100MB initrd in ROM. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 5:54 ` Avi Kivity @ 2010-08-04 9:24 ` Richard W.M. Jones 2010-08-04 9:27 ` Gleb Natapov ` (2 more replies) 0 siblings, 3 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 9:24 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Gleb Natapov, qemu-devel On Wed, Aug 04, 2010 at 08:54:35AM +0300, Avi Kivity wrote: > On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: > >On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: > >>Why do we need to transfer roms? These are devices on the memory > >>bus or pci bus, it just needs to be there at the right address. > >>Boot splash should just be another rom as it would be on a real > >>system. > >Just like the initrd? > > There isn't enough address space for a 100MB initrd in ROM. Because of limits of the original PC, sure, where you had to fit everything in 0xa0000-0xfffff or whatever it was. But this isn't a real PC. You can map the read-only memory anywhere you want. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:24 ` Richard W.M. Jones @ 2010-08-04 9:27 ` Gleb Natapov 2010-08-04 9:52 ` Avi Kivity 2010-08-04 12:59 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 9:27 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, qemu-devel On Wed, Aug 04, 2010 at 10:24:28AM +0100, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 08:54:35AM +0300, Avi Kivity wrote: > > On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: > > >On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: > > >>Why do we need to transfer roms? These are devices on the memory > > >>bus or pci bus, it just needs to be there at the right address. > > >>Boot splash should just be another rom as it would be on a real > > >>system. > > >Just like the initrd? > > > > There isn't enough address space for a 100MB initrd in ROM. > > Because of limits of the original PC, sure, where you had to fit > everything in 0xa0000-0xfffff or whatever it was. > > But this isn't a real PC. > In what way is it not? > You can map the read-only memory anywhere you want. > You can't. Guests expect certain memory layouts. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:24 ` Richard W.M. Jones 2010-08-04 9:27 ` Gleb Natapov @ 2010-08-04 9:52 ` Avi Kivity 2010-08-04 11:33 ` Richard W.M. Jones 2010-08-04 12:59 ` Anthony Liguori 2 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 9:52 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Gleb Natapov, qemu-devel On 08/04/2010 12:24 PM, Richard W.M. Jones wrote: >>> >>> Just like the initrd? >> There isn't enough address space for a 100MB initrd in ROM. > Because of limits of the original PC, sure, where you had to fit > everything in 0xa0000-0xfffff or whatever it was. > > But this isn't a real PC. > > You can map the read-only memory anywhere you want. I wasn't talking about the 1MB limit, rather the 4GB limit. Of that, 3-3.5GB are reserved for RAM, 0.5-1GB for PCI. Putting large amounts of ROM in that space will cost us PCI space. 100 MB initrds are a bad idea for multiple reasons. Demand paging is there for a reason. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:52 ` Avi Kivity @ 2010-08-04 11:33 ` Richard W.M. Jones 2010-08-04 11:36 ` Avi Kivity 2010-08-04 12:07 ` Gleb Natapov 0 siblings, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 11:33 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Gleb Natapov, qemu-devel On Wed, Aug 04, 2010 at 12:52:23PM +0300, Avi Kivity wrote: > On 08/04/2010 12:24 PM, Richard W.M. Jones wrote: > >>> > >>>Just like the initrd? > >>There isn't enough address space for a 100MB initrd in ROM. > >Because of limits of the original PC, sure, where you had to fit > >everything in 0xa0000-0xfffff or whatever it was. > > > >But this isn't a real PC. > > > >You can map the read-only memory anywhere you want. > > I wasn't talking about the 1MB limit, rather the 4GB limit. Of > that, 3-3.5GB are reserved for RAM, 0.5-1GB for PCI. Putting large > amounts of ROM in that space will cost us PCI space. I'm only allocating 500MB of RAM, so there's easily enough space to put a large ROM, with tons of room for growth (of both RAM and ROM). Yes, even real hardware has done this. The Weitek math copro mapped itself in at physical memory addresses c0000000 (a 32 MB window IIRC). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 11:33 ` Richard W.M. Jones @ 2010-08-04 11:36 ` Avi Kivity 2010-08-04 12:07 ` Gleb Natapov 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 11:36 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Gleb Natapov, qemu-devel On 08/04/2010 02:33 PM, Richard W.M. Jones wrote: > > I'm only allocating 500MB of RAM, so there's easily enough space to > put a large ROM, with tons of room for growth (of both RAM and ROM). > Yes, even real hardware has done this. The Weitek math copro mapped > itself in at physical memory addresses c0000000 (a 32 MB window IIRC). I'm sure it will work for your use case, but it becomes a feature that only works if you have a guest with a small amount of memory and few pci devices. With a larger guest it fails. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 11:33 ` Richard W.M. Jones 2010-08-04 11:36 ` Avi Kivity @ 2010-08-04 12:07 ` Gleb Natapov 1 sibling, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 12:07 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, qemu-devel On Wed, Aug 04, 2010 at 12:33:18PM +0100, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 12:52:23PM +0300, Avi Kivity wrote: > > On 08/04/2010 12:24 PM, Richard W.M. Jones wrote: > > >>> > > >>>Just like the initrd? > > >>There isn't enough address space for a 100MB initrd in ROM. > > >Because of limits of the original PC, sure, where you had to fit > > >everything in 0xa0000-0xfffff or whatever it was. > > > > > >But this isn't a real PC. > > > > > >You can map the read-only memory anywhere you want. > > > > I wasn't talking about the 1MB limit, rather the 4GB limit. Of > > that, 3-3.5GB are reserved for RAM, 0.5-1GB for PCI. Putting large > > amounts of ROM in that space will cost us PCI space. > > I'm only allocating 500MB of RAM, so there's easily enough space to > put a large ROM, with tons of room for growth (of both RAM and ROM). > Yes, even real hardware has done this. The Weitek math copro mapped > itself in at physical memory addresses c0000000 (a 32 MB window IIRC). > c0000000 is 3G. This is where the PCI window usually starts (configurable in the chipset). I don't see anything unusual in this particular HW. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:24 ` Richard W.M. Jones 2010-08-04 9:27 ` Gleb Natapov 2010-08-04 9:52 ` Avi Kivity @ 2010-08-04 12:59 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 12:59 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel On 08/04/2010 04:24 AM, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 08:54:35AM +0300, Avi Kivity wrote: > >> On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: >> >>> On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: >>> >>>> Why do we need to transfer roms? These are devices on the memory >>>> bus or pci bus, it just needs to be there at the right address. >>>> Boot splash should just be another rom as it would be on a real >>>> system. >>>> >>> Just like the initrd? >>> >> There isn't enough address space for a 100MB initrd in ROM. >> > Because of limits of the original PC, sure, where you had to fit > everything in 0xa0000-0xfffff or whatever it was. > > But this isn't a real PC. > > You can map the read-only memory anywhere you want. > It's not that simple. Option roms are initialized in 16-bit mode so the physical address space is limited. The address mappings have very well defined semantics. Regards, Anthony Liguori > Rich. > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:15 ` Anthony Liguori 2010-08-03 19:24 ` Avi Kivity @ 2010-08-03 19:26 ` Gleb Natapov 1 sibling, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-03 19:26 UTC (permalink / raw) To: Anthony Liguori; +Cc: qemu-devel, Avi Kivity, kvm, Richard W.M. Jones On Tue, Aug 03, 2010 at 02:15:05PM -0500, Anthony Liguori wrote: > On 08/03/2010 02:05 PM, Gleb Natapov wrote: > >On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >>>If Richard is willing to do the work to make -kernel perform > >>>faster in such a way that it fits into the overall mission of what > >>>we're building, then I see no reason to reject it. The criteria > >>>for evaluating a patch should only depend on how it affects other > >>>areas of qemu and whether it impacts overall usability. > >>That's true, but extending fwcfg doesn't fit into the overall > >>picture well. We have well defined interfaces for pushing data into > >>a guest: virtio-serial (dma upload), virtio-blk (adds demand > >>paging), and virtio-p9fs (no image needed). Adapting libguestfs to > >>use one of these is a better move than adding yet another interface. > >> > >+1. I already proposed that. Nobody objects against fast fast > >communication channel between guest and host. In fact we have one: > >virtio-serial. Of course it is much easier to hack dma semantic into > >fw_cfg interface than add virtio-serial to seabios, but it doesn't make > >it right. Does virtio-serial has to be exposed as PCI to a guest or can > >we expose it as ISA device too in case someone want to use -kernel option > >but do not see additional PCI device in a guest? > > fw_cfg has to be available pretty early on so relying on a PCI > device isn't reasonable. Having dual interfaces seems wasteful. 
> fw_cfg wasn't meant to be used for bulk transfers (seabios doesn't even use string pio to access it, which makes load time 50 times slower than what Richard reports). It was meant to be easy to use at very early stages of booting. Kernel/initrd are loaded at a very late stage of booting, at which point PCI is fully initialized. > We're already doing bulk data transfer over fw_cfg as we need to do > it to transfer roms and potentially a boot splash. Even outside of > loading an initrd, the performance is going to start to matter with > a large number of devices. > Most roms are loaded from ROM PCI BARs, so this leaves us with the boot splash, but the boot splash image should be relatively small, and if the user wants it he does not care much about boot time anyway, since the BIOS needs to pause to show the boot splash. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity ` (2 preceding siblings ...) 2010-08-03 19:05 ` Gleb Natapov @ 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-03 19:17 ` Gleb Natapov ` (3 more replies) 2010-08-04 14:51 ` David S. Ahern 4 siblings, 4 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 19:13 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, qemu-devel, kvm On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > should discourage people from depending on this interface for > production use. I really don't get this whole thing where we must slavishly emulate an exact PC ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones @ 2010-08-03 19:17 ` Gleb Natapov 2010-08-03 19:19 ` Anthony Liguori ` (2 subsequent siblings) 3 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-03 19:17 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, qemu-devel On Tue, Aug 03, 2010 at 08:13:46PM +0100, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > > libguestfs does not depend on an x86 architectural feature. > > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > > should discourage people from depending on this interface for > > production use. > > I really don't get this whole thing where we must slavishly > emulate an exact PC ... > Maybe because you don't have to deal with the consequences of not doing so? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-03 19:17 ` Gleb Natapov @ 2010-08-03 19:19 ` Anthony Liguori 2010-08-03 19:22 ` Avi Kivity 2010-08-04 8:21 ` Avi Kivity 3 siblings, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:19 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel On 08/03/2010 02:13 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >> should discourage people from depending on this interface for >> production use. >> > I really don't get this whole thing where we must slavishly > emulate an exact PC ... > History has shown that when we deviate, we usually get it wrong and it becomes very painful to fix. Regards, Anthony Liguori > Rich. > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-03 19:17 ` Gleb Natapov 2010-08-03 19:19 ` Anthony Liguori @ 2010-08-03 19:22 ` Avi Kivity 2010-08-03 20:00 ` Richard W.M. Jones 2010-08-04 8:21 ` Avi Kivity 3 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:22 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: Gleb Natapov, qemu-devel, kvm On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >> should discourage people from depending on this interface for >> production use. > I really don't get this whole thing where we must slavishly > emulate an exact PC ... This has two motivations: - documented interfaces: we suck at documentation. We seldom document. Even when we do document something, the documentation is often inaccurate, misleading, and incomplete. While an "exact PC" unfortunately doesn't exist, it's a lot closer to reality than, say, an "exact Linux syscall interface". If we adopt an existing interface, we already have the documentation, and if there's a conflict between the documentation and our implementation, it's clear who wins (well, not always). - preexisting guests: if we design a new interface, we get to update all guests; and there are many of them. Whereas an "exact PC" will be seen by the guest vendors as well, who will then add whatever support is necessary. Obviously we break this when we have to, but when we don't have to, we shouldn't. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:22 ` Avi Kivity @ 2010-08-03 20:00 ` Richard W.M. Jones 2010-08-03 20:49 ` Anthony Liguori 2010-08-04 1:17 ` Jamie Lokier 0 siblings, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 20:00 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, qemu-devel, kvm On Tue, Aug 03, 2010 at 10:22:22PM +0300, Avi Kivity wrote: > On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: > >On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >>libguestfs does not depend on an x86 architectural feature. > >>qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > >>should discourage people from depending on this interface for > >>production use. > >I really don't get this whole thing where we must slavishly > >emulate an exact PC ... > > This has two motivations: > > - documented interfaces: we suck at documentation. We seldom > document. Even when we do document something, the documentation is > often inaccurate, misleading, and incomplete. While an "exact PC" > unfortunately doesn't exist, it's a lot closer to reality than, say, > an "exact Linux syscall interface". If we adopt an existing > interface, we already have the documentation, and if there's a > conflict between the documentation and our implementation, it's > clear who wins (well, not always). > > - preexisting guests: if we design a new interface, we get to update > all guests; and there are many of them. Whereas an "exact PC" will > be seen by the guest vendors as well who will then add whatever > support is necessary. On the other hand we end up with stuff like only being able to add 29 virtio-blk devices to a single guest. As best as I can tell, this comes from PCI, and this limit required a bunch of hacks when implementing virt-df. These are reasonable motivations, but I think they are partially about us: We could document things better and make things future-proof. 
I'm surprised by how lacking the doc requirements are for qemu (compared to, hmm, libguestfs for example). We could demand that OSes write device drivers for more qemu devices -- already OS vendors write thousands of device drivers for all sorts of obscure devices, so this isn't really much of a demand for them. In fact, they're already doing it. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:00 ` Richard W.M. Jones @ 2010-08-03 20:49 ` Anthony Liguori 2010-08-03 21:13 ` Paolo Bonzini 2010-08-04 5:56 ` Avi Kivity 2010-08-04 1:17 ` Jamie Lokier 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 20:49 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel On 08/03/2010 03:00 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 10:22:22PM +0300, Avi Kivity wrote: > >> On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: >> >>> On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: >>> >>>> libguestfs does not depend on an x86 architectural feature. >>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >>>> should discourage people from depending on this interface for >>>> production use. >>>> >>> I really don't get this whole thing where we must slavishly >>> emulate an exact PC ... >>> >> This has two motivations: >> >> - documented interfaces: we suck at documentation. We seldom >> document. Even when we do document something, the documentation is >> often inaccurate, misleading, and incomplete. While an "exact PC" >> unfortunately doesn't exist, it's a lot closer to reality than, say, >> an "exact Linux syscall interface". If we adopt an existing >> interface, we already have the documentation, and if there's a >> conflict between the documentation and our implementation, it's >> clear who wins (well, not always). >> >> - preexisting guests: if we design a new interface, we get to update >> all guests; and there are many of them. Whereas an "exact PC" will >> be seen by the guest vendors as well who will then add whatever >> support is necessary. >> > On the other hand we end up with stuff like only being able to add 29 > virtio-blk devices to a single guest. 
As best as I can tell, this > comes from PCI No, this comes from us being too clever for our own good and not following the way hardware does it. All modern systems keep disks on their own dedicated bus. In virtio-blk, we have a 1-1 relationship between disks and PCI devices. That's a perfect example of what happens when we try to "improve" things. > , and this limit required a bunch of hacks when > implementing virt-df. > > These are reasonable motivations, but I think they are partially about > us: > > We could document things better and make things future-proof. I'm > surprised by how lacking the doc requirements are for qemu (compared > to, hmm, libguestfs for example). > We enjoy complaining about our lack of documentation more than we like actually writing documentation. > We could demand that OSes write device drivers for more qemu devices > -- already OS vendors write thousands of device drivers for all sorts > of obscure devices, so this isn't really much of a demand for them. > In fact, they're already doing it. > So far, MS hasn't quite gotten the clue yet that they should write device drivers for qemu :-) In fact, no one has. Regards, Anthony Liguori > Rich. > > ^ permalink raw reply [flat|nested] 151+ messages in thread
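The 29-disk figure discussed above falls out of PCI bus geometry on qemu's default i440FX "pc" machine: a PCI bus has 32 device slots, and a few are occupied by built-in functions before any disks are added. A toy sketch of the arithmetic (the specific slot assignments are the conventional ones, but treat them as illustrative):

```python
# Why virtio-blk's 1-1 disk-to-PCI-device mapping caps out near 29 disks.
# Slot assignments below are the usual qemu i440FX defaults, shown for
# illustration only.
PCI_SLOTS_PER_BUS = 32  # PCI allows 32 devices per bus

reserved = {
    0: "i440FX host bridge",
    1: "PIIX3 ISA bridge / IDE",
    2: "VGA",
}

free_slots = PCI_SLOTS_PER_BUS - len(reserved)
# With one virtio-blk disk per PCI device, that leaves ~29 slots for disks
# (fewer still once a NIC, balloon device, etc. take their own slots).
```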
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:49 ` Anthony Liguori @ 2010-08-03 21:13 ` Paolo Bonzini 2010-08-03 21:34 ` Anthony Liguori 2010-08-04 5:56 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Paolo Bonzini @ 2010-08-03 21:13 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/03/2010 10:49 PM, Anthony Liguori wrote: >> On the other hand we end up with stuff like only being able to add 29 >> virtio-blk devices to a single guest. As best as I can tell, this >> comes from PCI > > No, this comes from us being too clever for our own good and not > following the way hardware does it. > > All modern systems keep disks on their own dedicated bus. In > virtio-blk, we have a 1-1 relationship between disks and PCI devices. > That's a perfect example of what happens when we try to "improve" things. Comparing (from personal experience) the complexity of the Windows drivers for Xen and virtio shows that it's not a bad idea at all. Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 21:13 ` Paolo Bonzini @ 2010-08-03 21:34 ` Anthony Liguori 2010-08-04 7:57 ` Paolo Bonzini 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 21:34 UTC (permalink / raw) To: Paolo Bonzini Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/03/2010 04:13 PM, Paolo Bonzini wrote: > On 08/03/2010 10:49 PM, Anthony Liguori wrote: >>> On the other hand we end up with stuff like only being able to add 29 >>> virtio-blk devices to a single guest. As best as I can tell, this >>> comes from PCI >> >> No, this comes from us being too clever for our own good and not >> following the way hardware does it. >> >> All modern systems keep disks on their own dedicated bus. In >> virtio-blk, we have a 1-1 relationship between disks and PCI devices. >> That's a perfect example of what happens when we try to "improve" >> things. > > Comparing (from personal experience) the complexity of the Windows > drivers for Xen and virtio shows that it's not a bad idea at all. Not quite sure what you're suggesting, but I could have been clearer. Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a PCI device, we probably should have just done virtio-scsi. Since most OSes have a SCSI-centric block layer, it would have resulted in much simpler drivers and we could support more than 1 disk per PCI slot. I had thought Christoph was working on such a device at some point in time... Regards, Anthony Liguori > > Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 21:34 ` Anthony Liguori @ 2010-08-04 7:57 ` Paolo Bonzini 2010-08-04 8:19 ` Avi Kivity 2010-08-04 12:53 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Paolo Bonzini @ 2010-08-04 7:57 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/03/2010 11:34 PM, Anthony Liguori wrote: >> >> Comparing (from personal experience) the complexity of the Windows >> drivers for Xen and virtio shows that it's not a bad idea at all. > > Not quite sure what you're suggesting, but I could have been clearer. > Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a > PCI device, we probably should have just done virtio-scsi. If you did virtio-scsi you might as well have ditched virtio-pci altogether and provided a single PCI device just like Xen does. Just make your network device also speak SCSI (which is actually in the spec...), and the same for serial devices. But now your driver has to implement its own hot-plug/hot-unplug mechanism rather than deferring it to the PCI subsystem of the OS (like Xen), greatly adding to the complication. In fact, a SCSI controller's firmware has a lot of other communication channels with the driver besides SCSI commands, and all this would be mapped into additional complexity on both the host side and the guest side. Yet another reminder of Xen. Despite the shortcomings, I think virtio-pci is the best example of balancing PV-specific aspects (do not make things too complicated) and "real world" aspects (do not invent new buses and the like). Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 7:57 ` Paolo Bonzini @ 2010-08-04 8:19 ` Avi Kivity 2010-08-04 12:53 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 8:19 UTC (permalink / raw) To: Paolo Bonzini; +Cc: kvm, Gleb Natapov, Richard W.M. Jones, qemu-devel On 08/04/2010 10:57 AM, Paolo Bonzini wrote: > On 08/03/2010 11:34 PM, Anthony Liguori wrote: >>> >>> Comparing (from personal experience) the complexity of the Windows >>> drivers for Xen and virtio shows that it's not a bad idea at all. >> >> Not quite sure what you're suggesting, but I could have been clearer. >> Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a >> PCI device, we probably should have just done virtio-scsi. > > If you did virtio-scsi you might have as well ditched virtio-pci > altogether and provide a single PCI device just like Xen does. Just > make your network device also speak SCSI (which is actually in the > spec...), and the same for serial devices. > > But now your driver that has to implement its own hot-plug/hot-unplug > mechanism rather than deferring it to the PCI subsystem of the OS > (like Xen), greatly adding to the complication. In fact, a SCSI > controller's firmware has a lot of other communication channels with > the driver besides SCSI commands, and all this would be mapped into > additional complexity on both the host side and the guest side. Yet > another reminder of Xen. > > Despite the shortcomings, I think virtio-pci is the best example of > balancing PV-specific aspects (do not make things too complicated) and > "real world" aspects (do not invent new buses and the like). Making virtio-blk a controller doesn't involve much difficulty. We add LUN to all requests, and send a configuration interrupt (which we already have) when a LUN is added or removed. Add some config space for discovering available LUNs. 
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
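Avi's sketch above — a LUN on every request, a configuration interrupt on hot-plug, and config space for LUN discovery — can be illustrated as a wire format. The existing virtio-blk request header really is type/ioprio/sector (u32/u32/u64, little-endian); the trailing LUN field and the LUN bitmap below are invented for illustration and are not part of any spec:

```python
import struct

VIRTIO_BLK_T_IN = 0  # read request, value from the virtio-blk spec

def pack_request(req_type, lun, sector):
    # Real header <type:u32 ioprio:u32 sector:u64> plus a hypothetical
    # lun:u32 appended, as Avi suggests. Layout is illustrative only.
    return struct.pack("<IIQI", req_type, 0, sector, lun)

def unpack_request(buf):
    req_type, _ioprio, sector, lun = struct.unpack("<IIQI", buf)
    return req_type, sector, lun

def present_luns(bitmap):
    # Hypothetical config-space field: a bitmap of present LUNs that the
    # driver re-reads when the device raises a configuration interrupt.
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]
```

For example, a read of sector 2048 on LUN 3 round-trips as `unpack_request(pack_request(VIRTIO_BLK_T_IN, lun=3, sector=2048))`.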
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 7:57 ` Paolo Bonzini 2010-08-04 8:19 ` Avi Kivity @ 2010-08-04 12:53 ` Anthony Liguori 2010-08-04 16:44 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 12:53 UTC (permalink / raw) To: Paolo Bonzini Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/04/2010 02:57 AM, Paolo Bonzini wrote: > On 08/03/2010 11:34 PM, Anthony Liguori wrote: >>> >>> Comparing (from personal experience) the complexity of the Windows >>> drivers for Xen and virtio shows that it's not a bad idea at all. >> >> Not quite sure what you're suggesting, but I could have been clearer. >> Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a >> PCI device, we probably should have just done virtio-scsi. > > If you did virtio-scsi you might have as well ditched virtio-pci > altogether and provide a single PCI device just like Xen does. Just > make your network device also speak SCSI (which is actually in the > spec...), and the same for serial devices. > > But now your driver that has to implement its own hot-plug/hot-unplug > mechanism rather than deferring it to the PCI subsystem of the OS > (like Xen), greatly adding to the complication. In fact, a SCSI > controller's firmware has a lot of other communication channels with > the driver besides SCSI commands, and all this would be mapped into > additional complexity on both the host side and the guest side. Yet > another reminder of Xen. > > Despite the shortcomings, I think virtio-pci is the best example of > balancing PV-specific aspects (do not make things too complicated) and > "real world" aspects (do not invent new buses and the like). So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Regards, Anthony Liguori > Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 12:53 ` Anthony Liguori @ 2010-08-04 16:44 ` Avi Kivity 2010-08-04 16:46 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:44 UTC (permalink / raw) To: Anthony Liguori Cc: Paolo Bonzini, kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/04/2010 03:53 PM, Anthony Liguori wrote: > > So how do we enable support for more than 20 disks? I think a > virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
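The queue-depth point above can be made concrete with a toy model: a fixed pool of outstanding requests spread over per-disk queues leaves each queue shallow, while a single shared queue stays deep, so each completion interrupt can coalesce more requests. All numbers here are illustrative, not measurements:

```python
# Toy model of per-disk queues vs. one aggregated queue. Assumes a fixed
# number of outstanding requests and that one interrupt can complete a
# whole queue's worth of requests -- a simplification for illustration.
def avg_queue_depth(outstanding, num_queues):
    return outstanding / num_queues

disks, outstanding = 20, 64
per_disk_depth = avg_queue_depth(outstanding, disks)  # shallow queues
shared_depth = avg_queue_depth(outstanding, 1)        # one deep queue

# Deeper queues amortize interrupts over more completions.
interrupts_per_request_per_disk = 1 / per_disk_depth
interrupts_per_request_shared = 1 / shared_depth
```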
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Avi Kivity @ 2010-08-04 16:46 ` Anthony Liguori 2010-08-04 16:48 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:46 UTC (permalink / raw) To: Avi Kivity Cc: Paolo Bonzini, kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/04/2010 11:44 AM, Avi Kivity wrote: > On 08/04/2010 03:53 PM, Anthony Liguori wrote: >> >> So how do we enable support for more than 20 disks? I think a >> virtio-scsi is inevitable.. > > Not only for large numbers of disks, also for JBOD performance. If > you have one queue per disk you'll have low queue depths and high > interrupt rates. Aggregating many spindles into a single queue is > important for reducing overhead. Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. It also supports SCSI-level pass-through, which is pretty nice for enabling things like NPIV. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:46 ` Anthony Liguori @ 2010-08-04 16:48 ` Alexander Graf 2010-08-04 16:49 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 16:48 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 04.08.2010, at 18:46, Anthony Liguori wrote: > On 08/04/2010 11:44 AM, Avi Kivity wrote: >> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>> >>> So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. >> >> Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. > > Right, the only question is, to you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PC tape drivers. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:48 ` Alexander Graf @ 2010-08-04 16:49 ` Anthony Liguori 2010-08-04 16:51 ` Alexander Graf 2010-08-04 17:01 ` Paolo Bonzini 0 siblings, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:49 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 08/04/2010 11:48 AM, Alexander Graf wrote: > On 04.08.2010, at 18:46, Anthony Liguori wrote: > > >> On 08/04/2010 11:44 AM, Avi Kivity wrote: >> >>> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>> >>>> So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. >>>> >>> Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. >>> >> Right, the only question is, to you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PC tape drivers. >> > What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. > I think the toughest change would be making it appear as a scsi device within the guest. You could do that to virtio-blk but it would be a flag day as reasonably configured guests will break. Having virtio-blk devices show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. Regards, Anthony Liguori > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
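Anthony's /dev/vdX complaint is easy to reproduce: any tool that pattern-matches on the traditional disk names silently misses virtio disks. A minimal sketch (the device-name lists are made up for the example):

```python
import fnmatch

# Naive disk discovery of the kind Anthony describes: it only recognizes
# the traditional /dev/sd* and /dev/hd* names, so virtio's /dev/vdX
# disks are invisible to it.
LEGACY_PATTERNS = ["sd*", "hd*"]

def visible_disks(devnames):
    return [d for d in devnames
            if any(fnmatch.fnmatch(d, p) for p in LEGACY_PATTERNS)]

# A guest with one SCSI disk and two virtio disks: only 'sda' is seen.
seen = visible_disks(["sda", "vda", "vdb"])
```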
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:49 ` Anthony Liguori @ 2010-08-04 16:51 ` Alexander Graf 2010-08-04 17:01 ` Paolo Bonzini 1 sibling, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 16:51 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 04.08.2010, at 18:49, Anthony Liguori wrote: > On 08/04/2010 11:48 AM, Alexander Graf wrote: >> On 04.08.2010, at 18:46, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:44 AM, Avi Kivity wrote: >>> >>>> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>>> >>>>> So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. >>>>> >>>> Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. >>>> >>> Right, the only question is, to you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PC tape drivers. >>> >> What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. >> > > I think the toughest change would be making it appear as a scsi device within the guest. You could do that to virtio-blk but it would be a flag day as reasonable configured guests will break. > > Having virtio-blk device show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. I completely agree and yes, we should move in that direction IMHO. I don't see why virtio-blk should be any different from megasas for example. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:49 ` Anthony Liguori 2010-08-04 16:51 ` Alexander Graf @ 2010-08-04 17:01 ` Paolo Bonzini 2010-08-04 17:19 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Paolo Bonzini @ 2010-08-04 17:01 UTC (permalink / raw) To: Anthony Liguori Cc: Alexander Graf, Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity On 08/04/2010 06:49 PM, Anthony Liguori wrote: >>> Right, the only question is, to you inject your own bus or do you >>> just reuse SCSI. On the surface, it seems like reusing SCSI has a >>> significant number of advantages. For instance, without changing the >>> guest's drivers, we can implement PV cdroms or PC tape drivers. If you want multiple LUNs per virtio device SCSI is obviously a good choice, but you will need something more (like the config space Avi mentioned). My position is that getting this "something more" right is considerably harder than virtio-blk. Maybe it will be done some day, but I still think that not having virtio-scsi from day 1 was actually a good thing. Even if we can learn from xenbus and all that. >> What exactly would keep us from doing that with virtio-blk? I thought >> that supports scsi commands already. > > I think the toughest change would be making it appear as a scsi device > within the guest. You could do that to virtio-blk but it would be a > flag day as reasonable configured guests will break. > > Having virtio-blk device show up as /dev/vdX was a big mistake. It's > been nothing but a giant PITA. There is an amazing amount of software > that only looks at /dev/sd* and /dev/hd*. That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:01 ` Paolo Bonzini @ 2010-08-04 17:19 ` Avi Kivity 2010-08-04 17:25 ` Alexander Graf 2010-08-04 17:27 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:19 UTC (permalink / raw) To: Paolo Bonzini Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf On 08/04/2010 08:01 PM, Paolo Bonzini wrote: > > That's another story and I totally agree here, but not reusing > /dev/sd* is not intrinsic in the design of virtio-blk (and one thing > that Windows gets right; everything is SCSI, period). > I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:19 ` Avi Kivity @ 2010-08-04 17:25 ` Alexander Graf 2010-08-04 17:27 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:25 UTC (permalink / raw) To: Avi Kivity Cc: kvm, Gleb Natapov, Richard W.M. Jones, qemu-devel, Paolo Bonzini On 04.08.2010, at 19:19, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >> >> That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). >> > > I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Hence the reference to megasas. It implements its own read/write/few other commands and the whole stack of optional commands as SCSI. I think virtio-blk should be the same. SCSI simply because it's there, it's flexible and it's well defined. You get a working spec and a lot of working implementations. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:19 ` Avi Kivity 2010-08-04 17:25 ` Alexander Graf @ 2010-08-04 17:27 ` Anthony Liguori 2010-08-04 17:37 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:27 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf, Paolo Bonzini On 08/04/2010 12:19 PM, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >> >> That's another story and I totally agree here, but not reusing >> /dev/sd* is not intrinsic in the design of virtio-blk (and one thing >> that Windows gets right; everything is SCSI, period). >> > > I don't really get why everything must be SCSI. Everything must > support read, write, a few other commands, and a large set of optional > commands. But why map them all to SCSI? What's the magic? Because that's what real hardware does, with only a few rare exceptions. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:27 ` Anthony Liguori @ 2010-08-04 17:37 ` Avi Kivity 2010-08-04 17:53 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:37 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf, Paolo Bonzini On 08/04/2010 08:27 PM, Anthony Liguori wrote: > On 08/04/2010 12:19 PM, Avi Kivity wrote: >> On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >>> >>> That's another story and I totally agree here, but not reusing >>> /dev/sd* is not intrinsic in the design of virtio-blk (and one thing >>> that Windows gets right; everything is SCSI, period). >>> >> >> I don't really get why everything must be SCSI. Everything must >> support read, write, a few other commands, and a large set of >> optional commands. But why map them all to SCSI? What's the magic? > > Because that's what real hardware with only a few rare exceptions. > I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:37 ` Avi Kivity @ 2010-08-04 17:53 ` Anthony Liguori 2010-08-04 18:05 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:53 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf, Paolo Bonzini On 08/04/2010 12:37 PM, Avi Kivity wrote: > On 08/04/2010 08:27 PM, Anthony Liguori wrote: >> On 08/04/2010 12:19 PM, Avi Kivity wrote: >>> On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >>>> >>>> That's another story and I totally agree here, but not reusing >>>> /dev/sd* is not intrinsic in the design of virtio-blk (and one >>>> thing that Windows gets right; everything is SCSI, period). >>>> >>> >>> I don't really get why everything must be SCSI. Everything must >>> support read, write, a few other commands, and a large set of >>> optional commands. But why map them all to SCSI? What's the magic? >> >> Because that's what real hardware with only a few rare exceptions. >> > > I thought that IDE was emulated as SCSI even when it wasn't. But I > guess now with SATA you're right. IDE -> EIDE -> ATA -> SATA ATA can encapsulate SCSI commands via ATAPI which gives you the ability to have ATA based CD-ROMs among other things. I don't believe that SATA actually uses SCSI commands for read/write operations but I think Linux exposes SATA drivers as SCSI anyway. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:53 ` Anthony Liguori @ 2010-08-04 18:05 ` Alexander Graf 0 siblings, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 18:05 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 04.08.2010, at 19:53, Anthony Liguori wrote: > On 08/04/2010 12:37 PM, Avi Kivity wrote: >> On 08/04/2010 08:27 PM, Anthony Liguori wrote: >>> On 08/04/2010 12:19 PM, Avi Kivity wrote: >>>> On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >>>>> >>>>> That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). >>>>> >>>> >>>> I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? >>> >>> Because that's what real hardware with only a few rare exceptions. >>> >> >> I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. > > IDE -> EIDE -> ATA -> SATA > > ATA can encapsulate SCSI commands via ATAPI which gives you the ability to have ATA based CD-ROMs among other things. > > I don't believe that SATA actually uses SCSI commands for read/write operations It doesn't. In fact, it's basically just a wrapper around the normal ATA commands - even for read/write. Plus some additional SATA only commands for parallel read/write. > but I think Linux exposes SATA drivers as SCSI anyway. Yup. That's what libata does. Even works with PATA drives. But this is a purely Linux internal thing. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:49 ` Anthony Liguori 2010-08-03 21:13 ` Paolo Bonzini @ 2010-08-04 5:56 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:56 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 11:49 PM, Anthony Liguori wrote: > >> We could demand that OSes write device drivers for more qemu devices >> -- already OS vendors write thousands of device drivers for all sorts >> of obscure devices, so this isn't really much of a demand for them. >> In fact, they're already doing it. > > So far, MS hasn't quite gotten the clue yet that they should write > device drivers for qemu :-) To be fair, we haven't actually demanded that they do. > In fact, noone has. Strangely, the reverse has happened - I think virtualbox has written virtio device models for their VMM. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:00 ` Richard W.M. Jones 2010-08-03 20:49 ` Anthony Liguori @ 2010-08-04 1:17 ` Jamie Lokier 1 sibling, 0 replies; 151+ messages in thread From: Jamie Lokier @ 2010-08-04 1:17 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel Richard W.M. Jones wrote: > We could demand that OSes write device drivers for more qemu devices > -- already OS vendors write thousands of device drivers for all sorts > of obscure devices, so this isn't really much of a demand for them. > In fact, they're already doing it. Result: Most OSes not working with qemu? Actually we seem to be going that way. Recent qemus don't work with older versions of Windows any more, so we have to use different versions of qemu for different guests. -- Jamie ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones ` (2 preceding siblings ...) 2010-08-03 19:22 ` Avi Kivity @ 2010-08-04 8:21 ` Avi Kivity 3 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 8:21 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: Gleb Natapov, qemu-devel, kvm On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >> should discourage people from depending on this interface for >> production use. > I really don't get this whole thing where we must slavishly > emulate an exact PC ... An additional point in favour is that we have a method of resolving design arguments. No need to think, we have the spec in front of us. The arguments then devolve into interpretation of the spec. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity ` (3 preceding siblings ...) 2010-08-03 19:13 ` Richard W.M. Jones @ 2010-08-04 14:51 ` David S. Ahern 2010-08-04 14:57 ` Anthony Liguori 4 siblings, 1 reply; 151+ messages in thread From: David S. Ahern @ 2010-08-04 14:51 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm, Richard W.M. Jones On 08/03/10 12:43, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > discourage people from depending on this interface for production use. That is a feature of qemu - and an important one to me as well. Why should it be discouraged? You end up at the same place -- a running kernel and in-ram filesystem; why require going through a bootloader just because the hardware case needs it? David ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:51 ` David S. Ahern @ 2010-08-04 14:57 ` Anthony Liguori 2010-08-04 15:25 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:57 UTC (permalink / raw) To: David S. Ahern Cc: qemu-devel, Gleb Natapov, Avi Kivity, kvm, Richard W.M. Jones On 08/04/2010 09:51 AM, David S. Ahern wrote: > > On 08/03/10 12:43, Avi Kivity wrote: > >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >> discourage people from depending on this interface for production use. >> > That is a feature of qemu - and an important one to me as well. Why > should it be discouraged? You end up at the same place -- a running > kernel and in-ram filesystem; why require going through a bootloader > just because the hardware case needs it? > It's smoke and mirrors. We're still providing a boot loader; it's just a tiny one that we've written solely for this purpose. And it works fine for production use. The question is whether we ought to be aggressively optimizing it for large initrd sizes. To be honest, after a lot of discussion of possibilities, I've come to the conclusion that it's just not worth it. There are better ways, like using string I/O and optimizing the PIO path in the kernel. That should cut down the 1s slowdown with a 100MB initrd by a bit. But honestly, shaving a couple hundred ms further off the initrd load is just not worth it under the current model. If this is important to someone, we ought to look at refactoring the loader completely to be disk-based, which is a higher-performance interface. Regards, Anthony Liguori > David > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:57 ` Anthony Liguori @ 2010-08-04 15:25 ` Gleb Natapov 2010-08-04 15:31 ` Alexander Graf 2010-08-04 23:17 ` Kevin O'Connor 0 siblings, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:25 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern, kvm On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:51 AM, David S. Ahern wrote: > > > >On 08/03/10 12:43, Avi Kivity wrote: > >>libguestfs does not depend on an x86 architectural feature. > >>qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>discourage people from depending on this interface for production use. > >That is a feature of qemu - and an important one to me as well. Why > >should it be discouraged? You end up at the same place -- a running > >kernel and in-ram filesystem; why require going through a bootloader > >just because the hardware case needs it? > > It's smoke and mirrors. We're still providing a boot loader it's > just a little tiny one that we've written soley for this purpose. > > And it works fine for production use. The question is whether we > ought to be aggressively optimizing it for large initrd sizes. To > be honest, after a lot of discussion of possibilities, I've come to > the conclusion that it's just not worth it. > > There are better ways like using string I/O and optimizing the PIO > path in the kernel. That should cut down the 1s slow down with a > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > further off the initrd load is just not worth it using the current > model. > The slow down is not 1s any more. String PIO emulation had many bugs that were fixed in 2.6.35. I verified how much time it took to load 100M via fw_cfg interface on older kernel and on 2.6.35. On older kernels on my machine it took ~2-3 second on 2.6.35 it took 26s. 
Some optimizations that were already committed make it 20s. I have a code prototype that makes it 11s. I don't see how we can get below that, and surely not back to ~2-3 sec. > If this is important to someone, we ought to look at refactoring the > loader completely to be disk based which is a higher performance > interface. > > Regards, > > Anthony Liguori > > >David > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:25 ` Gleb Natapov @ 2010-08-04 15:31 ` Alexander Graf 2010-08-04 15:48 ` Gleb Natapov 2010-08-04 23:17 ` Kevin O'Connor 1 sibling, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 15:31 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On 04.08.2010, at 17:25, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: >> On 08/04/2010 09:51 AM, David S. Ahern wrote: >>> >>> On 08/03/10 12:43, Avi Kivity wrote: >>>> libguestfs does not depend on an x86 architectural feature. >>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >>>> discourage people from depending on this interface for production use. >>> That is a feature of qemu - and an important one to me as well. Why >>> should it be discouraged? You end up at the same place -- a running >>> kernel and in-ram filesystem; why require going through a bootloader >>> just because the hardware case needs it? >> >> It's smoke and mirrors. We're still providing a boot loader it's >> just a little tiny one that we've written soley for this purpose. >> >> And it works fine for production use. The question is whether we >> ought to be aggressively optimizing it for large initrd sizes. To >> be honest, after a lot of discussion of possibilities, I've come to >> the conclusion that it's just not worth it. >> >> There are better ways like using string I/O and optimizing the PIO >> path in the kernel. That should cut down the 1s slow down with a >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms >> further off the initrd load is just not worth it using the current >> model. >> > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. 
On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:31 ` Alexander Graf @ 2010-08-04 15:48 ` Gleb Natapov 2010-08-04 15:59 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:48 UTC (permalink / raw) To: Alexander Graf Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:25, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 09:51 AM, David S. Ahern wrote: > >>> > >>> On 08/03/10 12:43, Avi Kivity wrote: > >>>> libguestfs does not depend on an x86 architectural feature. > >>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>>> discourage people from depending on this interface for production use. > >>> That is a feature of qemu - and an important one to me as well. Why > >>> should it be discouraged? You end up at the same place -- a running > >>> kernel and in-ram filesystem; why require going through a bootloader > >>> just because the hardware case needs it? > >> > >> It's smoke and mirrors. We're still providing a boot loader it's > >> just a little tiny one that we've written soley for this purpose. > >> > >> And it works fine for production use. The question is whether we > >> ought to be aggressively optimizing it for large initrd sizes. To > >> be honest, after a lot of discussion of possibilities, I've come to > >> the conclusion that it's just not worth it. > >> > >> There are better ways like using string I/O and optimizing the PIO > >> path in the kernel. That should cut down the 1s slow down with a > >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms > >> further off the initrd load is just not worth it using the current > >> model. > >> > > The slow down is not 1s any more. String PIO emulation had many bugs > > that were fixed in 2.6.35. 
I verified how much time it took to load 100M > > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > > that was already committed make it 20s. I have some code prototype that > > makes it 11s. I don't see how we can get below that, surely not back to > > ~2-3sec. > > What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? > > The big part of the slowdown right now is that the write into memory is done one byte at a time: for each byte we call kvm_write_guest() and kvm_mmu_pte_write(). The second call is needed in case the memory the instruction is writing to is shadowed; previously we didn't check for that at all. This can be mitigated by introducing a write cache, doing combined writes into memory, and unshadowing the page if there is more than one write into it. This optimization saves ~10 secs. Currently string emulation enters the guest from time to time to check if event injection is needed, and reads from userspace are done in 1K chunks, not 4K as before, but when I made the reads 4K and disabled guest reentry I didn't see any speed improvement worth talking about. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
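A minimal sketch of the write-combining idea Gleb describes, with a plain array standing in for guest physical memory and a counter standing in for the per-write kvm_write_guest()/kvm_mmu_pte_write() cost. The names and structure are hypothetical, not KVM's actual code — the point is only that contiguous bytes from a string instruction get committed in one combined write:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical write cache: instead of committing (and shadow-checking)
 * every byte of a string write separately, contiguous bytes are buffered
 * and flushed with one combined write.  'ram' stands in for guest
 * memory; 'commits' counts the expensive write+shadow-check operations
 * the real code would perform. */
struct write_cache {
    uint8_t *ram;        /* stand-in for guest physical memory */
    uint64_t gpa;        /* guest address of the first buffered byte */
    size_t   len;        /* bytes currently buffered */
    unsigned commits;    /* how many combined writes were issued */
    uint8_t  buf[4096];
};

static void cache_flush(struct write_cache *c)
{
    if (c->len) {
        memcpy(c->ram + c->gpa, c->buf, c->len); /* one write, not c->len */
        c->commits++;                            /* one shadow check, too */
        c->len = 0;
    }
}

static void cache_write_byte(struct write_cache *c, uint64_t gpa, uint8_t val)
{
    /* flush when the new byte is not contiguous or the buffer is full */
    if (c->len == sizeof(c->buf) || (c->len && gpa != c->gpa + c->len))
        cache_flush(c);
    if (c->len == 0)
        c->gpa = gpa;
    c->buf[c->len++] = val;
}
```

With this shape, a rep outsb/insb run of N contiguous bytes costs one commit per 4K instead of N commits, which matches the ~10s saving mentioned above.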
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:48 ` Gleb Natapov @ 2010-08-04 15:59 ` Alexander Graf 2010-08-04 16:08 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 15:59 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On 04.08.2010, at 17:48, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: >> >> On 04.08.2010, at 17:25, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: >>>> On 08/04/2010 09:51 AM, David S. Ahern wrote: >>>>> >>>>> On 08/03/10 12:43, Avi Kivity wrote: >>>>>> libguestfs does not depend on an x86 architectural feature. >>>>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >>>>>> discourage people from depending on this interface for production use. >>>>> That is a feature of qemu - and an important one to me as well. Why >>>>> should it be discouraged? You end up at the same place -- a running >>>>> kernel and in-ram filesystem; why require going through a bootloader >>>>> just because the hardware case needs it? >>>> >>>> It's smoke and mirrors. We're still providing a boot loader it's >>>> just a little tiny one that we've written soley for this purpose. >>>> >>>> And it works fine for production use. The question is whether we >>>> ought to be aggressively optimizing it for large initrd sizes. To >>>> be honest, after a lot of discussion of possibilities, I've come to >>>> the conclusion that it's just not worth it. >>>> >>>> There are better ways like using string I/O and optimizing the PIO >>>> path in the kernel. That should cut down the 1s slow down with a >>>> 100MB initrd by a bit. But honestly, shaving a couple hundred ms >>>> further off the initrd load is just not worth it using the current >>>> model. >>>> >>> The slow down is not 1s any more. 
String PIO emulation had many bugs >>> that were fixed in 2.6.35. I verified how much time it took to load 100M >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on >>> my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations >>> that was already committed make it 20s. I have some code prototype that >>> makes it 11s. I don't see how we can get below that, surely not back to >>> ~2-3sec. >> >> What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? >> >> > The big part of slowdown right now is that write into memory is done > for each byte. It means for each byte we call kvm_write_guest() and > kvm_mmu_pte_write(). The second call is needed in case memory, instruction > is trying to write to, is shadowed. Previously we didn't checked for > that at all. This can be mitigated by introducing write cache and do > combined writes into the memory and unshadow the page if there is more > then one write into it. This optimization saves ~10secs. Currently string Ok, so you tackled that bit already. > emulation enter guest from time to time to check if event injection is > needed and read from userspace is done in 1K chunks, not 4K like it was, > but when I made reads to be 4K and disabled guest reentry I haven't seen > any speed improvements worth talking about. So what are we wasting those 10 seconds on then? Does perf tell you anything useful? Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:59 ` Alexander Graf @ 2010-08-04 16:08 ` Gleb Natapov 2010-08-04 16:48 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 16:08 UTC (permalink / raw) To: Alexander Graf Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 05:59:40PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:48, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > >> > >> On 04.08.2010, at 17:25, Gleb Natapov wrote: > >> > >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > >>>> On 08/04/2010 09:51 AM, David S. Ahern wrote: > >>>>> > >>>>> On 08/03/10 12:43, Avi Kivity wrote: > >>>>>> libguestfs does not depend on an x86 architectural feature. > >>>>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>>>>> discourage people from depending on this interface for production use. > >>>>> That is a feature of qemu - and an important one to me as well. Why > >>>>> should it be discouraged? You end up at the same place -- a running > >>>>> kernel and in-ram filesystem; why require going through a bootloader > >>>>> just because the hardware case needs it? > >>>> > >>>> It's smoke and mirrors. We're still providing a boot loader it's > >>>> just a little tiny one that we've written soley for this purpose. > >>>> > >>>> And it works fine for production use. The question is whether we > >>>> ought to be aggressively optimizing it for large initrd sizes. To > >>>> be honest, after a lot of discussion of possibilities, I've come to > >>>> the conclusion that it's just not worth it. > >>>> > >>>> There are better ways like using string I/O and optimizing the PIO > >>>> path in the kernel. That should cut down the 1s slow down with a > >>>> 100MB initrd by a bit. 
But honestly, shaving a couple hundred ms > >>>> further off the initrd load is just not worth it using the current > >>>> model. > >>>> > >>> The slow down is not 1s any more. String PIO emulation had many bugs > >>> that were fixed in 2.6.35. I verified how much time it took to load 100M > >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > >>> my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > >>> that was already committed make it 20s. I have some code prototype that > >>> makes it 11s. I don't see how we can get below that, surely not back to > >>> ~2-3sec. > >> > >> What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? > >> > >> > > The big part of slowdown right now is that write into memory is done > > for each byte. It means for each byte we call kvm_write_guest() and > > kvm_mmu_pte_write(). The second call is needed in case memory, instruction > > is trying to write to, is shadowed. Previously we didn't checked for > > that at all. This can be mitigated by introducing write cache and do > > combined writes into the memory and unshadow the page if there is more > > then one write into it. This optimization saves ~10secs. Currently string > > Ok, so you tackled that bit already. > > > emulation enter guest from time to time to check if event injection is > > needed and read from userspace is done in 1K chunks, not 4K like it was, > > but when I made reads to be 4K and disabled guest reentry I haven't seen > > any speed improvements worth talking about. > > So what are we wasting those 10 seconds on then? Does perf tell you anything useful? > Not 10, but 7-8 seconds. After applying the cache fix, nothing definite as far as I remember (I last ran it almost 2 weeks ago; I need to rerun). The code always goes through the emulator now and checks direction flags to update SI/DI accordingly. The emulator is a big switch, and it calls various callbacks that may also slow things down. 
-- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:08 ` Gleb Natapov @ 2010-08-04 16:48 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:48 UTC (permalink / raw) To: Gleb Natapov Cc: Alexander Graf, kvm, qemu-devel, Richard W.M. Jones, David S. Ahern On 08/04/2010 07:08 PM, Gleb Natapov wrote: > > After applying cache fix nothing definite as far as I remember (I ran it last time > almost 2 week ago, need to rerun). Code always go through emulator now > and check direction flags to update SI/DI accordingly. Emulator is a big > switch and it calls various callbacks that may also slow things down. > We can have it set up a fast path. Similar to how real hardware optimizes 'rep movs' to copy complete cachelines. The emulator does all the checks, sets up a callback to be called on completion or when an interrupt is made pending, and lets x86.c do all the work. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
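A toy illustration of the fast path Avi suggests, under the assumption that the emulator has already performed all the architectural checks (permissions, segment limits, DF clear, source and destination in ordinary non-overlapping RAM). The names are hypothetical, and the copy deliberately ignores the page-boundary re-validation and interrupt windows real code would need:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical REP MOVSB fast path: after validation, hand the whole
 * run to one bulk copy instead of looping once through the emulator's
 * big switch per byte.  'mem' stands in for already-translated guest
 * memory; real code would re-validate at page boundaries and bail out
 * to the slow path when an interrupt becomes pending. */
struct cpu_regs { uint64_t rsi, rdi, rcx; };

static void rep_movsb_fast(struct cpu_regs *r, uint8_t *mem)
{
    uint64_t n = r->rcx;
    memcpy(mem + r->rdi, mem + r->rsi, n); /* one copy for the whole run */
    r->rsi += n;                           /* architectural side effects */
    r->rdi += n;
    r->rcx = 0;                            /* REP stops when RCX hits 0 */
}
```

This is the software analogue of hardware's cacheline-at-a-time 'rep movs' optimization: the per-iteration bookkeeping is folded into a single batch.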
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:25 ` Gleb Natapov 2010-08-04 15:31 ` Alexander Graf @ 2010-08-04 23:17 ` Kevin O'Connor 2010-08-05 5:26 ` Gleb Natapov 1 sibling, 1 reply; 151+ messages in thread From: Kevin O'Connor @ 2010-08-04 23:17 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > > There are better ways like using string I/O and optimizing the PIO > > path in the kernel. That should cut down the 1s slow down with a > > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > > further off the initrd load is just not worth it using the current > > model. > > > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. I guess this slowness is primarily for kvm. I just ran some tests on the latest qemu (with TCG). I pulled in a 400Meg file over fw_cfg using the SeaBIOS interface - it takes 9.8 seconds (pretty consistently). Oddly, if I change SeaBIOS to use insb (string pio) it takes 11.5 seconds (again, pretty consistently). These times were measured on the host - they don't include the extra time it takes qemu to start up (during which it reads the file into its memory). -Kevin ^ permalink raw reply [flat|nested] 151+ messages in thread
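For reference, the byte-at-a-time transfer being timed here looks roughly like the loop below. The port numbers (0x510 selector, 0x511 data) are the documented x86 fw_cfg I/O ports; the toy device model is a hypothetical stand-in for QEMU's fw_cfg so the loop can be shown self-contained — real firmware would issue outw/inb (or insb for the string-I/O variant) on those ports:

```c
#include <stdint.h>
#include <stddef.h>

/* x86 fw_cfg ports: write a 16-bit item selector to 0x510, then read
 * the item's bytes sequentially from 0x511. */
#define FW_CFG_PORT_CTL  0x510   /* selector port (outw) */
#define FW_CFG_PORT_DATA 0x511   /* data port (inb/insb) */

/* Toy stand-in for the device side, so the loop runs outside a guest. */
struct fw_cfg_dev {
    const uint8_t *blob;   /* contents of the currently selected item */
    size_t blob_len;
    size_t cursor;         /* read position, reset when item selected */
};

static void fw_cfg_select(struct fw_cfg_dev *d)   /* models outw(0x510) */
{
    d->cursor = 0;
}

static uint8_t fw_cfg_data(struct fw_cfg_dev *d)  /* models inb(0x511) */
{
    return d->cursor < d->blob_len ? d->blob[d->cursor++] : 0;
}

static void fw_cfg_read(struct fw_cfg_dev *d, void *buf, size_t len)
{
    uint8_t *p = buf;
    fw_cfg_select(d);
    for (size_t i = 0; i < len; i++)
        p[i] = fw_cfg_data(d);   /* one port access -- one exit -- per byte */
}
```

Under TCG each port read is just a helper call, so per-byte inb and insb cost about the same, as Kevin measures; under KVM each one is a full VM exit plus emulation, which is where the kernel-version-dependent cost shows up.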
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 23:17 ` Kevin O'Connor @ 2010-08-05 5:26 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 5:26 UTC (permalink / raw) To: Kevin O'Connor Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 07:17:30PM -0400, Kevin O'Connor wrote: > On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote: > > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > > > There are better ways like using string I/O and optimizing the PIO > > > path in the kernel. That should cut down the 1s slow down with a > > > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > > > further off the initrd load is just not worth it using the current > > > model. > > > > > The slow down is not 1s any more. String PIO emulation had many bugs > > that were fixed in 2.6.35. I verified how much time it took to load 100M > > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > > that was already committed make it 20s. I have some code prototype that > > makes it 11s. I don't see how we can get below that, surely not back to > > ~2-3sec. > > I guess this slowness is primarily for kvm. I just ran some tests on > the latest qemu (with TCG). I pulled in a 400Meg file over fw_cfg > using the SeaBIOS interface - it takes 9.8 seconds (pretty > consistently). Oddly, if I change SeaBIOS to use insb (string pio) it > takes 11.5 seconds (again, pretty consistently). These times were > measured on the host - they don't include the extra time it takes qemu > to start up (during which it reads the file into its memory). > Yes only KVM is affected, nothing has changed in qemu itself. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 16:50 ` Avi Kivity 2010-08-03 16:53 ` Anthony Liguori @ 2010-08-03 16:56 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 16:56 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 11:50 AM, Avi Kivity wrote: > On 08/03/2010 07:46 PM, Anthony Liguori wrote: >>> It doesn't appear to support live migration, or hiding the feature >>> for -M older. >>> >>> It's not a good path to follow. Tomorrow we'll need to load 300MB >>> initrds and we'll have to rework this yet again. Meanwhile the >>> kernel and virtio support demand loading of any image size you'd >>> want to use. >> >> >> firmware is totally broken with respect to -M older FWIW. >> > > Well, then this is adding to the brokenness. > > fwcfg dma is going to have exactly one user, libguestfs. Much better > to have libguestfs move to some other interface and improve are > users-to-interfaces ratio. BTW, the brokenness is that regardless of -M older, we always use the newest firmware. Because we always use the newest firmware, fwcfg is not a backwards-compatible interface. Migration totally screws this up. While we migrate roms (and correctly now thanks to Alex's patches), we size the allocation based on the newest firmware size. That means if we ever decreased the size of a rom, we'd see total failure (even if we had a compatible fwcfg interface). Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 16:44 ` Avi Kivity 2010-08-03 16:46 ` Anthony Liguori @ 2010-08-03 16:48 ` Avi Kivity 2010-08-03 17:00 ` Richard W.M. Jones 2010-08-03 16:56 ` Richard W.M. Jones 2 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-03 16:48 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm On 08/03/2010 07:44 PM, Avi Kivity wrote: > > It's not a good path to follow. Tomorrow we'll need to load 300MB > initrds and we'll have to rework this yet again. Meanwhile the kernel > and virtio support demand loading of any image size you'd want to use. > Even better would be to use virtio-9p. You don't even need an image in this case. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 16:48 ` Avi Kivity @ 2010-08-03 17:00 ` Richard W.M. Jones 2010-08-03 17:05 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 17:00 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel On Tue, Aug 03, 2010 at 07:48:17PM +0300, Avi Kivity wrote: > On 08/03/2010 07:44 PM, Avi Kivity wrote: > > > >It's not a good path to follow. Tomorrow we'll need to load 300MB > >initrds and we'll have to rework this yet again. Meanwhile the > >kernel and virtio support demand loading of any image size you'd > >want to use. > > > > Even better would be to use virtio-9p. You don't even need an image > in this case. We don't want to expose the whole host filesystem, just selected files, and we want to use our own configuration files (basically that's what is in the skeleton part that we do ship). Of course, if we can use virtio-9p, then excellent. Is there good documentation about virtio-9p? What I can find is fragmentary or based on reading qemu -help ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 17:00 ` Richard W.M. Jones
@ 2010-08-03 17:05 ` Avi Kivity
  0 siblings, 0 replies; 151+ messages in thread
From: Avi Kivity @ 2010-08-03 17:05 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel

On 08/03/2010 08:00 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 07:48:17PM +0300, Avi Kivity wrote:
>> On 08/03/2010 07:44 PM, Avi Kivity wrote:
>>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>>> initrds and we'll have to rework this yet again. Meanwhile the
>>> kernel and virtio support demand loading of any image size you'd
>>> want to use.
>>>
>> Even better would be to use virtio-9p. You don't even need an image
>> in this case.
> We don't want to expose the whole host filesystem, just selected
> files, and we want to use our own configuration files (basically
> that's what is in the skeleton part that we do ship).

True. The guest might landmine its disks with something that the
libguestfs kernel would step on and be exploited. You might hardlink
the needed files into a private directory tree.

> Of course, if we can use virtio-9p, then excellent. Is there good
> documentation about virtio-9p? What I can find is fragmentary or
> based on reading qemu -help ...

Not to my knowledge.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 151+ messages in thread
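[The hardlink idea above can be sketched in a few lines. This is an illustration, not libguestfs code; the function name `build_export_tree` is invented. Note that hard links require source and destination to be on the same filesystem, and they share an inode, so no data is copied.]

```python
import os

def build_export_tree(files, export_dir):
    """Hard-link a fixed list of host files into a private directory,
    so that only those files are visible through a 9p (or similar)
    export, rather than exposing the whole host filesystem."""
    os.makedirs(export_dir, exist_ok=True)
    linked = []
    for path in files:
        dest = os.path.join(export_dir, os.path.basename(path))
        os.link(path, dest)   # hard link: same inode, no data copied
        linked.append(dest)
    return linked
```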
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 16:44 ` Avi Kivity
  2010-08-03 16:46 ` Anthony Liguori
  2010-08-03 16:48 ` Avi Kivity
@ 2010-08-03 16:56 ` Richard W.M. Jones
  2010-08-03 17:08 ` Avi Kivity
  2 siblings, 1 reply; 151+ messages in thread
From: Richard W.M. Jones @ 2010-08-03 16:56 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 07:44:49PM +0300, Avi Kivity wrote:
> On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
> > I have posted a small patch which makes this 650x faster without
> > appreciable complication.
>
> It doesn't appear to support live migration, or hiding the feature
> for -M older.

AFAICT live migration should still work (even assuming someone live
migrates a domain during early boot, which seems pretty unlikely ...)

Maybe you mean live migration of the dma_* global variables? I can
fix that.

> It's not a good path to follow. Tomorrow we'll need to load 300MB
> initrds and we'll have to rework this yet again.

Not a very good straw man ... The patch would take ~300ms instead
of ~115ms, versus something like 2 mins 40 seconds with the current
method.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 16:56 ` Richard W.M. Jones
@ 2010-08-03 17:08 ` Avi Kivity
  0 siblings, 0 replies; 151+ messages in thread
From: Avi Kivity @ 2010-08-03 17:08 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 07:56 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 07:44:49PM +0300, Avi Kivity wrote:
>> On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
>>> I have posted a small patch which makes this 650x faster without
>>> appreciable complication.
>> It doesn't appear to support live migration, or hiding the feature
>> for -M older.
> AFAICT live migration should still work (even assuming someone live
> migrates a domain during early boot, which seems pretty unlikely ...)

Live migration is sometimes performed automatically by management
tools, which have no idea (nor do they care) what the guest is doing.

> Maybe you mean live migration of the dma_* global variables? I can
> fix that.

Yes.

>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>> initrds and we'll have to rework this yet again.
> Not a very good straw man ... The patch would take ~300ms instead
> of ~115ms, versus something like 2 mins 40 seconds with the current
> method.

It's still 300ms extra time, with a 900MB footprint.

btw, a DMA interface which blocks the guest and/or qemu for 115ms is
not something we want to introduce to qemu. dma is hard, doing
something simple means it won't work very well.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 14:53 ` Richard W.M. Jones
  2010-08-03 16:10 ` Avi Kivity
@ 2010-08-03 16:39 ` Anthony Liguori
  2010-08-03 16:43 ` Richard W.M. Jones
  1 sibling, 1 reply; 151+ messages in thread
From: Anthony Liguori @ 2010-08-03 16:39 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel

On 08/03/2010 09:53 AM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 05:38:25PM +0300, Avi Kivity wrote:
>> The time will only continue to grow as you add features and as the
>> distro bloats naturally.
>>
>> Much better to create it once and only update it if some dependent
>> file changes (basically the current on-the-fly code + save a list of
>> file timestamps).
>
> This applies to both cases, the initrd could also be saved, so:
>
>>> Total saving: 115ms.
>> 815 ms by my arithmetic.
>
> no, not true, 115ms.
>
>> You also save 3*N-2*P memory where N is the size of your initrd and
>> P is the actual amount used by the guest.
>
> Can you explain this?
>
>> Loading a file into memory is plenty fast if you use the standard
>> interfaces. -kernel -initrd is a specialized interface.
>
> Why bother with any command line options at all? After all, they keep
> changing and causing problems for qemu's users ... Apparently we're
> all doing stuff "wrong", in ways that are never explained by the
> developers.

Let's be fair. I think we've all agreed to adjust the fw_cfg interface
to implement DMA. The only requirement was that the DMA operation not
be triggered from a single port I/O but rather based on a polling
operation which better fits the way real hardware works.

Is this a regression? Probably. But performance regressions that
result from correctness fixes don't get reverted. We have to find an
approach to improve performance without impacting correctness.

That said, the general view of -kernel/-append is that these are
developer options and we don't really look at it as a performance
critical interface. We could do a better job of communicating this to
users but that's true of most of the features we support.

Regards,

Anthony Liguori

> Rich.

^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 16:39 ` Anthony Liguori
@ 2010-08-03 16:43 ` Richard W.M. Jones
  0 siblings, 0 replies; 151+ messages in thread
From: Richard W.M. Jones @ 2010-08-03 16:43 UTC (permalink / raw)
To: Anthony Liguori; +Cc: qemu-devel

On Tue, Aug 03, 2010 at 11:39:43AM -0500, Anthony Liguori wrote:
> Let's be fair. I think we've all agreed to adjust the fw_cfg
> interface to implement DMA. The only requirement was that the DMA
> operation not be triggered from a single port I/O but rather based
> on a polling operation which better fits the way real hardware
> works.

The patch I posted requires that the caller poll a register, so
hopefully this requirement is satisfied.

The other requirement was that the interface be discoverable, which is
also something in the latest version of the patch that I just posted.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

^ permalink raw reply [flat|nested] 151+ messages in thread
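[The "poll a register instead of blocking on a single port write" style discussed above can be illustrated with a toy model. This is not the actual fw_cfg DMA patch or its register layout; the class, status values, and timings below are all invented for illustration. The device performs the transfer asynchronously while the guest side spins on a status field, mirroring how real DMA engines expose completion.]

```python
import threading
import time

class ToyDmaDevice:
    """Toy device model: the guest programs a transfer, then polls a
    status register until the device (running asynchronously) marks it
    done. Status values are invented for illustration."""
    IDLE, BUSY, DONE = 0, 1, 2

    def __init__(self, backing):
        self.backing = backing      # bytes the device will "DMA" in
        self.status = self.IDLE     # stand-in for a status register
        self.guest_mem = None

    def start_transfer(self):
        self.status = self.BUSY
        threading.Thread(target=self._do_dma, daemon=True).start()

    def _do_dma(self):
        time.sleep(0.01)            # pretend the copy takes a while
        self.guest_mem = bytes(self.backing)
        self.status = self.DONE     # completion visible via polling

def guest_poll(dev, timeout=2.0):
    """Guest side: kick off the transfer, then poll the status register
    rather than expecting a single port I/O to block until completion."""
    dev.start_transfer()
    deadline = time.monotonic() + timeout
    while dev.status != ToyDmaDevice.DONE:
        if time.monotonic() > deadline:
            raise TimeoutError("DMA did not complete")
        time.sleep(0.001)
    return dev.guest_mem
```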