* [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
@ 2010-08-03 11:13 Richard W.M. Jones
  2010-08-03 11:33 ` Gleb Natapov
  0 siblings, 1 reply; 151+ messages in thread

From: Richard W.M. Jones @ 2010-08-03 11:13 UTC (permalink / raw)
To: qemu-devel

qemu compiled from today's git. Using the following command line:

$qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
    -drive file=/dev/null,if=virtio \
    -enable-kvm \
    -nodefaults \
    -nographic \
    -serial stdio \
    -m 500 \
    -no-reboot \
    -no-hpet \
    -net user,vlan=0,net=169.254.0.0/16 \
    -net nic,model=ne2k_pci,vlan=0 \
    -kernel /tmp/libguestfsEyAMut/kernel \
    -initrd /tmp/libguestfsEyAMut/initrd \
    -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '

With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
starts.

If I revert back to kernel 2.6.34, it's pretty quick as usual.

strace is not very informative. It's in a loop doing select and
reading/writing from some file descriptors, including the signalfd and
two pipe fds.

Anyone seen anything like this?

Rich.

[*] This Fedora kernel:
http://koji.fedoraproject.org/koji/buildinfo?buildID=187085

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://et.redhat.com/~rjones/libguestfs/
See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

^ permalink raw reply	[flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Gleb Natapov @ 2010-08-03 11:33 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel

On Tue, Aug 03, 2010 at 12:13:06PM +0100, Richard W.M. Jones wrote:
> 
> qemu compiled from today's git. Using the following command line:
> 
> $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
>     -drive file=/dev/null,if=virtio \
>     -enable-kvm \
>     -nodefaults \
>     -nographic \
>     -serial stdio \
>     -m 500 \
>     -no-reboot \
>     -no-hpet \
>     -net user,vlan=0,net=169.254.0.0/16 \
>     -net nic,model=ne2k_pci,vlan=0 \
>     -kernel /tmp/libguestfsEyAMut/kernel \
>     -initrd /tmp/libguestfsEyAMut/initrd \
>     -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '
> 
> With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
> starts.
> 
> If I revert back to kernel 2.6.34, it's pretty quick as usual.
> 
> strace is not very informative. It's in a loop doing select and
> reading/writing from some file descriptors, including the signalfd and
> two pipe fds.
> 
> Anyone seen anything like this?
> 
I assume your initrd is huge. In newer kernels ins/outs are much slower
than they were. They are much more correct too. It shouldn't be 1 min
20 sec for a 100M initrd though, but it can take 20-30 sec. This belongs
on the kvm list BTW.

--
			Gleb.
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 12:10 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, kvm

On Tue, Aug 03, 2010 at 02:33:02PM +0300, Gleb Natapov wrote:
> On Tue, Aug 03, 2010 at 12:13:06PM +0100, Richard W.M. Jones wrote:
> > 
> > qemu compiled from today's git. Using the following command line:
> > 
> > $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
> >     -drive file=/dev/null,if=virtio \
> >     -enable-kvm \
> >     -nodefaults \
> >     -nographic \
> >     -serial stdio \
> >     -m 500 \
> >     -no-reboot \
> >     -no-hpet \
> >     -net user,vlan=0,net=169.254.0.0/16 \
> >     -net nic,model=ne2k_pci,vlan=0 \
> >     -kernel /tmp/libguestfsEyAMut/kernel \
> >     -initrd /tmp/libguestfsEyAMut/initrd \
> >     -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '
> > 
> > With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
> > starts.
> > 
> > If I revert back to kernel 2.6.34, it's pretty quick as usual.
> > 
> > strace is not very informative. It's in a loop doing select and
> > reading/writing from some file descriptors, including the signalfd and
> > two pipe fds.
> > 
> > Anyone seen anything like this?
> > 
> I assume your initrd is huge.

It's ~110MB, yes.

> In newer kernels ins/outs are much slower than they were. They are
> much more correct too. It shouldn't be 1 min 20 sec for a 100M initrd
> though, but it can take 20-30 sec. This belongs on the kvm list BTW.

I can't see anything about this in the kernel changelog. Can you
point me to the commit or the key phrase to look for?

Also, what's the point of making in/out "more correct" when we know
we're talking to qemu (eg. from the CPUID) and we know it already
worked fine before with qemu?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Gleb Natapov @ 2010-08-03 12:37 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, kvm

On Tue, Aug 03, 2010 at 01:10:00PM +0100, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 02:33:02PM +0300, Gleb Natapov wrote:
> > On Tue, Aug 03, 2010 at 12:13:06PM +0100, Richard W.M. Jones wrote:
> > > 
> > > qemu compiled from today's git. Using the following command line:
> > > 
> > > $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios \
> > >     -drive file=/dev/null,if=virtio \
> > >     -enable-kvm \
> > >     -nodefaults \
> > >     -nographic \
> > >     -serial stdio \
> > >     -m 500 \
> > >     -no-reboot \
> > >     -no-hpet \
> > >     -net user,vlan=0,net=169.254.0.0/16 \
> > >     -net nic,model=ne2k_pci,vlan=0 \
> > >     -kernel /tmp/libguestfsEyAMut/kernel \
> > >     -initrd /tmp/libguestfsEyAMut/initrd \
> > >     -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_vmchannel=tcp:169.254.2.2:35007 guestfs_verbose=1 TERM=xterm-color '
> > > 
> > > With kernel 2.6.35 [*], this takes about 1 min 20 s before the guest
> > > starts.
> > > 
> > > If I revert back to kernel 2.6.34, it's pretty quick as usual.
> > > 
> > > strace is not very informative. It's in a loop doing select and
> > > reading/writing from some file descriptors, including the signalfd and
> > > two pipe fds.
> > > 
> > > Anyone seen anything like this?
> > > 
> > I assume your initrd is huge.
> 
> It's ~110MB, yes.
> 
> > In newer kernels ins/outs are much slower than they were. They are
> > much more correct too. It shouldn't be 1 min 20 sec for a 100M initrd
> > though, but it can take 20-30 sec. This belongs on the kvm list BTW.
> 
> I can't see anything about this in the kernel changelog. Can you
> point me to the commit or the key phrase to look for?
> 
7972995b0c346de76

> Also, what's the point of making in/out "more correct" when we know
> we're talking to qemu (eg. from the CPUID) and we know it already
> worked fine before with qemu?
> 
Qemu has nothing to do with that. ins/outs didn't work correctly in
some situations. They didn't work at all if the destination/source
memory was MMIO (didn't work as in hung the vcpu IIRC, and this is a
security risk). The direction flag wasn't handled at all (if it was set
the instruction injected #GP into the guest). It didn't check that the
memory it writes to is shadowed, in which case special action should be
taken. It didn't deliver events during long string operations. Maybe
more. Unfortunately adding all that makes emulation much slower. I
already implemented some speedups, and more are possible, but we will
not be able to get back to the previous string io speed, which was our
upper limit.

--
			Gleb.
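To put rough numbers on why per-iteration emulation hurts, here is an illustrative back-of-the-envelope model. It is not QEMU or KVM code, and the per-exit cost and batch size below are invented assumptions for the sketch, not measurements from this thread:

```python
# Illustrative cost model (hypothetical numbers): loading an initrd over
# port I/O when every `rep ins` iteration causes a separate VM exit,
# versus a batched fast path that moves a whole buffer per exit.

INITRD_BYTES = 110 * 1024 * 1024   # ~110MB initrd, as in the thread
EXIT_COST_US = 5                   # assumed round-trip cost of one VM exit
BATCH_BYTES = 4096                 # assumed buffer moved per exit (fast path)

def seconds(exits, cost_us=EXIT_COST_US):
    """Total time spent in exits, in seconds."""
    return exits * cost_us / 1e6

# Batched fast path: one exit per BATCH_BYTES.
fast_exits = INITRD_BYTES // BATCH_BYTES
# Fully emulated path: one exit per 4-byte `insl` iteration.
slow_exits = INITRD_BYTES // 4

print(f"fast path: {fast_exits} exits, ~{seconds(fast_exits):.2f}s")
print(f"emulated:  {slow_exits} exits, ~{seconds(slow_exits):.1f}s")
```

Under these assumed constants the emulated path lands in the minutes range while the batched path stays well under a second, which is the right order of magnitude for the regression described above.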
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 12:48 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, kvm

On Tue, Aug 03, 2010 at 03:37:14PM +0300, Gleb Natapov wrote:
> On Tue, Aug 03, 2010 at 01:10:00PM +0100, Richard W.M. Jones wrote:
> > I can't see anything about this in the kernel changelog. Can you
> > point me to the commit or the key phrase to look for?
> > 
> 7972995b0c346de76

Thanks - I see.

> > Also, what's the point of making in/out "more correct" when we know
> > we're talking to qemu (eg. from the CPUID) and we know it already
> > worked fine before with qemu?
> > 
> Qemu has nothing to do with that. ins/outs didn't work correctly in
> some situations. They didn't work at all if the destination/source
> memory was MMIO (didn't work as in hung the vcpu IIRC, and this is a
> security risk). The direction flag wasn't handled at all (if it was set
> the instruction injected #GP into the guest). It didn't check that the
> memory it writes to is shadowed, in which case special action should be
> taken. It didn't deliver events during long string operations. Maybe
> more. Unfortunately adding all that makes emulation much slower. I
> already implemented some speedups, and more are possible, but we will
> not be able to get back to the previous string io speed, which was our
> upper limit.

Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
once I've rebased it and done some more testing. This huge regression
for a common operation (implementing -initrd) needs to be solved
without using inb/rep ins.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora
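For readers unfamiliar with the fw-cfg interface discussed above: on x86 it is a selector/data port pair (0x510/0x511, per QEMU's fw_cfg documentation), and the guest pulls each configuration blob serially through the data port. The toy model below is hypothetical code, not QEMU's implementation; the `FW_CFG_INITRD_DATA` key value is taken from QEMU's headers but should be treated as an assumption here. It illustrates why a ~110MB initrd means one emulated device access per byte transferred:

```python
# Toy model (NOT QEMU code) of the fw_cfg selector/data protocol: the
# guest writes a 16-bit selector to the control port, then reads the
# selected blob one byte at a time from the data port. Each byte read
# is a separate device access that must be emulated.

FW_CFG_INITRD_DATA = 0x12  # selector key for the initrd blob (assumed)

class ToyFwCfg:
    def __init__(self, blobs):
        self.blobs = blobs      # selector key -> bytes
        self.cur = b""
        self.pos = 0
        self.accesses = 0       # count of emulated data-port reads

    def write_selector(self, key):
        # Selecting a key rewinds the read cursor to the blob's start.
        self.cur, self.pos = self.blobs.get(key, b""), 0

    def read_data_byte(self):
        self.accesses += 1
        b = self.cur[self.pos]
        self.pos += 1
        return b

# Fake 8-byte "initrd" (a gzip magic number plus padding, for show).
dev = ToyFwCfg({FW_CFG_INITRD_DATA: b"\x1f\x8b" + b"\x00" * 6})
dev.write_selector(FW_CFG_INITRD_DATA)
blob = bytes(dev.read_data_byte() for _ in range(8))
assert dev.accesses == len(blob)  # one device access per byte transferred
```

The "DMA"-like patch mentioned above would, in effect, replace this byte-at-a-time pull with a bulk transfer the device performs itself.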
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 13:19 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 03:48 PM, Richard W.M. Jones wrote:
>
> Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
> once I've rebased it and done some more testing. This huge regression
> for a common operation (implementing -initrd) needs to be solved
> without using inb/rep ins.

Adding more interfaces is easy but a problem in the long term. We'll
optimize it as much as we can. Meanwhile, why are you loading huge
initrds? Use a cdrom instead (it will also be faster since the guest
doesn't need to unpack it).

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 14:05 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 04:19:39PM +0300, Avi Kivity wrote:
> On 08/03/2010 03:48 PM, Richard W.M. Jones wrote:
> >
> > Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
> > once I've rebased it and done some more testing. This huge regression
> > for a common operation (implementing -initrd) needs to be solved
> > without using inb/rep ins.
> 
> Adding more interfaces is easy but a problem in the long term.
> We'll optimize it as much as we can. Meanwhile, why are you loading
> huge initrds? Use a cdrom instead (it will also be faster since the
> guest doesn't need to unpack it).

Because it involves rewriting the entire appliance building process,
and we don't necessarily know if it'll be faster after we've done
that.

Look: currently we create the initrd on the fly in 700ms. We've no
reason to believe that creating a CD-ROM on the fly wouldn't take
around the same time. After all, both processes involve reading all
the host files from disk and writing a temporary file.

You have to create these things on the fly, because we don't actually
ship an appliance to end users, just a tiny (< 1 MB) skeleton. You
can't ship a massive statically linked appliance to end users because
it's just unmanageable (think: security; updates; bandwidth).

Loading the initrd currently takes 115ms (or could do, if a sensible
50 line patch was permitted).

So the only possible saving would be the 115ms load time of the
initrd. In theory the CD-ROM device could be detected in 0 time.

Total saving: 115ms.

But will it be any faster, since after spending 115ms, everything runs
from memory, versus being loaded from the CD?

Let's face the fact that qemu has suffered from an enormous
regression. From some hundreds of milliseconds up to over a minute,
in the space of 6 months of development. For a very simple operation:
loading a file into memory.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 14:38 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 05:05 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 04:19:39PM +0300, Avi Kivity wrote:
>> On 08/03/2010 03:48 PM, Richard W.M. Jones wrote:
>>> Thanks for the explanation. I'll repost my "DMA"-like fw-cfg patch
>>> once I've rebased it and done some more testing. This huge regression
>>> for a common operation (implementing -initrd) needs to be solved
>>> without using inb/rep ins.
>> Adding more interfaces is easy but a problem in the long term.
>> We'll optimize it as much as we can. Meanwhile, why are you loading
>> huge initrds? Use a cdrom instead (it will also be faster since the
>> guest doesn't need to unpack it).
> Because it involves rewriting the entire appliance building process,
> and we don't necessarily know if it'll be faster after we've done
> that.
>
> Look: currently we create the initrd on the fly in 700ms. We've no
> reason to believe that creating a CD-ROM on the fly wouldn't take
> around the same time. After all, both processes involve reading all
> the host files from disk and writing a temporary file.

The time will only continue to grow as you add features and as the
distro bloats naturally.

Much better to create it once and only update it if some dependent
file changes (basically the current on-the-fly code + save a list of
file timestamps).

Alternatively, pass through the host filesystem.

> You have to create these things on the fly, because we don't actually
> ship an appliance to end users, just a tiny (< 1 MB) skeleton. You
> can't ship a massive statically linked appliance to end users because
> it's just unmanageable (think: security; updates; bandwidth).

Shipping it is indeed out of the question. But on-the-fly creation is
not the only alternative.

> Loading the initrd currently takes 115ms (or could do, if a sensible
> 50 line patch was permitted).
>
> So the only possible saving would be the 115ms load time of the
> initrd. In theory the CD-ROM device could be detected in 0 time.
>
> Total saving: 115ms.

815 ms by my arithmetic.

You also save 3*N-2*P memory where N is the size of your initrd and P
is the actual amount used by the guest.

> But will it be any faster, since after spending 115ms, everything runs
> from memory, versus being loaded from the CD?
>
> Let's face the fact that qemu has suffered from an enormous
> regression. From some hundreds of milliseconds up to over a minute,
> in the space of 6 months of development.

It wasn't qemu, but kvm. And it didn't take six months, just a few
commits. Those aren't going back, they're a lot more important than
some libguestfs problem which should have been coded differently in
the first place.

> For a very simple operation:
> loading a file into memory.

Loading a file into memory is plenty fast if you use the standard
interfaces. -kernel -initrd is a specialized interface.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 14:53 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 05:38:25PM +0300, Avi Kivity wrote:
> The time will only continue to grow as you add features and as the
> distro bloats naturally.
>
> Much better to create it once and only update it if some dependent
> file changes (basically the current on-the-fly code + save a list of
> file timestamps).

This applies to both cases, the initrd could also be saved, so:

> > Total saving: 115ms.
>
> 815 ms by my arithmetic.

no, not true, 115ms.

> You also save 3*N-2*P memory where N is the size of your initrd and
> P is the actual amount used by the guest.

Can you explain this?

> Loading a file into memory is plenty fast if you use the standard
> interfaces. -kernel -initrd is a specialized interface.

Why bother with any command line options at all? After all, they keep
changing and causing problems for qemu's users ... Apparently we're
all doing stuff "wrong", in ways that are never explained by the
developers.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 16:10 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 05:53 PM, Richard W.M. Jones wrote:
>
>>> Total saving: 115ms.
>> 815 ms by my arithmetic.
> no, not true, 115ms.

If you bypass creating the initrd/cdrom (700 ms) and loading it
(115ms) you save 815ms.

>> You also save 3*N-2*P memory where N is the size of your initrd and
>> P is the actual amount used by the guest.
> Can you explain this?

(assuming ahead-of-time image generation)

initrd:
  qemu reads image (host pagecache): N
  qemu stores image in RAM: N
  guest copies image to its RAM: N
  guest faults working set (no XIP): P
  total: 3N+P

initramfs:
  qemu reads image (host pagecache): N
  qemu stores image: N
  guest copies image: N
  guest extracts image (XIP): N
  total: 4N

cdrom:
  guest faults working set: P
  kernel faults working set: P
  total: 2P

difference: 3N-P or 4N-2P depending on model

>> Loading a file into memory is plenty fast if you use the standard
>> interfaces. -kernel -initrd is a specialized interface.
> Why bother with any command line options at all? After all, they keep
> changing and causing problems for qemu's users ... Apparently we're
> all doing stuff "wrong", in ways that are never explained by the
> developers.

That's a real problem. It's hard to explain the intent behind
something, especially when it's obvious to the author and not so
obvious to the user. However making everything do everything under all
circumstances has its costs.

-kernel and -initrd is a developer's interface intended to make life
easier for users that use qemu to develop kernels. It was not intended
as a high performance DMA engine. Neither was the firmware
_configuration_ interface. That is what virtio and to a lesser extent
IDE was written to perform. You'll get much better results from them.

-- 
error compiling committee.c: too many arguments to function
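Avi's memory accounting above can be checked numerically. A minimal sketch, assuming N = 110 (MB, the initrd size from this thread) and an arbitrary illustrative working set P = 40 MB (P is not stated anywhere in the thread):

```python
# Numeric check of the initrd-vs-cdrom memory accounting above.
# N = image size, P = working set the guest actually faults in.
# N is from the thread; P = 40 is an invented illustration.

N = 110  # MB, image size
P = 40   # MB, working set

initrd_total    = 3 * N + P  # read + qemu copy + guest copy + faulted set
initramfs_total = 4 * N      # as above, but XIP extraction touches all of N
cdrom_total     = 2 * P      # guest + kernel fault only what they use

print(f"initrd:    {initrd_total} MB")
print(f"initramfs: {initramfs_total} MB")
print(f"cdrom:     {cdrom_total} MB")
print(f"savings:   {initrd_total - cdrom_total} MB (3N-P) "
      f"or {initramfs_total - cdrom_total} MB (4N-2P)")
```

Note the savings come out as 3N-P or 4N-2P, matching the "difference" line in the accounting; the 3*N-2*P figure quoted earlier in the thread is a rougher version of the same bound.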
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Richard W.M. Jones @ 2010-08-03 16:28 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 07:10:18PM +0300, Avi Kivity wrote:
> -kernel and -initrd is a developer's interface intended to make life
> easier for users that use qemu to develop kernels. It was not
> intended as a high performance DMA engine. Neither was the firmware
> _configuration_ interface. That is what virtio and to a lesser
> extent IDE was written to perform. You'll get much better results
> from them.

Firmware configuration replaced something which was already working
really fast -- preloading the images into memory -- with something
which worked slower, and has just recently got _way_ more slow.

This is a regression. Plain and simple.

I have posted a small patch which makes this 650x faster without
appreciable complication.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 16:44 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 07:10:18PM +0300, Avi Kivity wrote:
>> -kernel and -initrd is a developer's interface intended to make life
>> easier for users that use qemu to develop kernels. It was not
>> intended as a high performance DMA engine. Neither was the firmware
>> _configuration_ interface. That is what virtio and to a lesser
>> extent IDE was written to perform. You'll get much better results
>> from them.
> Firmware configuration replaced something which was already working
> really fast -- preloading the images into memory -- with something
> which worked slower, and has just recently got _way_ more slow.
>
> This is a regression. Plain and simple.

It's only a regression if there was any intent at making this a
performant interface. Otherwise any change can be interpreted as a
regression. Even "binary doesn't hash to exact same signature" is a
regression.

> I have posted a small patch which makes this 650x faster without
> appreciable complication.

It doesn't appear to support live migration, or hiding the feature for
-M older.

It's not a good path to follow. Tomorrow we'll need to load 300MB
initrds and we'll have to rework this yet again. Meanwhile the kernel
and virtio support demand loading of any image size you'd want to use.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Anthony Liguori @ 2010-08-03 16:46 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 11:44 AM, Avi Kivity wrote:
> On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
>> On Tue, Aug 03, 2010 at 07:10:18PM +0300, Avi Kivity wrote:
>>> -kernel and -initrd is a developer's interface intended to make life
>>> easier for users that use qemu to develop kernels. It was not
>>> intended as a high performance DMA engine. Neither was the firmware
>>> _configuration_ interface. That is what virtio and to a lesser
>>> extent IDE was written to perform. You'll get much better results
>>> from them.
>> Firmware configuration replaced something which was already working
>> really fast -- preloading the images into memory -- with something
>> which worked slower, and has just recently got _way_ more slow.
>>
>> This is a regression. Plain and simple.
>
> It's only a regression if there was any intent at making this a
> performant interface. Otherwise any change can be interpreted as a
> regression. Even "binary doesn't hash to exact same signature" is a
> regression.
>
>> I have posted a small patch which makes this 650x faster without
>> appreciable complication.
>
> It doesn't appear to support live migration, or hiding the feature for
> -M older.
>
> It's not a good path to follow. Tomorrow we'll need to load 300MB
> initrds and we'll have to rework this yet again. Meanwhile the kernel
> and virtio support demand loading of any image size you'd want to use.

firmware is totally broken with respect to -M older FWIW.

Regards,

Anthony Liguori
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 16:50 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 07:46 PM, Anthony Liguori wrote:
>> It doesn't appear to support live migration, or hiding the feature
>> for -M older.
>>
>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>> initrds and we'll have to rework this yet again. Meanwhile the
>> kernel and virtio support demand loading of any image size you'd want
>> to use.
>
> firmware is totally broken with respect to -M older FWIW.

Well, then this is adding to the brokenness.

fwcfg dma is going to have exactly one user, libguestfs. Much better
to have libguestfs move to some other interface and improve our
users-to-interfaces ratio.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Anthony Liguori @ 2010-08-03 16:53 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 11:50 AM, Avi Kivity wrote:
> On 08/03/2010 07:46 PM, Anthony Liguori wrote:
>>> It doesn't appear to support live migration, or hiding the feature
>>> for -M older.
>>>
>>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>>> initrds and we'll have to rework this yet again. Meanwhile the
>>> kernel and virtio support demand loading of any image size you'd
>>> want to use.
>>
>> firmware is totally broken with respect to -M older FWIW.
>
> Well, then this is adding to the brokenness.
>
> fwcfg dma is going to have exactly one user, libguestfs. Much better
> to have libguestfs move to some other interface and improve our
> users-to-interfaces ratio.

You mean, only one class of users cares about the performance of
loading an initrd. However, you've also argued in other threads how
important it is not to break libvirt even if it means we have to do
silly things (like change help text).

So... why is it that libguestfs has to change itself and yet we should
bend over backwards so libvirt doesn't have to change itself?

Regards,

Anthony Liguori
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Avi Kivity @ 2010-08-03 17:01 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 07:53 PM, Anthony Liguori wrote:
> On 08/03/2010 11:50 AM, Avi Kivity wrote:
>> On 08/03/2010 07:46 PM, Anthony Liguori wrote:
>>>> It doesn't appear to support live migration, or hiding the feature
>>>> for -M older.
>>>>
>>>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>>>> initrds and we'll have to rework this yet again. Meanwhile the
>>>> kernel and virtio support demand loading of any image size you'd
>>>> want to use.
>>>
>>> firmware is totally broken with respect to -M older FWIW.
>>
>> Well, then this is adding to the brokenness.
>>
>> fwcfg dma is going to have exactly one user, libguestfs. Much better
>> to have libguestfs move to some other interface and improve our
>> users-to-interfaces ratio.
>
> You mean, only one class of users cares about the performance of
> loading an initrd. However, you've also argued in other threads how
> important it is not to break libvirt even if it means we have to do
> silly things (like change help text).
>
> So... why is it that libguestfs has to change itself and yet we should
> bend over backwards so libvirt doesn't have to change itself?

libvirt is a major user that is widely deployed, and would be
completely broken if we change -help. Changing -help is of no
consequence to us.

libguestfs is a (pardon me) minor user that is not widely used, and
would suffer a performance regression, not total breakage, unless we
add a fw-dma interface. Adding the interface is of consequence to us:
we have to implement live migration and backwards compatibility, and
support this new interface for a long while.

In an ideal world we wouldn't tolerate any regression. The world is
not ideal, so we prioritize.

the -help change scores very high on benefit/cost. fw-dma, much lower.

Note in both cases the long term solution is for the user to move to
another interface (cap reporting, virtio), so adding an interface
which would only be abandoned later by its only user drops the
benefit/cost ratio even further.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

From: Anthony Liguori @ 2010-08-03 17:42 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel

On 08/03/2010 12:01 PM, Avi Kivity wrote:
>> You mean, only one class of users cares about the performance of
>> loading an initrd. However, you've also argued in other threads how
>> important it is not to break libvirt even if it means we have to do
>> silly things (like change help text).
>>
>> So... why is it that libguestfs has to change itself and yet we
>> should bend over backwards so libvirt doesn't have to change itself?
>
> libvirt is a major user that is widely deployed, and would be
> completely broken if we change -help. Changing -help is of no
> consequence to us.
>
> libguestfs is a (pardon me) minor user that is not widely used, and
> would suffer a performance regression, not total breakage, unless we
> add a fw-dma interface. Adding the interface is of consequence to us:
> we have to implement live migration and backwards compatibility, and
> support this new interface for a long while.

I certainly buy the argument about making changes of little
consequence to us vs. ones that we have to be concerned about long
term.

However, I don't think we can objectively differentiate between a
"major" and "minor" user. Generally speaking, I would rather that we
not take the position of "you are a minor user therefore we're not
going to accommodate you".

Regards,

Anthony Liguori

> In an ideal world we wouldn't tolerate any regression. The world is
> not ideal, so we prioritize.
>
> the -help change scores very high on benefit/cost. fw-dma, much lower.
>
> Note in both cases the long term solution is for the user to move to
> another interface (cap reporting, virtio), so adding an interface
> which would only be abandoned later by its only user drops the
> benefit/cost ratio even further.
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 17:42 ` Anthony Liguori @ 2010-08-03 17:58 ` Avi Kivity 2010-08-03 18:11 ` Richard W.M. Jones 2010-08-03 18:26 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 17:58 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 08:42 PM, Anthony Liguori wrote: > However, I don't think we can objectively differentiate between a > "major" and "minor" user. Generally speaking, I would rather that we > not take the position of "you are a minor user therefore we're not > going to accommodate you". Again it's a matter of practicalities. We have written virtio drivers for Windows and Linux, but not for FreeDOS or NetWare. To speed up Windows XP we have (in qemu-kvm) kvm-tpr-opt.c, which is a gross breach of decency; would we go to the same lengths to speed up Haiku? I suggest that we would not. libvirt and Windows XP did not win "major user" status by making large anonymous donations to qemu developers. They did so by having lots of users. Those users are our end users, and we should be focusing our efforts in a way that maximizes the gain for as large a number of those end users as we can. Not breaking libvirt will be unknowingly appreciated by a large number of users, every day. Not slowing down libguestfs, by a much smaller number for a much shorter time. If it were just a matter of changing the help text I wouldn't mind at all, but introducing an undocumented migration-unsafe broken-dma interface isn't something I'm happy to do. btw, gaining back some of the speed that we lost _is_ something I want to do, since it doesn't break or add any interfaces, and would be a gain not just for libguestfs, but also for Windows installs (which use string pio extensively). Richard, can you test kvm.git master? It already contains one fix and we plan to add more.
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 17:58 ` Avi Kivity @ 2010-08-03 18:11 ` Richard W.M. Jones 2010-08-03 18:26 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 18:11 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, qemu-devel, kvm On Tue, Aug 03, 2010 at 08:58:10PM +0300, Avi Kivity wrote: > Richard, can you test kvm.git > master? it already contains one fix and we plan to add more. Yup, I will ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming blog: http://rwmj.wordpress.com Fedora now supports 80 OCaml packages (the OPEN alternative to F#) http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 17:58 ` Avi Kivity 2010-08-03 18:11 ` Richard W.M. Jones @ 2010-08-03 18:26 ` Anthony Liguori 2010-08-03 18:43 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 18:26 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 12:58 PM, Avi Kivity wrote: > On 08/03/2010 08:42 PM, Anthony Liguori wrote: >> However, I don't think we can objectively differentiate between a >> "major" and "minor" user. Generally speaking, I would rather that we >> not take the position of "you are a minor user therefore we're not >> going to accommodate you". > > Again it's a matter of practicalities. With have written virtio > drivers for Windows and Linux, but not for FreeDOS or NetWare. To > speed up Windows XP we have (in qemu-kvm) kvm-tpr-opt.c that is a > gross breach of decency, would we go to the same lengths to speed up > Haiku? I suggest that we would not. tpr-opt optimizes a legitimate dependence on the x86 architecture that Windows has. While the implementation may be grossly indecent, it certainly fits the overall mission of what we're trying to do in qemu and kvm, which is to emulate an architecture. You've invested a lot of time and effort into it because it's important to you (or more specifically, your employer). That's because Windows is important to you. If someone as adept and committed as you was heavily invested in Haiku and was willing to implement something equivalent to tpr-opt and also willing to do all of the work of maintaining it, then rejecting such a patch would be a mistake. If Richard is willing to do the work to make -kernel perform faster in such a way that it fits into the overall mission of what we're building, then I see no reason to reject it. The criteria for evaluating a patch should only depend on how it affects other areas of qemu and whether it impacts overall usability.
As a side note, we ought to do a better job of removing features that have created a burden on other areas of qemu that aren't actively being maintained. That's a different discussion though. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:26 ` Anthony Liguori @ 2010-08-03 18:43 ` Avi Kivity 2010-08-03 18:47 ` Avi Kivity ` (4 more replies) 0 siblings, 5 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 18:43 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 09:26 PM, Anthony Liguori wrote: > On 08/03/2010 12:58 PM, Avi Kivity wrote: >> On 08/03/2010 08:42 PM, Anthony Liguori wrote: >>> However, I don't think we can objectively differentiate between a >>> "major" and "minor" user. Generally speaking, I would rather that >>> we not take the position of "you are a minor user therefore we're >>> not going to accommodate you". >> >> Again it's a matter of practicalities. With have written virtio >> drivers for Windows and Linux, but not for FreeDOS or NetWare. To >> speed up Windows XP we have (in qemu-kvm) kvm-tpr-opt.c that is a >> gross breach of decency, would we go to the same lengths to speed up >> Haiku? I suggest that we would not. > > tpr-opt optimizes a legitimate dependence on the x86 architecture that > Windows has. While the implementation may be grossly indecent, it > certainly fits the overall mission of what we're trying to do in qemu > and kvm which is emulate an architecture. > > You've invested a lot of time and effort into it because it's > important to you (or more specifically, your employer). That's > because Windows is important to you. Correct. > > If someone as adept and commit as you was heavily invested in Haiku > and was willing to implement something equivalent to tpr-opt and also > willing to do all of the work of maintaining it, then reject such a > patch would be a mistake. libguestfs does not depend on an x86 architectural feature. qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should discourage people from depending on this interface for production use. 
> > If Richard is willing to do the work to make -kernel perform faster in > such a way that it fits into the overall mission of what we're > building, then I see no reason to reject it. The criteria for > evaluating a patch should only depend on how it affects other areas of > qemu and whether it impacts overall usability. That's true, but extending fwcfg doesn't fit into the overall picture well. We have well defined interfaces for pushing data into a guest: virtio-serial (dma upload), virtio-blk (adds demand paging), and virtio-p9fs (no image needed). Adapting libguestfs to use one of these is a better move than adding yet another interface. A better (though still inaccurate) analogy would be if the developers of a guest OS came up with a virtual bus for devices and were willing to do the work to make this bus perform better. Would we accept this new work or would we point them at our existing bus (pci) instead? Really, the bar on new interfaces (both to guest and host) should be high, much higher than it is now. Interfaces should be well documented, future proof, migration safe, and orthogonal to existing interfaces. While the first three points could be improved with some effort, adding a new dma interface is not going to be orthogonal to virtio. And frankly, libguestfs is better off switching to one of the other interfaces. Slurping huge initrds isn't the right way to do this. > As a side note, we ought to do a better job of removing features that > have created a burden on other areas of qemu that aren't actively > being maintained. That's a different discussion though. Sure, we need something like Linux' Documentation/feature-removal-schedule.txt for people to ignore. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity @ 2010-08-03 18:47 ` Avi Kivity 2010-08-03 18:55 ` Anthony Liguori ` (3 subsequent siblings) 4 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 18:47 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 09:43 PM, Avi Kivity wrote: > Really, the bar on new interfaces (both to guest and host) should be > high, much higher than it is now. Interfaces should be well > documented, future proof, migration safe, and orthogonal to existing > interfaces. While the first three points could be improved with some > effort, adding a new dma interface is not going to be orthogonal to > virtio. And frankly, libguestfs is better off switching to one of the > other interfaces. Slurping huge initrds isn't the right way to do this. btw, precedent should play no role here. Just because an older interface wasn't documented or migration safe or unit-tested doesn't mean new ones get off the hook. It does help to have a framework in place that we can point people at; for example, I added a skeleton Documentation/kvm/api.txt and some unit tests and then made contributors fill them in for new features. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity 2010-08-03 18:47 ` Avi Kivity @ 2010-08-03 18:55 ` Anthony Liguori 2010-08-03 19:00 ` Avi Kivity 2010-08-03 19:05 ` Gleb Natapov ` (2 subsequent siblings) 4 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 18:55 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 01:43 PM, Avi Kivity wrote: >> >> If Richard is willing to do the work to make -kernel perform faster >> in such a way that it fits into the overall mission of what we're >> building, then I see no reason to reject it. The criteria for >> evaluating a patch should only depend on how it affects other areas >> of qemu and whether it impacts overall usability. > > That's true, but extending fwcfg doesn't fit into the overall picture > well. We have well defined interfaces for pushing data into a guest: > virtio-serial (dma upload), virtio-blk (adds demand paging), and > virtio-p9fs (no image needed). Adapting libguestfs to use one of > these is a better move than adding yet another interface. On real hardware, there's an awful lot of interaction between the firmware and the platform. It's a pretty rich interface. On IBM systems, we actually extend that all the way down to userspace via a virtual USB RNDIS driver that you can use IPMI over. > A better (though still inaccurate) analogy is would be if the > developers of a guest OS came up with a virtual bus for devices and > were willing to do the work to make this bus perform better. Would we > accept this new work or would we point them at our existing bus (pci) > instead? Doesn't this precisely describe virtio-s390? > > Really, the bar on new interfaces (both to guest and host) should be > high, much higher than it is now. Interfaces should be well > documented, future proof, migration safe, and orthogonal to existing > interfaces. 
Okay, but this is a bigger discussion that I'm very eager to have. But we shouldn't explicitly apply new policies to random patches without clearly stating the policy up front. Regards, Anthony Liguori > While the first three points could be improved with some effort, > adding a new dma interface is not going to be orthogonal to virtio. > And frankly, libguestfs is better off switching to one of the other > interfaces. Slurping huge initrds isn't the right way to do this. > >> As a side note, we ought to do a better job of removing features that >> have created a burden on other areas of qemu that aren't actively >> being maintained. That's a different discussion though. > > Sure, we need something like Linux' > Documentation/feature-removal-schedule.txt for people to ignore. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:55 ` Anthony Liguori @ 2010-08-03 19:00 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:00 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 09:55 PM, Anthony Liguori wrote: > On 08/03/2010 01:43 PM, Avi Kivity wrote: >>> >>> If Richard is willing to do the work to make -kernel perform faster >>> in such a way that it fits into the overall mission of what we're >>> building, then I see no reason to reject it. The criteria for >>> evaluating a patch should only depend on how it affects other areas >>> of qemu and whether it impacts overall usability. >> >> That's true, but extending fwcfg doesn't fit into the overall picture >> well. We have well defined interfaces for pushing data into a guest: >> virtio-serial (dma upload), virtio-blk (adds demand paging), and >> virtio-p9fs (no image needed). Adapting libguestfs to use one of >> these is a better move than adding yet another interface. > > On real hardware, there's an awful lot of interaction between the > firmware and the platform. It's a pretty rich interface. On IBM > systems, we actually extend that all the way down to userspace via a > virtual USB RNDIS driver that you can use IPMI over. That is fine and we'll do pv interfaces when we have to. That's fwcfg, that's virtio. But let's not do more than we have to. > >> A better (though still inaccurate) analogy is would be if the >> developers of a guest OS came up with a virtual bus for devices and >> were willing to do the work to make this bus perform better. Would >> we accept this new work or would we point them at our existing bus >> (pci) instead? > > Doesn't this precisely describe virtio-s390? As I understood it, s390 had good reasons not to use their native interfaces.
On x86 we have no good reason not to use pci and no good reason not to use virtio for dma. >> >> Really, the bar on new interfaces (both to guest and host) should be >> high, much higher than it is now. Interfaces should be well >> documented, future proof, migration safe, and orthogonal to existing >> interfaces. > > Okay, but this is a bigger discussion that I'm very eager to have. > But we shouldn't explicitly apply new policies to random patches > without clearly stating the policy up front. > Migration safety has been part of the criteria for a while. Future proofness less so. Documentation was usually completely missing but I see no reason not to insist on it now, better late than never. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity 2010-08-03 18:47 ` Avi Kivity 2010-08-03 18:55 ` Anthony Liguori @ 2010-08-03 19:05 ` Gleb Natapov 2010-08-03 19:09 ` Avi Kivity 2010-08-03 19:15 ` Anthony Liguori 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-04 14:51 ` David S. Ahern 4 siblings, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-03 19:05 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, qemu-devel On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > > > >If Richard is willing to do the work to make -kernel perform > >faster in such a way that it fits into the overall mission of what > >we're building, then I see no reason to reject it. The criteria > >for evaluating a patch should only depend on how it affects other > >areas of qemu and whether it impacts overall usability. > > That's true, but extending fwcfg doesn't fit into the overall > picture well. We have well defined interfaces for pushing data into > a guest: virtio-serial (dma upload), virtio-blk (adds demand > paging), and virtio-p9fs (no image needed). Adapting libguestfs to > use one of these is a better move than adding yet another interface. > +1. I already proposed that. Nobody objects to a fast communication channel between guest and host. In fact we have one: virtio-serial. Of course it is much easier to hack dma semantics into the fw_cfg interface than to add virtio-serial to seabios, but that doesn't make it right. Does virtio-serial have to be exposed as PCI to a guest, or can we expose it as an ISA device too, in case someone wants to use the -kernel option but does not want an additional PCI device in the guest? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:05 ` Gleb Natapov @ 2010-08-03 19:09 ` Avi Kivity 2010-08-03 19:15 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:09 UTC (permalink / raw) To: Gleb Natapov; +Cc: kvm, Richard W.M. Jones, qemu-devel On 08/03/2010 10:05 PM, Gleb Natapov wrote: > >> That's true, but extending fwcfg doesn't fit into the overall >> picture well. We have well defined interfaces for pushing data into >> a guest: virtio-serial (dma upload), virtio-blk (adds demand >> paging), and virtio-p9fs (no image needed). Adapting libguestfs to >> use one of these is a better move than adding yet another interface. >> > +1. I already proposed that. Nobody objects against fast fast > communication channel between guest and host. In fact we have one: > virtio-serial. Of course it is much easier to hack dma semantic into > fw_cfg interface than add virtio-serial to seabios, but it doesn't make > it right. Does virtio-serial has to be exposed as PCI to a guest or can > we expose it as ISA device too in case someone want to use -kernel option > but do not see additional PCI device in a guest? No need for virtio-serial in firmware. We can have a small initrd slurp a larger filesystem via virtio-serial, or mount a virtio-blk or virtio-p9fs, or boot the whole thing from a virtio-blk image and avoid -kernel -initrd completely. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
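As a concrete illustration of the bootstrap Avi sketches, a tiny /init along these lines could do the job. Every path, device name, and image name here is an assumption for the sketch, not anything libguestfs actually ships; /dev/vda and /dev/vport0p1 are the standard Linux names for virtio-blk disks and virtio-serial ports.

```shell
#!/bin/sh
# Illustrative /init for a tiny bootstrap initrd (all names are assumptions).
mkdir -p /sysroot

# Option 1: the full appliance was attached as a virtio-blk disk;
# nothing to copy, demand paging does the rest.
mount -o ro /dev/vda /sysroot

# Option 2 (alternative): slurp a payload the host streams over a
# virtio-serial port, instead of one trapped I/O exit per byte:
# dd if=/dev/vport0p1 of=/appliance.img bs=1M

exec switch_root /sysroot /sbin/init
```

Either way the large blob never touches fw_cfg, which is the point of the suggestion.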
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:05 ` Gleb Natapov 2010-08-03 19:09 ` Avi Kivity @ 2010-08-03 19:15 ` Anthony Liguori 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:26 ` Gleb Natapov 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:15 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, Avi Kivity, kvm, Richard W.M. Jones On 08/03/2010 02:05 PM, Gleb Natapov wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >>> If Richard is willing to do the work to make -kernel perform >>> faster in such a way that it fits into the overall mission of what >>> we're building, then I see no reason to reject it. The criteria >>> for evaluating a patch should only depend on how it affects other >>> areas of qemu and whether it impacts overall usability. >>> >> That's true, but extending fwcfg doesn't fit into the overall >> picture well. We have well defined interfaces for pushing data into >> a guest: virtio-serial (dma upload), virtio-blk (adds demand >> paging), and virtio-p9fs (no image needed). Adapting libguestfs to >> use one of these is a better move than adding yet another interface. >> >> > +1. I already proposed that. Nobody objects against fast fast > communication channel between guest and host. In fact we have one: > virtio-serial. Of course it is much easier to hack dma semantic into > fw_cfg interface than add virtio-serial to seabios, but it doesn't make > it right. Does virtio-serial has to be exposed as PCI to a guest or can > we expose it as ISA device too in case someone want to use -kernel option > but do not see additional PCI device in a guest? > fw_cfg has to be available pretty early on so relying on a PCI device isn't reasonable. Having dual interfaces seems wasteful. We're already doing bulk data transfer over fw_cfg as we need to do it to transfer roms and potentially a boot splash. 
Even outside of loading an initrd, the performance is going to start to matter with a large number of devices. Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:15 ` Anthony Liguori @ 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:38 ` Anthony Liguori ` (2 more replies) 2010-08-03 19:26 ` Gleb Natapov 1 sibling, 3 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:24 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 10:15 PM, Anthony Liguori wrote: > > fw_cfg has to be available pretty early on so relying on a PCI device > isn't reasonable. Having dual interfaces seems wasteful. Agree. > > We're already doing bulk data transfer over fw_cfg as we need to do it > to transfer roms and potentially a boot splash. Why do we need to transfer roms? These are devices on the memory bus or pci bus, it just needs to be there at the right address. Boot splash should just be another rom as it would be on a real system. > Even outside of loading an initrd, the performance is going to start > to matter with a large number of devices. I don't really see why. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:24 ` Avi Kivity @ 2010-08-03 19:38 ` Anthony Liguori 2010-08-03 19:41 ` Avi Kivity 2010-08-03 21:20 ` Gerd Hoffmann 2010-08-03 22:06 ` Richard W.M. Jones 2 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:38 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 02:24 PM, Avi Kivity wrote: > On 08/03/2010 10:15 PM, Anthony Liguori wrote: >> >> fw_cfg has to be available pretty early on so relying on a PCI device >> isn't reasonable. Having dual interfaces seems wasteful. > > Agree. > >> >> We're already doing bulk data transfer over fw_cfg as we need to do >> it to transfer roms and potentially a boot splash. > > Why do we need to transfer roms? These are devices on the memory bus > or pci bus, it just needs to be there at the right address. Not quite. The BIOS owns the option ROM space. The way it works on bare metal is that the PCI ROM BAR gets mapped to some location in physical memory by the BIOS, the BIOS executes the initialization vector, and after initialization, the ROM will reorganize itself into something smaller. It's nice and clean. But ISA is not nearly as clean. Ultimately, to make this mix work in a reasonable way, we have to provide a side channel interface to SeaBIOS such that we can deliver ROMs outside of PCI and still let SeaBIOS decide how ROMs get organized. It's additionally complicated by the fact that we didn't support PCI ROM BAR until recently so to maintain compatibility with -M older, we have to use a side channel to lay out option roms. Regards, Anthony Liguori > Boot splash should just be another rom as it would be on a real system. > >> Even outside of loading an initrd, the performance is going to start >> to matter with a large number of devices. > > I don't really see why. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:38 ` Anthony Liguori @ 2010-08-03 19:41 ` Avi Kivity 2010-08-03 19:47 ` Anthony Liguori 2010-08-03 21:24 ` Gerd Hoffmann 0 siblings, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:41 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 10:38 PM, Anthony Liguori wrote: >> Why do we need to transfer roms? These are devices on the memory bus >> or pci bus, it just needs to be there at the right address. > > > Not quite. The BIOS owns the option ROM space. The way it works on > bare metal is that the PCI ROM BAR gets mapped to some location in > physical memory by the BIOS, the BIOS executes the initialization > vector, and after initialization, the ROM will reorganize itself into > something smaller. It's nice and clean. > > But ISA is not nearly as clean. So far so good. > Ultimately, to make this mix work in a reasonable way, we have to > provide a side channel interface to SeaBIOS such that we can deliver > ROMs outside of PCI and still let SeaBIOS decide how ROMs get organized. I don't follow. Why do we need this side channel? What would a real ISA machine do? Are there actually enough ISA devices for there to be a problem? > > It's additionally complicated by the fact that we didn't support PCI > ROM BAR until recently so to maintain compatibility with -M older, we > have to use a side channel to lay out option roms. Again I don't follow. We can just lay out the ROMs in memory like we did in the past? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:41 ` Avi Kivity @ 2010-08-03 19:47 ` Anthony Liguori 2010-08-04 5:47 ` Avi Kivity 2010-08-03 21:24 ` Gerd Hoffmann 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:47 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 02:41 PM, Avi Kivity wrote: > On 08/03/2010 10:38 PM, Anthony Liguori wrote: >>> Why do we need to transfer roms? These are devices on the memory >>> bus or pci bus, it just needs to be there at the right address. >> >> >> Not quite. The BIOS owns the option ROM space. The way it works on >> bare metal is that the PCI ROM BAR gets mapped to some location in >> physical memory by the BIOS, the BIOS executes the initialization >> vector, and after initialization, the ROM will reorganize itself into >> something smaller. It's nice and clean. >> >> But ISA is not nearly as clean. > > So far so good. > >> Ultimately, to make this mix work in a reasonable way, we have to >> provide a side channel interface to SeaBIOS such that we can deliver >> ROMs outside of PCI and still let SeaBIOS decide how ROMs get organized. > > I don't follow. Why do we need this side channel? What would a real > ISA machine do? It depends on the ISA machine. In the worst case, there's a DIP switch on the card and if you've got a conflict between two cards, you start flipping DIP switches. It's pure awesomeness. No, I don't want to emulate DIP switches :-) > Are there actually enough ISA devices for there to be a problem? No, but -M older has the same problem. >> >> It's additionally complicated by the fact that we didn't support PCI >> ROM BAR until recently so to maintain compatibility with -M older, we >> have to use a side channel to lay out option roms. > > Again I don't follow. We can just lay out the ROMs in memory like we > did in the past? 
Because only one component can own the option ROM space. Either that's SeaBIOS and we need a side channel or it's QEMU and we can't use PMM. I guess that's the real issue here. Previously we used etherboot which was well under 32k. We only loaded roms we needed. Now we use gPXE which is much bigger and if you don't use PMM, then you run out of option rom space very quickly. Previously, we loaded option ROMs on demand when a user used -boot n but that was a giant hack and wasn't like bare metal at all. It involved x86-isms in vl.c. Now we always load ROMs so PMM is very important. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:47 ` Anthony Liguori @ 2010-08-04 5:47 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:47 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 10:47 PM, Anthony Liguori wrote: > On 08/03/2010 02:41 PM, Avi Kivity wrote: >> On 08/03/2010 10:38 PM, Anthony Liguori wrote: >>>> Why do we need to transfer roms? These are devices on the memory >>>> bus or pci bus, it just needs to be there at the right address. >>> >>> >>> Not quite. The BIOS owns the option ROM space. The way it works on >>> bare metal is that the PCI ROM BAR gets mapped to some location in >>> physical memory by the BIOS, the BIOS executes the initialization >>> vector, and after initialization, the ROM will reorganize itself >>> into something smaller. It's nice and clean. >>> >>> But ISA is not nearly as clean. >> >> So far so good. >> >>> Ultimately, to make this mix work in a reasonable way, we have to >>> provide a side channel interface to SeaBIOS such that we can deliver >>> ROMs outside of PCI and still let SeaBIOS decide how ROMs get >>> organized. >> >> I don't follow. Why do we need this side channel? What would a real >> ISA machine do? > > It depends on the ISA machine. In the worst case, there's a DIP > switch on the card and if you've got a conflict between two cards, you > start flipping DIP switches. It's pure awesomeness. No, I don't want > to emulate DIP switches :-) How else do you set the IRQ line and I/O port base address? 
static ISADeviceInfo ne2000_isa_info = {
    .qdev.name = "ne2k_isa",
    .qdev.size = sizeof(ISANE2000State),
    .init = isa_ne2000_initfn,
    .qdev.props = (Property[]) {
        DEFINE_PROP_HEX32("iobase", ISANE2000State, iobase, 0x300),
        DEFINE_PROP_UINT32("irq", ISANE2000State, isairq, 9),
+       DEFINE_PROP_HEX32("rombase", ISANE2000State, isarombase, 0xe8000),
        DEFINE_NIC_PROPERTIES(ISANE2000State, ne2000.c),
        DEFINE_PROP_END_OF_LIST(),
    },
};

we already are emulating DIP switches... > >> Are there actually enough ISA devices for there to be a problem? > > No, but -M older has the same problem. So we do the same solution we did in older. We didn't have fwcfg dma back then. > >>> >>> It's additionally complicated by the fact that we didn't support PCI >>> ROM BAR until recently so to maintain compatibility with -M older, >>> we have to use a side channel to lay out option roms. >> >> Again I don't follow. We can just lay out the ROMs in memory like we >> did in the past? > > Because only one component can own the option ROM space. Either > that's SeaBIOS and we need a side channel or it's QEMU and we can't > use PMM. > > I guess that's the real issue here. Previously we used etherboot > which was well under 32k. We only loaded roms we needed. Now we use > gPXE which is much bigger and if you don't use PMM, then you run out > of option rom space very quickly. A true -M older would use the older ROMs for full compatibility. > > Previously, we loaded option ROMs on demand when a user used -boot n > but that was a giant hack and wasn't like bare metal at all. It > involved x86-isms in vl.c. Now we always load ROMs so PMM is very > important. Though it's a hack, we can load ROMs via the existing fwcfg interface; no need for an extension. Richard is seeing problems loading 100MB initrds, not 64KB ROMs. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
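Those "DIP switches" are already flippable from the command line via qdev. A hedged sketch: iobase and irq are the real ne2k_isa properties shown in the snippet above, while rombase would exist only with the hypothetical + line.

```shell
# Override the ISA NE2000 "jumpers" from the qemu command line
# (rombase is hypothetical; it exists only with the patch sketched above):
qemu-system-x86_64 -M isapc \
    -device ne2k_isa,iobase=0x320,irq=10
```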
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:41 ` Avi Kivity 2010-08-03 19:47 ` Anthony Liguori @ 2010-08-03 21:24 ` Gerd Hoffmann 1 sibling, 0 replies; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-03 21:24 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones Hi, > Again I don't follow. We can just lay out the ROMs in memory like we did > in the past? Well. We have some size issues then. PCI ROMS are loaded by the BIOS in a way that only a small fraction is actually resident in the small 0xd0000 -> 0xe0000 area. That doesn't work if qemu tries to simply copy the whole thing there like old versions did. With the size of the gPXE roms this matters in real life. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:38 ` Anthony Liguori @ 2010-08-03 21:20 ` Gerd Hoffmann 2010-08-04 5:53 ` Avi Kivity 2010-08-03 22:06 ` Richard W.M. Jones 2 siblings, 1 reply; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-03 21:20 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones Hi, >> We're already doing bulk data transfer over fw_cfg as we need to do it >> to transfer roms and potentially a boot splash. > Why do we need to transfer roms? These are devices on the memory bus or > pci bus, it just needs to be there at the right address. Indeed. We do that in most cases. The exceptions are: (1) -M somethingold. PCI devices don't have a pci rom bar then by default because they didn't have one in older qemu versions, so we need some other way to pass the option rom to seabios. (2) vgabios.bin. vgabios needs patches to make loading via pci rom bar work (vgabios-cirrus.bin works fine already). I have patches in the queue to do that. (3) roms not associated with a PCI device: multiboot, extboot, -option-rom command line switch, vgabios for -M isapc. The default configuration (qemu $diskimage) loads two roms: vgabios-cirrus.bin and e1000.bin. Both are loaded via pci rom bar and not via fw_cfg. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 21:20 ` Gerd Hoffmann @ 2010-08-04 5:53 ` Avi Kivity 2010-08-04 7:56 ` Gerd Hoffmann 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:53 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones On 08/04/2010 12:20 AM, Gerd Hoffmann wrote: > Hi, > >>> We're already doing bulk data transfer over fw_cfg as we need to do it >>> to transfer roms and potentially a boot splash. >> >> Why do we need to transfer roms? These are devices on the memory bus or >> pci bus, it just needs to be there at the right address. > > Indeed. We do that in most cases. The exceptions are: > > (1) -M somethingold. PCI devices don't have a pci rom bar then by > default because they didn't not have one in older qemu versions, > so we need some other way to pass the option rom to seabios. What did we do back then? before we had the fwcfg interface? > (2) vgabios.bin. vgabios needs patches to make loading via pci rom > bar work (vgabios-cirrus.bin works fine already). I have patches > in the queue to do that. So not an issue. > (3) roms not associated with a PCI device: multiboot, extboot, > -option-rom command line switch, vgabios for -M isapc. We could lay those out in high memory (4GB-512MB) and have the bios copy them from there. I believe that's what real hardware does - the flash chip is mapped there (the reset vector is at 4GB-16) and shadowed at the end of the 1MB 8086 range. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 5:53 ` Avi Kivity @ 2010-08-04 7:56 ` Gerd Hoffmann 2010-08-04 8:17 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-04 7:56 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones Hi, >> (1) -M somethingold. PCI devices don't have a pci rom bar then by >> default because they didn't not have one in older qemu versions, >> so we need some other way to pass the option rom to seabios. > > What did we do back then? before we had the fwcfg interface? Have qemu instead of bochs/seabios manage the vgabios/optionrom area (0xc8000 -> 0xe0000) and copy the roms to memory. Which implies the whole rom has to sit there as PMM can't be used then. >> (3) roms not associated with a PCI device: multiboot, extboot, >> -option-rom command line switch, vgabios for -M isapc. > > We could lay those out in high memory (4GB-512MB) and have the bios copy > them from there. Yea, we could. But it is pointless IMHO. $ ls -l *.bin -rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* -rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* That are the ones we can't load via pci rom bar. Look how small they are. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 7:56 ` Gerd Hoffmann @ 2010-08-04 8:17 ` Avi Kivity 2010-08-04 8:43 ` Gleb Natapov ` (2 more replies) 0 siblings, 3 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 8:17 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones On 08/04/2010 10:56 AM, Gerd Hoffmann wrote: > Hi, > >>> (1) -M somethingold. PCI devices don't have a pci rom bar then by >>> default because they didn't not have one in older qemu versions, >>> so we need some other way to pass the option rom to seabios. >> >> What did we do back then? before we had the fwcfg interface? > > Have qemu instead of bochs/seabios manage the vgabios/optionrom area > (0xc8000 -> 0xe0000) and copy the roms to memory. Which implies the > whole rom has to sit there as PMM can't be used then. Do we actually need PMM for isapc? Did PMM exist before pci? > >>> (3) roms not associated with a PCI device: multiboot, extboot, >>> -option-rom command line switch, vgabios for -M isapc. >> >> We could lay those out in high memory (4GB-512MB) and have the bios copy >> them from there. > > Yea, we could. But it is pointless IMHO. > > $ ls -l *.bin > -rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* > -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* > -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* > -rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* > > That are the ones we can't load via pci rom bar. Look how small they > are. So they can just sit there? I'm confused, either there is enough address space and we don't need to play games, or there isn't and we do. For playing games, there are three options: - existing fwcfg - fwcfg+dma - put roms in 4GB-2MB (or whatever we decide the flash size is) and have the BIOS copy them Existing fwcfg is the least amount of work and probably satisfactory for isapc. fwcfg+dma is IMO going off a tangent. 
High memory flash is the most hardware-like solution, pretty easy from a qemu point of view but requires more work. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 8:17 ` Avi Kivity @ 2010-08-04 8:43 ` Gleb Natapov 2010-08-04 9:22 ` Gerd Hoffmann 2010-08-04 13:04 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 8:43 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On Wed, Aug 04, 2010 at 11:17:28AM +0300, Avi Kivity wrote: > On 08/04/2010 10:56 AM, Gerd Hoffmann wrote: > > Hi, > > > >>>(1) -M somethingold. PCI devices don't have a pci rom bar then by > >>>default because they didn't not have one in older qemu versions, > >>>so we need some other way to pass the option rom to seabios. > >> > >>What did we do back then? before we had the fwcfg interface? > > > >Have qemu instead of bochs/seabios manage the vgabios/optionrom > >area (0xc8000 -> 0xe0000) and copy the roms to memory. Which > >implies the whole rom has to sit there as PMM can't be used then. > > Do we actually need PMM for isapc? Did PMM exist before pci? > > > > >>>(3) roms not associated with a PCI device: multiboot, extboot, > >>>-option-rom command line switch, vgabios for -M isapc. > >> > >>We could lay those out in high memory (4GB-512MB) and have the bios copy > >>them from there. > > > >Yea, we could. But it is pointless IMHO. > > > >$ ls -l *.bin > >-rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* > >-rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* > >-rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* > >-rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* > > > >That are the ones we can't load via pci rom bar. Look how small > >they are. > > So they can just sit there? I'm confused, either there is enough > address space and we don't need to play games, or there isn't and we > do. 
> > For playing games, there are three options: > - existing fwcfg > - fwcfg+dma > - put roms in 4GB-2MB (or whatever we decide the flash size is) and > have the BIOS copy them > > Existing fwcfg is the least amount of work and probably satisfactory > for isapc. fwcfg+dma is IMO going off a tangent. High memory flash > is the most hardware-like solution, pretty easy from a qemu point of > view but requires more work. > We can do an interface like this: the guest enumerates available roms using fwcfg. The guest tells the host to map a rom into a guest-specified IOMEM region. The guest copies the rom from the IOMEM region and tells the host to unmap it. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 8:17 ` Avi Kivity 2010-08-04 8:43 ` Gleb Natapov @ 2010-08-04 9:22 ` Gerd Hoffmann 2010-08-04 13:04 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-04 9:22 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Richard W.M. Jones On 08/04/10 10:17, Avi Kivity wrote: > On 08/04/2010 10:56 AM, Gerd Hoffmann wrote: >> Hi, >> >>>> (1) -M somethingold. PCI devices don't have a pci rom bar then by >>>> default because they didn't not have one in older qemu versions, >>>> so we need some other way to pass the option rom to seabios. >>> >>> What did we do back then? before we had the fwcfg interface? >> >> Have qemu instead of bochs/seabios manage the vgabios/optionrom area >> (0xc8000 -> 0xe0000) and copy the roms to memory. Which implies the >> whole rom has to sit there as PMM can't be used then. > > Do we actually need PMM for isapc? Did PMM exist before pci? I don't know. >>>> (3) roms not associated with a PCI device: multiboot, extboot, >>>> -option-rom command line switch, vgabios for -M isapc. >>> >>> We could lay those out in high memory (4GB-512MB) and have the bios copy >>> them from there. >> >> Yea, we could. But it is pointless IMHO. >> >> $ ls -l *.bin >> -rwxrwxr-x. 1 kraxel kraxel 1536 Jul 15 15:51 extboot.bin* >> -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 linuxboot.bin* >> -rwxrwxr-x. 1 kraxel kraxel 1024 Jul 15 15:51 multiboot.bin* >> -rwxrwxr-x. 1 kraxel kraxel 8960 Jul 15 15:51 vapic.bin* >> >> That are the ones we can't load via pci rom bar. Look how small they are. > > So they can just sit there? I'm confused, either there is enough address > space and we don't need to play games, or there isn't and we do. Well. Looks like I should be a bit more verbose. 
The old (qemu 0.11) way was to have qemu load roms to memory and bochsbios/seabios scan the memory area for option rom signatures to find them. All option roms have to fit in there then, completely:

  vgabios (~40k)
  etherboot rom (~32k)
  extboot rom (~1k)

The new way is to have seabios load roms to memory:

  vgabios (~40k)
  gPXE rom header (~2k IIRC)
  extboot rom (~1k)

Thanks to SeaBIOS loading the roms, only a small part of the gPXE rom has to live in the option rom area; everything else is stored somewhere else in high memory (using PMM, don't ask me how this works in detail). gPXE roms are ~56k in size (e1000 even 72k), so they would fill up the option rom area pretty quickly if we would load them the old way without PMM. Another advantage of seabios loading the roms is that parts of the 0xe0000 segment can be used then. Seabios size is just a bit more than 64k, so most of the 0xe0000 -> 0xf0000 area isn't actually used by seabios. seabios has two ways to get the roms: (1) fw_cfg and (2) pci rom bar. The ones listed above are the ones which have to go through fw_cfg. There are more roms which have to fit into the option rom space (vgabios, one gPXE per nic), but these don't depend on fw_cfg. > For playing games, there are three options: > - existing fwcfg Given the size, that is good and fast enough for the roms IMO. Kernel+initrd is another story though. We are talking about megabytes not kilobytes then. Standard fedora initramfs is ~14M on x86_64. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 8:17 ` Avi Kivity 2010-08-04 8:43 ` Gleb Natapov 2010-08-04 9:22 ` Gerd Hoffmann @ 2010-08-04 13:04 ` Anthony Liguori 2010-08-04 13:07 ` Gleb Natapov 2010-08-04 16:25 ` Avi Kivity 2 siblings, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 13:04 UTC (permalink / raw) To: Avi Kivity Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 03:17 AM, Avi Kivity wrote: > For playing games, there are three options: > - existing fwcfg > - fwcfg+dma > - put roms in 4GB-2MB (or whatever we decide the flash size is) and > have the BIOS copy them > > Existing fwcfg is the least amount of work and probably satisfactory > for isapc. fwcfg+dma is IMO going off a tangent. High memory flash > is the most hardware-like solution, pretty easy from a qemu point of > view but requires more work. The only trouble I see is that high memory isn't always available. If it's a 32-bit PC and you've exhausted RAM space, then you're only left with the PCI hole and it's not clear to me if you can really pull out 100mb of space there as an option ROM without breaking something. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:04 ` Anthony Liguori @ 2010-08-04 13:07 ` Gleb Natapov 2010-08-04 13:15 ` Anthony Liguori 2010-08-04 13:22 ` Richard W.M. Jones 1 sibling, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:07 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > On 08/04/2010 03:17 AM, Avi Kivity wrote: > >For playing games, there are three options: > >- existing fwcfg > >- fwcfg+dma > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > >and have the BIOS copy them > > > >Existing fwcfg is the least amount of work and probably > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >High memory flash is the most hardware-like solution, pretty easy > >from a qemu point of view but requires more work. > > The only trouble I see is that high memory isn't always available. > If it's a 32-bit PC and you've exhausted RAM space, then you're only > left with the PCI hole and it's not clear to me if you can really > pull out 100mb of space there as an option ROM without breaking > something. > We can map it on demand. Guest tells qemu to map rom "A" to address X by writing into some io port. Guest copies rom. Guest tells qemu to unmap it. Better than a DMA interface IMHO. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:07 ` Gleb Natapov @ 2010-08-04 13:15 ` Anthony Liguori 2010-08-04 13:24 ` Richard W.M. Jones 2010-08-04 13:34 ` Gleb Natapov 2010-08-04 13:22 ` Richard W.M. Jones 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 13:15 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On 08/04/2010 08:07 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 03:17 AM, Avi Kivity wrote: >> >>> For playing games, there are three options: >>> - existing fwcfg >>> - fwcfg+dma >>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>> and have the BIOS copy them >>> >>> Existing fwcfg is the least amount of work and probably >>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>> High memory flash is the most hardware-like solution, pretty easy >>> >> >from a qemu point of view but requires more work. >> >> The only trouble I see is that high memory isn't always available. >> If it's a 32-bit PC and you've exhausted RAM space, then you're only >> left with the PCI hole and it's not clear to me if you can really >> pull out 100mb of space there as an option ROM without breaking >> something. >> >> > We can map it on demand. Guest tells qemu to map rom "A" to address X by > writing into some io port. Guest copies rom. Guest tells qemu to unmap > it. Better then DMA interface IMHO. > That's what I thought too, but in a 32-bit guest using ~3.5GB of RAM, where can you safely get 100MB of memory to full map the ROM? If you're going to map chunks at a time, you are basically doing DMA. And what's the upper limit on ROM size that we impose? 100MB is already at the ridiculously large size. Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:15 ` Anthony Liguori @ 2010-08-04 13:24 ` Richard W.M. Jones 2010-08-04 13:26 ` Gleb Natapov 2010-08-04 16:26 ` Avi Kivity 2010-08-04 13:34 ` Gleb Natapov 1 sibling, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 13:24 UTC (permalink / raw) To: Anthony Liguori; +Cc: qemu-devel, kvm, Avi Kivity, Gleb Natapov, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>For playing games, there are three options: > >>>- existing fwcfg > >>>- fwcfg+dma > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>and have the BIOS copy them > >>> > >>>Existing fwcfg is the least amount of work and probably > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>High memory flash is the most hardware-like solution, pretty easy > >>>from a qemu point of view but requires more work. > >> > >>The only trouble I see is that high memory isn't always available. > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>left with the PCI hole and it's not clear to me if you can really > >>pull out 100mb of space there as an option ROM without breaking > >>something. > >> > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > >it. Better then DMA interface IMHO. > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > RAM, where can you safely get 100MB of memory to full map the ROM? > If you're going to map chunks at a time, you are basically doing > DMA. It's boot time, so you can just map it over some existing RAM surely? 
Linuxboot.bin can work out where to map it so it won't be in any memory either being used or the target for the copy. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:24 ` Richard W.M. Jones @ 2010-08-04 13:26 ` Gleb Natapov 2010-08-04 14:22 ` Anthony Liguori 2010-08-04 16:26 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:26 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Avi Kivity, Gerd Hoffmann On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > > >>>For playing games, there are three options: > > >>>- existing fwcfg > > >>>- fwcfg+dma > > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > > >>>and have the BIOS copy them > > >>> > > >>>Existing fwcfg is the least amount of work and probably > > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > > >>>High memory flash is the most hardware-like solution, pretty easy > > >>>from a qemu point of view but requires more work. > > >> > > >>The only trouble I see is that high memory isn't always available. > > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > > >>left with the PCI hole and it's not clear to me if you can really > > >>pull out 100mb of space there as an option ROM without breaking > > >>something. > > >> > > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > > >it. Better then DMA interface IMHO. > > > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > > RAM, where can you safely get 100MB of memory to full map the ROM? > > If you're going to map chunks at a time, you are basically doing > > DMA. > > It's boot time, so you can just map it over some existing RAM surely? Not with current qemu. 
This is broken now. > Linuxboot.bin can work out where to map it so it won't be in any > memory either being used or the target for the copy. > -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:26 ` Gleb Natapov @ 2010-08-04 14:22 ` Anthony Liguori 2010-08-04 14:38 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:22 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On 08/04/2010 08:26 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > >> On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: >> >>> On 08/04/2010 08:07 AM, Gleb Natapov wrote: >>> >>>> On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: >>>> >>>>> On 08/04/2010 03:17 AM, Avi Kivity wrote: >>>>> >>>>>> For playing games, there are three options: >>>>>> - existing fwcfg >>>>>> - fwcfg+dma >>>>>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>>>>> and have the BIOS copy them >>>>>> >>>>>> Existing fwcfg is the least amount of work and probably >>>>>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>>>>> High memory flash is the most hardware-like solution, pretty easy >>>>>> >>>>> >from a qemu point of view but requires more work. >>>>> >>>>> The only trouble I see is that high memory isn't always available. >>>>> If it's a 32-bit PC and you've exhausted RAM space, then you're only >>>>> left with the PCI hole and it's not clear to me if you can really >>>>> pull out 100mb of space there as an option ROM without breaking >>>>> something. >>>>> >>>>> >>>> We can map it on demand. Guest tells qemu to map rom "A" to address X by >>>> writing into some io port. Guest copies rom. Guest tells qemu to unmap >>>> it. Better then DMA interface IMHO. >>>> >>> That's what I thought too, but in a 32-bit guest using ~3.5GB of >>> RAM, where can you safely get 100MB of memory to full map the ROM? >>> If you're going to map chunks at a time, you are basically doing >>> DMA. 
>>> >> It's boot time, so you can just map it over some existing RAM surely? >> > Not with current qemu. This is broken now. > But even if it wasn't it can potentially create havoc. I think we currently believe that the northbridge likely never forwards RAM access to a device so this doesn't fit how hardware would work. More importantly, BIOSes and ROMs do very funny things with RAM. It's not unusual for a ROM to muck with the e820 map to allocate RAM for itself which means there's always the chance that we're going to walk over RAM being used for something else. Regards, Anthony Liguori >> Linuxboot.bin can work out where to map it so it won't be in any >> memory either being used or the target for the copy. >> >> > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:22 ` Anthony Liguori @ 2010-08-04 14:38 ` Gleb Natapov 2010-08-04 14:50 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 14:38 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On Wed, Aug 04, 2010 at 09:22:22AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:26 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 02:24:08PM +0100, Richard W.M. Jones wrote: > >>On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >>>>On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>>>>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>>>>For playing games, there are three options: > >>>>>>- existing fwcfg > >>>>>>- fwcfg+dma > >>>>>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>>>>and have the BIOS copy them > >>>>>> > >>>>>>Existing fwcfg is the least amount of work and probably > >>>>>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>>>>High memory flash is the most hardware-like solution, pretty easy > >>>>>>from a qemu point of view but requires more work. > >>>>> > >>>>>The only trouble I see is that high memory isn't always available. > >>>>>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>>>>left with the PCI hole and it's not clear to me if you can really > >>>>>pull out 100mb of space there as an option ROM without breaking > >>>>>something. > >>>>> > >>>>We can map it on demand. Guest tells qemu to map rom "A" to address X by > >>>>writing into some io port. Guest copies rom. Guest tells qemu to unmap > >>>>it. Better then DMA interface IMHO. > >>>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>>RAM, where can you safely get 100MB of memory to full map the ROM? 
> >>>If you're going to map chunks at a time, you are basically doing > >>>DMA. > >>It's boot time, so you can just map it over some existing RAM surely? > >Not with current qemu. This is broken now. > > But even if it wasn't it can potentially create havoc. I think we > currently believe that the northbridge likely never forwards RAM > access to a device so this doesn't fit how hardware would work. > Good point. > More importantly, BIOSes and ROMs do very funny things with RAM. > It's not unusual for a ROM to muck with the e820 map to allocate RAM > for itself which means there's always the chance that we're going to > walk over RAM being used for something else. > ROM does not muck with the e820. It uses PMM to allocate memory and the memory it gets is marked as reserved in e820 map. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:38 ` Gleb Natapov @ 2010-08-04 14:50 ` Anthony Liguori 2010-08-04 15:01 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:50 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On 08/04/2010 09:38 AM, Gleb Natapov wrote: >> >> But even if it wasn't it can potentially create havoc. I think we >> currently believe that the northbridge likely never forwards RAM >> access to a device so this doesn't fit how hardware would work. >> >> > Good point. > > >> More importantly, BIOSes and ROMs do very funny things with RAM. >> It's not unusual for a ROM to muck with the e820 map to allocate RAM >> for itself which means there's always the chance that we're going to >> walk over RAM being used for something else. >> >> > ROM does not muck with the e820. It uses PMM to allocate memory and the > memory it gets is marked as reserved in e820 map. > PMM allocations are only valid during the init function's execution. Its intention is to enable the use of scratch memory to decompress or otherwise modify the ROM to shrink its size. If a ROM needs memory after the init function, it needs to use the traditional tricks to allocate long term memory and the most popular one is modifying the e820 tables. See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:50 ` Anthony Liguori @ 2010-08-04 15:01 ` Gleb Natapov 2010-08-04 15:07 ` Anthony Liguori 2010-08-04 22:41 ` Kevin O'Connor 0 siblings, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:01 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > >> > >>But even if it wasn't it can potentially create havoc. I think we > >>currently believe that the northbridge likely never forwards RAM > >>access to a device so this doesn't fit how hardware would work. > >> > >Good point. > > > >>More importantly, BIOSes and ROMs do very funny things with RAM. > >>It's not unusual for a ROM to muck with the e820 map to allocate RAM > >>for itself which means there's always the chance that we're going to > >>walk over RAM being used for something else. > >> > >ROM does not muck with the e820. It uses PMM to allocate memory and the > >memory it gets is marked as reserved in e820 map. > > PMM allocations are only valid during the init function's execution. > It's intention is to enable the use of scratch memory to decompress > or otherwise modify the ROM to shrink its size. > Hm, maybe. I read the seabios code differently, but maybe I misread it. > If a ROM needs memory after the init function, it needs to use the > traditional tricks to allocate long term memory and the most popular > one is modifying the e820 tables. > e820 has no in-memory format, > See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. so this ugly code intercepts int15 and mangles the result. OMG. How can this even work if more than two ROMs want to do that? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:01 ` Gleb Natapov @ 2010-08-04 15:07 ` Anthony Liguori 2010-08-04 15:15 ` Gleb Natapov 2010-08-04 22:41 ` Kevin O'Connor 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 15:07 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On 08/04/2010 10:01 AM, Gleb Natapov wrote: > > Hm, may be. I read seabios code differently, but may be I misread it. > The BIOS Boot Specification spells it all out pretty clearly. >> If a ROM needs memory after the init function, it needs to use the >> traditional tricks to allocate long term memory and the most popular >> one is modifying the e820 tables. >> >> > e820 has no in memory format, > Indeed. >> See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. >> > so this ugly code intercepts int15 and mangle result. OMG. How this can > even work if more then two ROMs want to do that? > You have to save the old handlers and invoke them. Where do you save the old handlers? There's tricks you can do by trying to use some unused vectors and also temporarily using the stack. But basically, yeah, I'm amazed every time I see a PC boot that it all actually works :-) Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:07 ` Anthony Liguori @ 2010-08-04 15:15 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:15 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Gerd Hoffmann, Richard W.M. Jones, kvm, Avi Kivity On Wed, Aug 04, 2010 at 10:07:24AM -0500, Anthony Liguori wrote: > On 08/04/2010 10:01 AM, Gleb Natapov wrote: > > > >Hm, may be. I read seabios code differently, but may be I misread it. > > The BIOS Boot Specification spells it all out pretty clearly. > I have the spec. Isn't this enough to be an expert? Or do you mean I should read it too? > >>If a ROM needs memory after the init function, it needs to use the > >>traditional tricks to allocate long term memory and the most popular > >>one is modifying the e820 tables. > >> > >e820 has no in memory format, > > Indeed. > > >>See src/arch/i386/firmware/pcbios/e820mangler.S in gPXE. > >so this ugly code intercepts int15 and mangle result. OMG. How this can > >even work if more then two ROMs want to do that? > > You have to save the old handlers and invoke them. Where do you > save the old handlers? There's tricks you can do by trying to use > some unused vectors and also temporarily using the stack. > > But basically, yeah, I'm amazed every time I see a PC boot that it > all actually works :-) > Heh. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:01 ` Gleb Natapov 2010-08-04 15:07 ` Anthony Liguori @ 2010-08-04 22:41 ` Kevin O'Connor 1 sibling, 0 replies; 151+ messages in thread From: Kevin O'Connor @ 2010-08-04 22:41 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann, Avi Kivity On Wed, Aug 04, 2010 at 06:01:54PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:50:55AM -0500, Anthony Liguori wrote: > > On 08/04/2010 09:38 AM, Gleb Natapov wrote: > > >ROM does not muck with the e820. It uses PMM to allocate memory and the > > >memory it gets is marked as reserved in e820 map. Every ROM is implemented differently - there's no way to really know what they'll do. > > PMM allocations are only valid during the init function's execution. > > It's intention is to enable the use of scratch memory to decompress > > or otherwise modify the ROM to shrink its size. > > > Hm, may be. I read seabios code differently, but may be I misread it. There is a PCIv3 extension to PMM which supports long term memory allocations. SeaBIOS does implement this. The base PMM spec though only supports memory allocations during the POST phase. -Kevin ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:24 ` Richard W.M. Jones 2010-08-04 13:26 ` Gleb Natapov @ 2010-08-04 16:26 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:26 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Gleb Natapov, Gerd Hoffmann On 08/04/2010 04:24 PM, Richard W.M. Jones wrote: > > It's boot time, so you can just map it over some existing RAM surely? > Linuxboot.bin can work out where to map it so it won't be in any > memory either being used or the target for the copy. There's no such thing as boot time from the host's point of view. There are interfaces and they should work whatever the guest is doing right now. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:15 ` Anthony Liguori 2010-08-04 13:24 ` Richard W.M. Jones @ 2010-08-04 13:34 ` Gleb Natapov 2010-08-04 13:52 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:34 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>For playing games, there are three options: > >>>- existing fwcfg > >>>- fwcfg+dma > >>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>and have the BIOS copy them > >>> > >>>Existing fwcfg is the least amount of work and probably > >>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>High memory flash is the most hardware-like solution, pretty easy > >>>from a qemu point of view but requires more work. > >> > >>The only trouble I see is that high memory isn't always available. > >>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>left with the PCI hole and it's not clear to me if you can really > >>pull out 100mb of space there as an option ROM without breaking > >>something. > >> > >We can map it on demand. Guest tells qemu to map rom "A" to address X by > >writing into some io port. Guest copies rom. Guest tells qemu to unmap > >it. Better then DMA interface IMHO. > > That's what I thought too, but in a 32-bit guest using ~3.5GB of > RAM, where can you safely get 100MB of memory to full map the ROM? > If you're going to map chunks at a time, you are basically doing > DMA. > This is not like DMA even if done in chunks, and chunks can be pretty big. The code that deals with copying may temporarily unmap some PCI devices to have more space there. 
> And what's the upper limit on ROM size that we impose? 100MB is > already at the ridiculously large size. > Agree. We have two solutions: 1. Avoid the problem 2. Fix the problem. Both are fine with me and I prefer 1, but if we are going with 2 I prefer something sane. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
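The map/copy/unmap scheme Gleb describes can be sketched as follows. This is an illustrative Python model, not qemu code: the window size and the idea of an I/O-port handshake are assumptions for illustration, standing in for the guest-visible window into the ROM.

```python
# Sketch of the on-demand mapping scheme: qemu exposes successive chunks of
# a large ROM through one fixed guest-physical window; the guest copies each
# chunk into its own RAM, then asks for the window to advance. Hypothetical
# names; no real qemu interface is implied.

WINDOW_SIZE = 16 * 1024 * 1024  # e.g. a 16MB hole borrowed from PCI space

def copy_rom_via_window(rom, window_size=WINDOW_SIZE):
    dest = bytearray()
    offset = 0
    while offset < len(rom):
        # "map": qemu exposes rom[offset:offset+window_size] at the window
        chunk = rom[offset:offset + window_size]
        # guest copies the mapped chunk into its own RAM
        dest += chunk
        # "unmap": window released before the next chunk is mapped
        offset += window_size
    return bytes(dest)
```

The point of contention in the thread is visible here: with a window much smaller than the ROM, the loop degenerates into a chunk-at-a-time transfer, which is why Anthony argues it is "basically DMA", while Gleb's counter is that the chunks can be large enough (by temporarily unmapping PCI devices) that only a few iterations are needed.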
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:34 ` Gleb Natapov @ 2010-08-04 13:52 ` Anthony Liguori 2010-08-04 14:00 ` Gleb Natapov 2010-08-04 16:30 ` Avi Kivity 0 siblings, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 13:52 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On 08/04/2010 08:34 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 08:07 AM, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: >>> >>>> On 08/04/2010 03:17 AM, Avi Kivity wrote: >>>> >>>>> For playing games, there are three options: >>>>> - existing fwcfg >>>>> - fwcfg+dma >>>>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>>>> and have the BIOS copy them >>>>> >>>>> Existing fwcfg is the least amount of work and probably >>>>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>>>> High memory flash is the most hardware-like solution, pretty easy >>>>> >>>> >from a qemu point of view but requires more work. >>>> >>>> The only trouble I see is that high memory isn't always available. >>>> If it's a 32-bit PC and you've exhausted RAM space, then you're only >>>> left with the PCI hole and it's not clear to me if you can really >>>> pull out 100mb of space there as an option ROM without breaking >>>> something. >>>> >>>> >>> We can map it on demand. Guest tells qemu to map rom "A" to address X by >>> writing into some io port. Guest copies rom. Guest tells qemu to unmap >>> it. Better then DMA interface IMHO. >>> >> That's what I thought too, but in a 32-bit guest using ~3.5GB of >> RAM, where can you safely get 100MB of memory to full map the ROM? >> If you're going to map chunks at a time, you are basically doing >> DMA. >> >> > This is not like DMA event if done in chunks and chunks can be pretty > big. 
The code that dials with copying may temporary unmap some pci > devices to have more space there. > That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. Maybe we're just being too fancy here. We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. Regards, Anthony Liguori > >> And what's the upper limit on ROM size that we impose? 100MB is >> already at the ridiculously large size. >> >> > Agree. We have two solutions: > 1. Avoid the problem > 2. Fix the problem. > > Both are fine with me and I prefer 1, but if we are going with 2 I > prefer something sane. > > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:52 ` Anthony Liguori @ 2010-08-04 14:00 ` Gleb Natapov 2010-08-04 14:14 ` Anthony Liguori 2010-08-04 14:22 ` Paolo Bonzini 2010-08-04 16:30 ` Avi Kivity 1 sibling, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 14:00 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: > On 08/04/2010 08:34 AM, Gleb Natapov wrote: > >On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: > >>On 08/04/2010 08:07 AM, Gleb Natapov wrote: > >>>On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > >>>>On 08/04/2010 03:17 AM, Avi Kivity wrote: > >>>>>For playing games, there are three options: > >>>>>- existing fwcfg > >>>>>- fwcfg+dma > >>>>>- put roms in 4GB-2MB (or whatever we decide the flash size is) > >>>>>and have the BIOS copy them > >>>>> > >>>>>Existing fwcfg is the least amount of work and probably > >>>>>satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > >>>>>High memory flash is the most hardware-like solution, pretty easy > >>>>>from a qemu point of view but requires more work. > >>>> > >>>>The only trouble I see is that high memory isn't always available. > >>>>If it's a 32-bit PC and you've exhausted RAM space, then you're only > >>>>left with the PCI hole and it's not clear to me if you can really > >>>>pull out 100mb of space there as an option ROM without breaking > >>>>something. > >>>> > >>>We can map it on demand. Guest tells qemu to map rom "A" to address X by > >>>writing into some io port. Guest copies rom. Guest tells qemu to unmap > >>>it. Better then DMA interface IMHO. > >>That's what I thought too, but in a 32-bit guest using ~3.5GB of > >>RAM, where can you safely get 100MB of memory to full map the ROM? > >>If you're going to map chunks at a time, you are basically doing > >>DMA. 
> >> > >This is not like DMA event if done in chunks and chunks can be pretty > >big. The code that dials with copying may temporary unmap some pci > >devices to have more space there. > > That's a bit complicated because SeaBIOS is managing the PCI devices > whereas the kernel code is running as an option rom. I don't know > the BIOS PCI interfaces well so I don't know how doable this is. > Unmapping a device and mapping it at the same place is easy. Enumerating PCI devices from multiboot.bin looks like unneeded churn though. > Maybe we're just being too fancy here. > > We could rewrite -kernel/-append/-initrd to just generate a floppy > image in RAM, and just boot from floppy. > Maybe. Can a floppy be 100M? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:00 ` Gleb Natapov @ 2010-08-04 14:14 ` Anthony Liguori 2010-08-04 14:36 ` Gleb Natapov 2010-08-04 14:22 ` Paolo Bonzini 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:14 UTC (permalink / raw) To: Gleb Natapov Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On 08/04/2010 09:00 AM, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:52:44AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 08:34 AM, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 08:15:04AM -0500, Anthony Liguori wrote: >>> >>>> On 08/04/2010 08:07 AM, Gleb Natapov wrote: >>>> >>>>> On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: >>>>> >>>>>> On 08/04/2010 03:17 AM, Avi Kivity wrote: >>>>>> >>>>>>> For playing games, there are three options: >>>>>>> - existing fwcfg >>>>>>> - fwcfg+dma >>>>>>> - put roms in 4GB-2MB (or whatever we decide the flash size is) >>>>>>> and have the BIOS copy them >>>>>>> >>>>>>> Existing fwcfg is the least amount of work and probably >>>>>>> satisfactory for isapc. fwcfg+dma is IMO going off a tangent. >>>>>>> High memory flash is the most hardware-like solution, pretty easy >>>>>>> >>>>>> >from a qemu point of view but requires more work. >>>>>> >>>>>> The only trouble I see is that high memory isn't always available. >>>>>> If it's a 32-bit PC and you've exhausted RAM space, then you're only >>>>>> left with the PCI hole and it's not clear to me if you can really >>>>>> pull out 100mb of space there as an option ROM without breaking >>>>>> something. >>>>>> >>>>>> >>>>> We can map it on demand. Guest tells qemu to map rom "A" to address X by >>>>> writing into some io port. Guest copies rom. Guest tells qemu to unmap >>>>> it. Better then DMA interface IMHO. >>>>> >>>> That's what I thought too, but in a 32-bit guest using ~3.5GB of >>>> RAM, where can you safely get 100MB of memory to full map the ROM? 
>>>> If you're going to map chunks at a time, you are basically doing >>>> DMA. >>>> >>>> >>> This is not like DMA event if done in chunks and chunks can be pretty >>> big. The code that dials with copying may temporary unmap some pci >>> devices to have more space there. >>> >> That's a bit complicated because SeaBIOS is managing the PCI devices >> whereas the kernel code is running as an option rom. I don't know >> the BIOS PCI interfaces well so I don't know how doable this is. >> >> > Unmapping device and mapping it at the same place is easy. Enumerating > pci devices from multiboot.bin looks like unneeded churn though. > > >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. >> >> > May be. Can floppy be 100M? > No, I forgot just how small they are. R/O usb mass storage device? CDROM? I'm beginning to think that loading such a large initrd through fwcfg is simply a dead end. Regards, Anthony Liguori > -- > Gleb. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:14 ` Anthony Liguori @ 2010-08-04 14:36 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 14:36 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 09:14:01AM -0500, Anthony Liguori wrote: > >Unmapping device and mapping it at the same place is easy. Enumerating > >pci devices from multiboot.bin looks like unneeded churn though. > > > >>Maybe we're just being too fancy here. > >> > >>We could rewrite -kernel/-append/-initrd to just generate a floppy > >>image in RAM, and just boot from floppy. > >> > >May be. Can floppy be 100M? > > No, I forgot just how small they are. R/O usb mass storage device? > CDROM? I'm beginning thing that loading such a large initrd through > fwcfg is simply a dead end. > Well, libguestfs can use CDROM by itself to begin with. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:00 ` Gleb Natapov 2010-08-04 14:14 ` Anthony Liguori @ 2010-08-04 14:22 ` Paolo Bonzini 2010-08-04 14:39 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Paolo Bonzini @ 2010-08-04 14:22 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 04:00 PM, Gleb Natapov wrote: >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. >> > May be. Can floppy be 100M? Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the BIOS would read such a beast, or SYSLINUX. By the way, if libguestfs insists for an initrd rather than a CDROM image, it could do something in between and make an ISO image with ISOLINUX and the required kernel/initrd pair. (By the way, a network installation image for a typical distribution has a 120M initrd, so it's not just libguestfs. It is very useful to pass the network installation images directly to qemu via -kernel/-initrd). Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
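Paolo's back-of-the-envelope figure can be checked directly. A small sketch using the conventional int 13h CHS limits he cites; note that 255 sectors/track is not quite 2^8, so the exact total falls slightly short of the 2^(14+8+8+1) = 2 GB he quotes:

```python
# Theoretical ceiling for CHS floppy geometry with the limits from the
# thread: 16384-byte sectors, 256 tracks, 255 sectors/track, 2 heads.

BYTES_PER_SECTOR = 16384   # 2^14
TRACKS = 256               # 2^8 (8-bit cylinder number, 0-255)
SECTORS_PER_TRACK = 255    # sector numbers are 1-based, so 255, not 256
HEADS = 2

capacity = BYTES_PER_SECTOR * TRACKS * SECTORS_PER_TRACK * HEADS
print(capacity)            # 2139095040 bytes, just under 2 GiB
```

So a 100M "floppy" fits the addressing scheme with room to spare; whether any BIOS or SYSLINUX would actually accept such a geometry is, as Paolo says, another matter.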
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:22 ` Paolo Bonzini @ 2010-08-04 14:39 ` Anthony Liguori 2010-08-04 16:33 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:39 UTC (permalink / raw) To: Paolo Bonzini Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 09:22 AM, Paolo Bonzini wrote: > On 08/04/2010 04:00 PM, Gleb Natapov wrote: >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy >>> image in RAM, and just boot from floppy. >>> >> May be. Can floppy be 100M? > > Well, in theory you can have 16384 bytes/sector, 256 tracks, 255 > sectors, 2 heads... that makes 2^(14+8+8+1) = 2 GB. :) Not sure the > BIOS would read such a beast, or SYSLINUX. > > By the way, if libguestfs insists for an initrd rather than a CDROM > image, it could do something in between and make an ISO image with > ISOLINUX and the required kernel/initrd pair. > > (By the way, a network installation image for a typical distribution > has a 120M initrd, so it's not just libguestfs. It is very useful to > pass the network installation images directly to qemu via > -kernel/-initrd). We could make -kernel an awful lot smarter, but unless we've got someone just itching to write 16-bit option rom code, I think our best bet is to try to leverage a standard bootloader and expose a disk containing the kernel/initrd. Otherwise, we just stick with what we have and deal with the performance as is. Regards, Anthony Liguori > > Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:39 ` Anthony Liguori @ 2010-08-04 16:33 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:33 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann, Paolo Bonzini On 08/04/2010 05:39 PM, Anthony Liguori wrote: > > We could make kernel an awful lot smarter but unless we've got someone > just itching to write 16-bit option rom code, I think our best bet is > to try to leverage a standard bootloader and expose a disk containing > the kernel/initrd. > A problem with that is that the booted kernel would see that disk and try to do something with it. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:52 ` Anthony Liguori 2010-08-04 14:00 ` Gleb Natapov @ 2010-08-04 16:30 ` Avi Kivity 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:42 ` Anthony Liguori 1 sibling, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:30 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>> >> This is not like DMA event if done in chunks and chunks can be pretty >> big. The code that dials with copying may temporary unmap some pci >> devices to have more space there. > > > That's a bit complicated because SeaBIOS is managing the PCI devices > whereas the kernel code is running as an option rom. I don't know the > BIOS PCI interfaces well so I don't know how doable this is. > > Maybe we're just being too fancy here. > > We could rewrite -kernel/-append/-initrd to just generate a floppy > image in RAM, and just boot from floppy. How could this work? the RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:30 ` Avi Kivity @ 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:44 ` Anthony Liguori ` (2 more replies) 2010-08-04 16:42 ` Anthony Liguori 1 sibling, 3 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:36 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 07:30 PM, Avi Kivity wrote: > On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>> >>> This is not like DMA event if done in chunks and chunks can be pretty >>> big. The code that dials with copying may temporary unmap some pci >>> devices to have more space there. >> >> >> That's a bit complicated because SeaBIOS is managing the PCI devices >> whereas the kernel code is running as an option rom. I don't know >> the BIOS PCI interfaces well so I don't know how doable this is. >> >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. > > How could this work? the RAM belongs to SeaBIOS immediately after > reset, it would just scribble over it. Or worse, not scribble on it > until some date in the future. > > -kernel data has to find its way to memory after the bios gives > control to some optionrom. An alternative would be to embed knowledge > of -kernel in seabios, but I don't think it's a good one. > Oh, you meant host RAM, not guest RAM. Disregard. This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:36 ` Avi Kivity @ 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:52 ` Avi Kivity ` (2 more replies) 2010-08-04 16:45 ` Alexander Graf 2010-08-04 17:46 ` Richard W.M. Jones 2 siblings, 3 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:44 UTC (permalink / raw) To: Avi Kivity Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 11:36 AM, Avi Kivity wrote: > On 08/04/2010 07:30 PM, Avi Kivity wrote: >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>>> >>>> This is not like DMA event if done in chunks and chunks can be pretty >>>> big. The code that dials with copying may temporary unmap some pci >>>> devices to have more space there. >>> >>> >>> That's a bit complicated because SeaBIOS is managing the PCI devices >>> whereas the kernel code is running as an option rom. I don't know >>> the BIOS PCI interfaces well so I don't know how doable this is. >>> >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy >>> image in RAM, and just boot from floppy. >> >> How could this work? the RAM belongs to SeaBIOS immediately after >> reset, it would just scribble over it. Or worse, not scribble on it >> until some date in the future. >> >> -kernel data has to find its way to memory after the bios gives >> control to some optionrom. An alternative would be to embed >> knowledge of -kernel in seabios, but I don't think it's a good one. >> > > Oh, you meant host RAM, not guest RAM. Disregard. > > This is basically my suggestion to libguestfs: instead of generating > an initrd, generate a bootable cdrom, and boot from that. The result > is faster and has a smaller memory footprint. Everyone wins. Yeah, but we could also do that entirely in QEMU. If that's what we suggest doing, there's no reason not to do it instead of the option rom trickery that we do today. 
The option rom stuff has a number of shortcomings. Because we hijack int19, extboot doesn't get to run. That means that if you use -kernel to load a grub (as the Ubuntu guys do, for their own absurd reasons) then grub does not see extboot-backed disks. The solution for them is the same: generate a proper disk and boot from that disk. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Anthony Liguori @ 2010-08-04 16:52 ` Avi Kivity 2010-08-04 17:37 ` Gleb Natapov 2010-08-05 7:28 ` Gerd Hoffmann 2 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:52 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 07:44 PM, Anthony Liguori wrote: > > The option rom stuff has a number of short comings. Because we hijack > int19, extboot doesn't get to run. That means that if you use -kernel > to load a grub (the Ubuntu guys for their own absurd reasons) then > grub does not see extboot backed disks. The solution for them is the > same, generate a proper disk and boot from that disk. > Let's print it out and hand out leaflets at the upcoming kvm forum. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:52 ` Avi Kivity @ 2010-08-04 17:37 ` Gleb Natapov 2010-08-05 7:28 ` Gerd Hoffmann 2 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 17:37 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, kvm, Gerd Hoffmann On Wed, Aug 04, 2010 at 11:44:33AM -0500, Anthony Liguori wrote: > On 08/04/2010 11:36 AM, Avi Kivity wrote: > > On 08/04/2010 07:30 PM, Avi Kivity wrote: > >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: > >>>>> > >>>>This is not like DMA event if done in chunks and chunks can be pretty > >>>>big. The code that dials with copying may temporary unmap some pci > >>>>devices to have more space there. > >>> > >>> > >>>That's a bit complicated because SeaBIOS is managing the PCI > >>>devices whereas the kernel code is running as an option rom. > >>>I don't know the BIOS PCI interfaces well so I don't know how > >>>doable this is. > >>> > >>>Maybe we're just being too fancy here. > >>> > >>>We could rewrite -kernel/-append/-initrd to just generate a > >>>floppy image in RAM, and just boot from floppy. > >> > >>How could this work? the RAM belongs to SeaBIOS immediately > >>after reset, it would just scribble over it. Or worse, not > >>scribble on it until some date in the future. > >> > >>-kernel data has to find its way to memory after the bios gives > >>control to some optionrom. An alternative would be to embed > >>knowledge of -kernel in seabios, but I don't think it's a good > >>one. > >> > > > >Oh, you meant host RAM, not guest RAM. Disregard. > > > >This is basically my suggestion to libguestfs: instead of > >generating an initrd, generate a bootable cdrom, and boot from > >that. The result is faster and has a smaller memory footprint. > >Everyone wins. > > Yeah, but we could also do that entirely in QEMU. 
If that's what we > suggest doing, there's no reason not to do it instead of the option > rom trickery that we do today. > > The option rom stuff has a number of short comings. Because we > hijack int19, extboot doesn't get to run. That means that if you > use -kernel to load a grub (the Ubuntu guys for their own absurd > reasons) then grub does not see extboot backed disks. The solution > for them is the same, generate a proper disk and boot from that > disk. > Extboot is not so relevant any more. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:52 ` Avi Kivity 2010-08-04 17:37 ` Gleb Natapov @ 2010-08-05 7:28 ` Gerd Hoffmann 2010-08-05 7:34 ` Gleb Natapov 2010-08-05 13:43 ` Anthony Liguori 2 siblings, 2 replies; 151+ messages in thread From: Gerd Hoffmann @ 2010-08-05 7:28 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Avi Kivity, Gleb Natapov, Richard W.M. Jones Hi, > The option rom stuff has a number of short comings. Because we hijack > int19, extboot doesn't get to run. That means that if you use -kernel to > load a grub (the Ubuntu guys for their own absurd reasons) then grub > does not see extboot backed disks. The solution for them is the same, > generate a proper disk and boot from that disk. Oh, having extboot + linuxboot + multiboot register a BEV (correct acronym?) entry instead of hijacking int19 would fix that too. Additional bonus will be that they are selectable in the boot menu. cheers, Gerd ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:28 ` Gerd Hoffmann @ 2010-08-05 7:34 ` Gleb Natapov 2010-08-05 7:56 ` Avi Kivity 2010-08-05 13:43 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 7:34 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, kvm, Avi Kivity, Richard W.M. Jones On Thu, Aug 05, 2010 at 09:28:57AM +0200, Gerd Hoffmann wrote: > Hi, > > >The option rom stuff has a number of short comings. Because we hijack > >int19, extboot doesn't get to run. That means that if you use -kernel to > >load a grub (the Ubuntu guys for their own absurd reasons) then grub > >does not see extboot backed disks. The solution for them is the same, > >generate a proper disk and boot from that disk. > > Oh, having extboot + linuxboot + multiboot register a BEV (correct > acronym?) entry instead of hijacking int19 would fix that too. > Additional bonus will be that they are selectable in the boot menu. > Good idea, except that we are not good at communicating to seabios where we want to boot from by default. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:34 ` Gleb Natapov @ 2010-08-05 7:56 ` Avi Kivity 2010-08-05 7:59 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-05 7:56 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On 08/05/2010 10:34 AM, Gleb Natapov wrote: > On Thu, Aug 05, 2010 at 09:28:57AM +0200, Gerd Hoffmann wrote: >> Hi, >> >>> The option rom stuff has a number of short comings. Because we hijack >>> int19, extboot doesn't get to run. That means that if you use -kernel to >>> load a grub (the Ubuntu guys for their own absurd reasons) then grub >>> does not see extboot backed disks. The solution for them is the same, >>> generate a proper disk and boot from that disk. >> Oh, having extboot + linuxboot + multiboot register a BEV (correct >> acronym?) entry instead of hijacking int19 would fix that too. >> Additional bonus will be that they are selectable in the boot menu. >> > Good idea except that we are not good at communicating to seabios where > do we want to boot from by default. We have the firmware configuration interface for that, if we can tolerate its speed. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 151+ messages in thread
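The firmware configuration interface mentioned here is fw_cfg: on x86 it is a selector/data pair of I/O ports (traditionally 0x510 and 0x511) where the guest writes a 16-bit item key and then pulls the item out one byte per data-port read. Below is a toy userspace model of that protocol, a sketch only: the key value follows my reading of QEMU's fw_cfg.h and the item contents are invented.

```python
class FwCfg:
    """Toy model of QEMU's fw_cfg selector/data protocol.

    A real guest does outw(0x510, key) and then repeated inb(0x511);
    the one-byte-at-a-time data port is why pulling a large item
    (say a 100 MB initrd) through fw_cfg is slow.
    """

    def __init__(self, items):
        self._items = items        # key (int) -> bytes
        self._cur = b""
        self._pos = 0

    def select(self, key):         # models outw to the selector port
        self._cur = self._items.get(key, b"")
        self._pos = 0

    def read_byte(self):           # models inb from the data port
        if self._pos >= len(self._cur):
            return 0               # reads past the end return zeros
        b = self._cur[self._pos]
        self._pos += 1
        return b

    def read(self, n):             # n port reads to fetch n bytes
        return bytes(self.read_byte() for _ in range(n))


FW_CFG_INITRD_DATA = 0x12          # key per my reading of fw_cfg.h
cfg = FwCfg({FW_CFG_INITRD_DATA: b"initrd-blob"})
cfg.select(FW_CFG_INITRD_DATA)
assert cfg.read(11) == b"initrd-blob"
```

Every byte in this scheme corresponds to a trapped port access, which is where the speed concern raised above comes from.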
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:56 ` Avi Kivity @ 2010-08-05 7:59 ` Gleb Natapov 2010-08-05 8:45 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 7:59 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On Thu, Aug 05, 2010 at 10:56:52AM +0300, Avi Kivity wrote: > On 08/05/2010 10:34 AM, Gleb Natapov wrote: > >On Thu, Aug 05, 2010 at 09:28:57AM +0200, Gerd Hoffmann wrote: > >> Hi, > >> > >>>The option rom stuff has a number of short comings. Because we hijack > >>>int19, extboot doesn't get to run. That means that if you use -kernel to > >>>load a grub (the Ubuntu guys for their own absurd reasons) then grub > >>>does not see extboot backed disks. The solution for them is the same, > >>>generate a proper disk and boot from that disk. > >>Oh, having extboot + linuxboot + multiboot register a BEV (correct > >>acronym?) entry instead of hijacking int19 would fix that too. > >>Additional bonus will be that they are selectable in the boot menu. > >> > >Good idea except that we are not good at communicating to seabios where > >do we want to boot from by default. > > We have the firmware configuration interface for that, if we can > tolerate its speed. > To pass default boot device, sure :) The question is what to pass so that seabios will be able to unambiguously determine what device to boot from. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:59 ` Gleb Natapov @ 2010-08-05 8:45 ` Avi Kivity 2010-08-05 8:48 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-05 8:45 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On 08/05/2010 10:59 AM, Gleb Natapov wrote: > >> We have the firmware configuration interface for that, if we can >> tolerate its speed. >> > To pass default boot device, sure :) The question is what to pass so > that seabios will be able to unambiguously determine what device to > boot from. IMO seabios should (if it doesn't already) store this information in CMOS non-volatile memory (which can be backed by a small disk image). This allows the user to play with the configuration at boot time, and if we document the format, management tools can read and write it as well. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 151+ messages in thread
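For context, QEMU already hands the BIOS a minimal boot-order hint through RTC CMOS bytes: on the PC machine, nibble codes packed into registers 0x3d and 0x38 encode up to three boot devices, which is what `-boot order=...` sets. Here is a hedged sketch of that nibble packing; the register numbers and the floppy/disk/cdrom/network codes follow my reading of QEMU's pc.c and the SeaBIOS boot code, and may differ between versions.

```python
# Nibble codes as used by the Bochs/QEMU BIOS boot-order CMOS hint
# (assumed values: 1 = floppy, 2 = hard disk, 3 = cdrom, 4 = network).
BOOT_CODES = {"a": 1, "c": 2, "d": 3, "n": 4}

def encode_boot_order(order):
    """Pack up to three of 'a','c','d','n' into the two CMOS bytes.

    Returns (value for CMOS 0x3d, value for CMOS 0x38)."""
    codes = [BOOT_CODES[ch] for ch in order[:3]]
    codes += [0] * (3 - len(codes))
    reg_3d = codes[0] | (codes[1] << 4)   # first two devices
    reg_38 = codes[2] << 4                # third device, high nibble
    return reg_3d, reg_38

def decode_boot_order(reg_3d, reg_38):
    rev = {v: k for k, v in BOOT_CODES.items()}
    nibbles = [reg_3d & 0xF, (reg_3d >> 4) & 0xF, (reg_38 >> 4) & 0xF]
    return "".join(rev[n] for n in nibbles if n in rev)

assert encode_boot_order("dc") == (0x23, 0x00)   # cdrom, then disk
assert decode_boot_order(0x23, 0x00) == "dc"
```

Avi's suggestion goes further than this hint: have the BIOS persist a boot choice itself, so the user can change it interactively and tools can edit it.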
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 8:45 ` Avi Kivity @ 2010-08-05 8:48 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 8:48 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gerd Hoffmann, Richard W.M. Jones On Thu, Aug 05, 2010 at 11:45:33AM +0300, Avi Kivity wrote: > On 08/05/2010 10:59 AM, Gleb Natapov wrote: > > > >>We have the firmware configuration interface for that, if we can > >>tolerate its speed. > >> > >To pass default boot device, sure :) The question is what to pass so > >that seabios will be able to unambiguously determine what device to > >boot from. > > IMO seabios should (if it doesn't already) store this information in > CMOS non-volatile memory (which can be backed by a small disk > image). This allows the user to play with the configuration at boot > time, and if we document the format, management tools can read and > write it as well. > The important part is to find a way to unambiguously pass the default boot device between seabios/qemu/management. Afterward we can do many things with it: pass it on the command line, save it in an external disk image, etc. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread

* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-05 7:28 ` Gerd Hoffmann 2010-08-05 7:34 ` Gleb Natapov @ 2010-08-05 13:43 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-05 13:43 UTC (permalink / raw) To: Gerd Hoffmann Cc: qemu-devel, kvm, Avi Kivity, Gleb Natapov, Richard W.M. Jones On 08/05/2010 02:28 AM, Gerd Hoffmann wrote: > Hi, > >> The option rom stuff has a number of short comings. Because we hijack >> int19, extboot doesn't get to run. That means that if you use -kernel to >> load a grub (the Ubuntu guys for their own absurd reasons) then grub >> does not see extboot backed disks. The solution for them is the same, >> generate a proper disk and boot from that disk. > > Oh, having extboot + linuxboot + multiboot register a BEV (correct > acronym?) entry instead of hijacking int19 would fix that too. > Additional bonus will be that they are selectable in the boot menu. Well, extboot doesn't hijack int19, it hijacks int13. It'll appear as the disk 0x80. It would be better though to do a BCV rom such that the extboot disk appeared as an independent disk instead of hijacking disk 0x80. linuxboot/multiboot should be BEV roms, no doubt. Regards, Anthony Liguori > cheers, > Gerd > ^ permalink raw reply [flat|nested] 151+ messages in thread
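For readers without the BIOS Boot Specification at hand: an option ROM advertises boot capabilities through a '$PnP' expansion header, located via the 16-bit offset stored at ROM offset 0x1A. A BCV (Boot Connection Vector) hooks in additional disks, while a BEV (Bootstrap Entry Vector), the mechanism proposed above, makes the ROM itself a selectable boot entry. The sketch below builds and parses such a header; the intra-header offsets used (BCV at +0x16, BEV at +0x1A) follow my reading of the BBS and should be checked against the spec before relying on them.

```python
import struct

def build_rom(pnp_offset=0x20, bev=0x100, bcv=0x0):
    """Assemble a minimal option-ROM image with a $PnP expansion header.

    Checksum fixups and real 16-bit entry code are omitted; this only
    models the header layout a BIOS walks to find BEV/BCV entries."""
    rom = bytearray(512)
    rom[0:2] = b"\x55\xAA"                          # option ROM signature
    rom[2] = 1                                      # size in 512-byte units
    struct.pack_into("<H", rom, 0x1A, pnp_offset)   # pointer to $PnP header
    pnp = bytearray(0x20)
    pnp[0:4] = b"$PnP"
    struct.pack_into("<H", pnp, 0x16, bcv)          # Boot Connection Vector
    struct.pack_into("<H", pnp, 0x1A, bev)          # Bootstrap Entry Vector
    rom[pnp_offset:pnp_offset + 0x20] = pnp
    return bytes(rom)

def bev_entry(rom):
    """Return the BEV offset a BIOS would call for a boot-menu entry."""
    assert rom[0:2] == b"\x55\xAA"
    (pnp_off,) = struct.unpack_from("<H", rom, 0x1A)
    assert rom[pnp_off:pnp_off + 4] == b"$PnP"
    (bev,) = struct.unpack_from("<H", rom, pnp_off + 0x1A)
    return bev

rom = build_rom(bev=0x100)
assert bev_entry(rom) == 0x100
```

A ROM registering a BEV shows up as its own boot-menu entry; one registering a BCV just adds a disk behind int13, which is the distinction drawn in the message above.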
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:44 ` Anthony Liguori @ 2010-08-04 16:45 ` Alexander Graf 2010-08-04 16:54 ` Avi Kivity 2010-08-04 17:26 ` Anthony Liguori 2010-08-04 17:46 ` Richard W.M. Jones 2 siblings, 2 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 16:45 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 04.08.2010, at 18:36, Avi Kivity wrote: > On 08/04/2010 07:30 PM, Avi Kivity wrote: >> On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>>> >>>> This is not like DMA event if done in chunks and chunks can be pretty >>>> big. The code that dials with copying may temporary unmap some pci >>>> devices to have more space there. >>> >>> >>> That's a bit complicated because SeaBIOS is managing the PCI devices whereas the kernel code is running as an option rom. I don't know the BIOS PCI interfaces well so I don't know how doable this is. >>> >>> Maybe we're just being too fancy here. >>> >>> We could rewrite -kernel/-append/-initrd to just generate a floppy image in RAM, and just boot from floppy. >> >> How could this work? the RAM belongs to SeaBIOS immediately after reset, it would just scribble over it. Or worse, not scribble on it until some date in the future. >> >> -kernel data has to find its way to memory after the bios gives control to some optionrom. An alternative would be to embed knowledge of -kernel in seabios, but I don't think it's a good one. >> > > Oh, you meant host RAM, not guest RAM. Disregard. > > This is basically my suggestion to libguestfs: instead of generating an initrd, generate a bootable cdrom, and boot from that. The result is faster and has a smaller memory footprint. Everyone wins. Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. 
We chose the wrong interface to transfer kernel+initrd data into the guest. Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. I see two alternatives out of this mess: 1) Speed up string PIO so we're actually fast again. 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
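The gap between options 1 and 2 above can be quantified in VM exits: string PIO still traps once per rep-ins burst, while a DMA-style handoff traps roughly once (a doorbell write) for the whole buffer. A back-of-the-envelope model; the burst size is a made-up parameter, since real bursts depend on how much the PIO emulation completes per exit.

```python
def pio_exits(size, chunk=1024):
    """String PIO: roughly one VM exit per rep-ins burst of `chunk` bytes."""
    return -(-size // chunk)   # ceiling division

def dma_exits(size):
    """DMA-style transfer: one doorbell write; the host copies the rest."""
    return 1

initrd = 100 * 1024 * 1024     # the 100 MB initrd from this thread
print(pio_exits(initrd), dma_exits(initrd))   # 102400 exits vs 1
```

Even a cheap exit costs microseconds, so the per-burst trap count, multiplied out over a 100 MB transfer, is where the seconds go.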
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:45 ` Alexander Graf @ 2010-08-04 16:54 ` Avi Kivity 2010-08-04 17:01 ` Alexander Graf 2010-08-04 17:26 ` Anthony Liguori 1 sibling, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:54 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 07:45 PM, Alexander Graf wrote: > > I see two alternatives out of this mess: > > 1) Speed up string PIO so we're actually fast again. Certainly, the best option given that it needs no new interfaces, and improves the most workloads. > 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) A guest/host interface is not private. > Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. 3) don't use -kernel for 100MB or more. It's not the right tool. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:54 ` Avi Kivity @ 2010-08-04 17:01 ` Alexander Graf 2010-08-04 17:14 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:01 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 04.08.2010, at 18:54, Avi Kivity wrote: > On 08/04/2010 07:45 PM, Alexander Graf wrote: >> >> I see two alternatives out of this mess: >> >> 1) Speed up string PIO so we're actually fast again. > > Certainly, the best option given that it needs no new interfaces, and improves the most workloads. > >> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) > > A guest/host interface is not private. fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. > >> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. > > 3) don't use -kernel for 100MB or more. It's not the right tool. Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:01 ` Alexander Graf @ 2010-08-04 17:14 ` Avi Kivity 2010-08-04 17:27 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:14 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 08:01 PM, Alexander Graf wrote: > >> >>> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) >> A guest/host interface is not private. > fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. > Well, it isn't. Two external projects already use it. You can't change it due to the need to live migrate from older versions. >>> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. >> 3) don't use -kernel for 100MB or more. It's not the right tool. > Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. > It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. If you want to pull large amounts of data into the guest efficiently, use virtio-blk. That's what it's for. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:14 ` Avi Kivity @ 2010-08-04 17:27 ` Alexander Graf 2010-08-04 17:34 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:27 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 04.08.2010, at 19:14, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Alexander Graf wrote: >> >>> >>>> 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) >>> A guest/host interface is not private. >> fw_cfg is as private as it gets with host/guest interfaces. It's about as close as CPU specific MSRs or SMC chips. >> > > Well, it isn't. Two external projects already use it. You can't change it due to the needs to live migrate from older versions. You can always extend it. You can even break it with a new -M. > >>>> Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. >>> 3) don't use -kernel for 100MB or more. It's not the right tool. >> Why not? You're the one always ranting about caring about users. Now you get at least 3 users from the Qemu development community actually using a feature and you just claim it's wrong? Please, we've added way more useless features for worse reasons. >> > > It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. Migration also doesn't make sense for most -kernel/-initrd use cases. And it's awesome for fast prototyping. 
Of course, once that fast becomes dog slow, it's not useful anymore. I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:27 ` Alexander Graf @ 2010-08-04 17:34 ` Avi Kivity 2010-08-04 20:06 ` David S. Ahern 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:34 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 08:27 PM, Alexander Graf wrote: >> >> Well, it isn't. Two external projects already use it. You can't change it due to the needs to live migrate from older versions. > You can always extend it. You can even break it with a new -M. Yes. But it's a pain to make sure it all works out. We're already suffering from this where we have no choice, why do it where we have a choice? >> It's not wrong in itself, but using it with supersized initrds is wrong. The data is stored in qemu, host pagecache, and the guest, so three copies, it's limited by guest RAM, has to be live migrated. Sure we could optimize it, but it's better to spend our efforts on more mainstream users. > It's only stored twice. The host pagecache copy is gone during the lifetime of the VM. It has still evicted some other pagecache. Footprint is footprint. 300MB to cat some file in a guest. > Migration also doesn't make sense for most -kernel/-initrd use cases. You're just inviting a bug report here. If we add a feature, let's make it work. > And it's awesome for fast prototyping. Of course, once that fast becomes dog slow, it's not useful anymore. For the Nth time, it's only slow with 100MB initrds. > I bet within the time everybody spent on this thread we would have a working and stable DMA fw_cfg interface plus extra spare time for supporting breakage already. The time would have been better spent improving kvm's pio or porting libguestfs to use a cdrom. I'm also hoping to get the point across that adding pv interfaces like crazy is not sustainable. 
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
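The "300MB to cat some file" figure above is just the three copies of the roughly 100 MB initrd added up. A small worked comparison against the disk-image path; the fraction of the image a guest actually touches is a made-up workload parameter.

```python
initrd_mb = 100   # the libguestfs-scale initrd discussed in this thread

# -kernel/-initrd path: qemu keeps a copy (for reboot/migration),
# the host pagecache holds one, and the guest unpacks a third.
initrd_footprint = initrd_mb * 3

# Disk-image path: pages are faulted in on demand, and with cache=none
# the host pagecache copy disappears too.  Assume the guest touches 20%.
touched_fraction = 0.20   # assumption, not a measured number
disk_footprint = initrd_mb * touched_fraction

print(initrd_footprint, disk_footprint)   # 300 vs 20.0 (MB)
```

The footprint argument is independent of the PIO slowdown: even with fast transfers, the initrd path pins the full image three times over.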
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:34 ` Avi Kivity @ 2010-08-04 20:06 ` David S. Ahern 2010-08-04 20:16 ` Richard W.M. Jones 2010-08-05 2:38 ` Avi Kivity 0 siblings, 2 replies; 151+ messages in thread From: David S. Ahern @ 2010-08-04 20:06 UTC (permalink / raw) To: Avi Kivity Cc: Alexander Graf, kvm, Gleb Natapov, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/10 11:34, Avi Kivity wrote: >> And it's awesome for fast prototyping. Of course, once that fast >> becomes dog slow, it's not useful anymore. > > For the Nth time, it's only slow with 100MB initrds. 100MB is really not that large for an initrd. Consider the deployment of stateless nodes - something that virtualization allows the rapid deployment of. 1 kernel, 1 initrd with the various binaries to be run. Create nodes as needed by launching a shell command - be it for more capacity, isolation, etc. Why require an iso or disk wrapper for a binary blob that is all to be run out of memory? The -append argument allows boot parameters to be specified at launch. That is a very powerful and simple design option. David ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 20:06 ` David S. Ahern @ 2010-08-04 20:16 ` Richard W.M. Jones 2010-08-05 2:38 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 20:16 UTC (permalink / raw) To: David S. Ahern Cc: kvm, Gleb Natapov, qemu-devel, Alexander Graf, Gerd Hoffmann, Avi Kivity On Wed, Aug 04, 2010 at 02:06:58PM -0600, David S. Ahern wrote: > > > On 08/04/10 11:34, Avi Kivity wrote: > > >> And it's awesome for fast prototyping. Of course, once that fast > >> becomes dog slow, it's not useful anymore. > > > > For the Nth time, it's only slow with 100MB initrds. > > 100MB is really not that large for an initrd. <note> I'd just like to note that the libguestfs initrd is uncompressed. The reason for this is I found that the decompression code in Linux is really slow. I have to admit I didn't look into why this is. By not compressing it on the host and decompressing it on the guest, we saved a bunch of boot time (3-5 seconds IIRC). Anyway, comparing 115MB libguestfs initrd and other initrd sizes may not be a fair comparison, since almost every other initrd you will see will be compressed. </note> Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v ^ permalink raw reply [flat|nested] 151+ messages in thread
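The tradeoff described above (skip gzip on the host and gunzip in the guest, pay with a larger initrd to transfer) is easy to measure on any payload. The repetitive 4 MiB blob below is only a stand-in; real appliance contents compress far less well, so treat the numbers as illustrative.

```python
import gzip
import time

payload = b"libguestfs-appliance-" * 200_000   # ~4 MiB synthetic stand-in

t0 = time.perf_counter()
packed = gzip.compress(payload, compresslevel=6)
t1 = time.perf_counter()
unpacked = gzip.decompress(packed)
t2 = time.perf_counter()

assert unpacked == payload
print(f"raw {len(payload)} B, gzipped {len(packed)} B")
print(f"compress {t1 - t0:.3f}s, decompress {t2 - t1:.3f}s")
# An uncompressed cpio skips both steps, at the cost of moving the
# size delta over the (currently slow) kernel/initrd transfer path.
```

Whether uncompressed wins thus depends on the relative speed of the transfer path and the guest's decompressor, which is exactly the balance that shifted when string PIO got slower.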
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 20:06 ` David S. Ahern 2010-08-04 20:16 ` Richard W.M. Jones @ 2010-08-05 2:38 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-05 2:38 UTC (permalink / raw) To: David S. Ahern Cc: Alexander Graf, kvm, Gleb Natapov, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 11:06 PM, David S. Ahern wrote: > > On 08/04/10 11:34, Avi Kivity wrote: > >>> And it's awesome for fast prototyping. Of course, once that fast >>> becomes dog slow, it's not useful anymore. >> For the Nth time, it's only slow with 100MB initrds. > 100MB is really not that large for an initrd. > > Consider the deployment of stateless nodes - something that > virtualization allows the rapid deployment of. 1 kernel, 1 initrd with > the various binaries to be run. Create nodes as needed by launching a > shell command - be it for more capacity, isolation, etc. Why require an > iso or disk wrapper for a binary blob that is all to be run out of > memory? It's inefficient. First qemu reads the initrd and stores it in memory (where it is kept while the guest runs in case you migrate or reboot). Then the guest copies it into temporary storage (where we currently have the slowdown). Then the guest decompresses and extracts it to tmpfs (initramfs model). Finally the guest runs init out of initrd, typically using just a part of the 100MB+. Whereas with a disk image, individual pages are copied to the guest on demand without taking space in qemu. With cache=none, they don't even affect host pagecache. > The -append argument allows boot parameters to be specified at > launch. That is a very powerful and simple design option. Good point. You still have it with a small initrd that bootstraps a larger image. Note -append probably works even without -kernel, it's just that the guest isn't tooled to look at it. 
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:45 ` Alexander Graf 2010-08-04 16:54 ` Avi Kivity @ 2010-08-04 17:26 ` Anthony Liguori 2010-08-04 17:31 ` Alexander Graf 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:26 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 11:45 AM, Alexander Graf wrote: > Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. > > Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Problem is now solved and everyone's happy. Now we can all go back to making slides for next week :-) Regards, Anthony Liguori > By taking virtio-serial you see an unwanted virtio-serial line in the guest. fw_cfg is great because it's a private interface nobody else accesses. > > I see two alternatives out of this mess: > > 1) Speed up string PIO so we're actually fast again. > 2) Using a different interface (that could also be DMA fw_cfg - remember, we're on a private interface anyways) > > Admittedly 1 would also help in more cases than just booting with -kernel and -initrd, but if that won't get us to acceptable levels (and yes, 8 seconds for 100MB is unacceptable) I don't see any way around 2. > > > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:26 ` Anthony Liguori @ 2010-08-04 17:31 ` Alexander Graf 2010-08-04 17:35 ` Avi Kivity 2010-08-04 17:36 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:31 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 04.08.2010, at 19:26, Anthony Liguori wrote: > On 08/04/2010 11:45 AM, Alexander Graf wrote: >> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. >> >> Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. > > > Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. Why not make id 6 be a fw_cfg virtio interface? That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:31 ` Alexander Graf @ 2010-08-04 17:35 ` Avi Kivity 2010-08-04 17:36 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:35 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Gerd Hoffmann On 08/04/2010 08:31 PM, Alexander Graf wrote: > > Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). > libguestfs could use 9pfs directly. That will be way faster and reduce the footprint dramatically (the guest will demand load only the pages it needs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:31 ` Alexander Graf 2010-08-04 17:35 ` Avi Kivity @ 2010-08-04 17:36 ` Anthony Liguori 2010-08-04 17:36 ` Alexander Graf 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:36 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 08/04/2010 12:31 PM, Alexander Graf wrote: > On 04.08.2010, at 19:26, Anthony Liguori wrote: > > >> On 08/04/2010 11:45 AM, Alexander Graf wrote: >> >>> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. >>> >>> Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. >>> >> >> Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. >> > Why not make id 6 be a fw_cfg virtio interface? Because that's a ton more work and we need fw_cfg to be available before PCI is. IOW, fw_cfg cannot be a PCI interface. Regards, Anthony Liguori > That way we'd stay 100% compatible to everything we have and also get a fast path for reading big chunks of data from fw_cfg. All we'd need is a command to set the 'file' we're in. > > Even better yet, why not use virtio-9p and expose all of fw_cfg as files? Then implement a simple virtio-9p client in SeaBIOS and maybe even get direct kernel/initrd boot from a real 9p system ;). > > > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:36 ` Anthony Liguori @ 2010-08-04 17:36 ` Alexander Graf 0 siblings, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:36 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Gerd Hoffmann On 04.08.2010, at 19:36, Anthony Liguori wrote: > On 08/04/2010 12:31 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:26, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:45 AM, Alexander Graf wrote: >>> >>>> Frankly, I partially agreed to your point when we were talking about 300ms vs. 2 seconds. Now that we're talking 8 seconds that whole point is moot. We chose the wrong interface to transfer kernel+initrd data into the guest. >>>> >>>> Now the question is how to fix that. I would veto against anything normally guest-OS-visible. By occupying the floppy, you lose a floppy drive in the guest. By occupying a disk, you see an unwanted disk in the guest. >>>> >>> >>> Introduce a new virtio device type (say, id 6). Teach SeaBIOS that 6 is exactly like virtio-blk (id 2). Make it clear that id 6 is only to be used by firmware and that normal guest drivers should not be written for id 6. >>> >> Why not make id 6 be a fw_cfg virtio interface? > > Because that's a ton more work and we need fw_cfg to be available before PCI is. IOW, fw_cfg cannot be a PCI interface. in addition to fw_cfg. So you'd have the same contents be exposed using both interfaces. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:36 ` Avi Kivity 2010-08-04 16:44 ` Anthony Liguori 2010-08-04 16:45 ` Alexander Graf @ 2010-08-04 17:46 ` Richard W.M. Jones 2010-08-04 17:50 ` Avi Kivity 2010-08-04 18:13 ` Alexander Graf 2 siblings, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 17:46 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, kvm, Gleb Natapov, Gerd Hoffmann On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: > This is basically my suggestion to libguestfs: instead of generating > an initrd, generate a bootable cdrom, and boot from that. The > result is faster and has a smaller memory footprint. Everyone wins. We had some discussion of this upstream & decided to do this. It should save the time it takes for the guest kernel to unpack the initrd, so maybe another second off boot time, which could bring us ever closer to the "golden" 5 second boot target. It's not trivial mind you, and won't happen straightaway. Part of it is that it requires reworking the appliance builder (a matter of just coding really). The less trivial part is that we have to 'hide' the CD device throughout the publically available interfaces. Then of course, a lot of testing. I will note that virt-install uses the -initrd interface for installing guests (large initrds too). And I've talked with a sysadmin who was using -kernel and -initrd for deploying VM hosting. In his case he did it so he could centralize kernel distribution / updates, and have the guests use /dev/vda == filesystem which made provisioning easy [for him -- I would have used libguestfs ...]. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:46 ` Richard W.M. Jones @ 2010-08-04 17:50 ` Avi Kivity 2010-08-04 18:13 ` Alexander Graf 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:50 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Gleb Natapov, Gerd Hoffmann On 08/04/2010 08:46 PM, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> This is basically my suggestion to libguestfs: instead of generating >> an initrd, generate a bootable cdrom, and boot from that. The >> result is faster and has a smaller memory footprint. Everyone wins. > We had some discussion of this upstream& decided to do this. It > should save the time it takes for the guest kernel to unpack the > initrd, so maybe another second off boot time, which could bring us > ever closer to the "golden" 5 second boot target. > Great. IMO it's the right thing even if initrd took zero time. > It's not trivial mind you, and won't happen straightaway. Part of it > is that it requires reworking the appliance builder (a matter of just > coding really). The less trivial part is that we have to 'hide' the > CD device throughout the publically available interfaces. Then of > course, a lot of testing. > > I will note that virt-install uses the -initrd interface for > installing guests (large initrds too). And I've talked with a > sysadmin who was using -kernel and -initrd for deploying VM hosting. > In his case he did it so he could centralize kernel distribution / > updates, and have the guests use /dev/vda == filesystem which made > provisioning easy [for him -- I would have used libguestfs ...]. We still plan to improve pio speed. (note a few added seconds to guest install or bootup is not such a drag compared to the hit on an interactive tool like libguestfs). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. 
^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:46 ` Richard W.M. Jones 2010-08-04 17:50 ` Avi Kivity @ 2010-08-04 18:13 ` Alexander Graf 2010-08-04 18:16 ` Anthony Liguori 2010-08-04 18:18 ` Avi Kivity 1 sibling, 2 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 18:13 UTC (permalink / raw) To: Richard W.M.Jones Cc: Gleb Natapov, kvm, qemu-devel, Avi Kivity, Gerd Hoffmann On 04.08.2010, at 19:46, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> This is basically my suggestion to libguestfs: instead of generating >> an initrd, generate a bootable cdrom, and boot from that. The >> result is faster and has a smaller memory footprint. Everyone wins. > > We had some discussion of this upstream & decided to do this. It > should save the time it takes for the guest kernel to unpack the > initrd, so maybe another second off boot time, which could bring us > ever closer to the "golden" 5 second boot target. > > It's not trivial mind you, and won't happen straightaway. Part of it > is that it requires reworking the appliance builder (a matter of just > coding really). The less trivial part is that we have to 'hide' the > CD device throughout the publically available interfaces. Then of > course, a lot of testing. Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:13 ` Alexander Graf @ 2010-08-04 18:16 ` Anthony Liguori 2010-08-04 18:18 ` Alexander Graf 2010-08-04 18:19 ` Avi Kivity 2010-08-04 18:18 ` Avi Kivity 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 18:16 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M.Jones, Gerd Hoffmann, Avi Kivity On 08/04/2010 01:13 PM, Alexander Graf wrote: > On 04.08.2010, at 19:46, Richard W.M. Jones wrote: > > >> On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >> >>> This is basically my suggestion to libguestfs: instead of generating >>> an initrd, generate a bootable cdrom, and boot from that. The >>> result is faster and has a smaller memory footprint. Everyone wins. >>> >> We had some discussion of this upstream& decided to do this. It >> should save the time it takes for the guest kernel to unpack the >> initrd, so maybe another second off boot time, which could bring us >> ever closer to the "golden" 5 second boot target. >> >> It's not trivial mind you, and won't happen straightaway. Part of it >> is that it requires reworking the appliance builder (a matter of just >> coding really). The less trivial part is that we have to 'hide' the >> CD device throughout the publically available interfaces. Then of >> course, a lot of testing. >> > Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. > You can't boot from 9p. Regards, Anthony Liguori > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:16 ` Anthony Liguori @ 2010-08-04 18:18 ` Alexander Graf 2010-08-04 18:19 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 18:18 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M.Jones, Gerd Hoffmann, Avi Kivity On 04.08.2010, at 20:16, Anthony Liguori wrote: > On 08/04/2010 01:13 PM, Alexander Graf wrote: >> On 04.08.2010, at 19:46, Richard W.M. Jones wrote: >> >> >>> On Wed, Aug 04, 2010 at 07:36:04PM +0300, Avi Kivity wrote: >>> >>>> This is basically my suggestion to libguestfs: instead of generating >>>> an initrd, generate a bootable cdrom, and boot from that. The >>>> result is faster and has a smaller memory footprint. Everyone wins. >>>> >>> We had some discussion of this upstream& decided to do this. It >>> should save the time it takes for the guest kernel to unpack the >>> initrd, so maybe another second off boot time, which could bring us >>> ever closer to the "golden" 5 second boot target. >>> >>> It's not trivial mind you, and won't happen straightaway. Part of it >>> is that it requires reworking the appliance builder (a matter of just >>> coding really). The less trivial part is that we have to 'hide' the >>> CD device throughout the publically available interfaces. Then of >>> course, a lot of testing. >>> >> Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. >> > > You can't boot from 9p. But you could still use -kernel and -initrd for that, no? Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:16 ` Anthony Liguori 2010-08-04 18:18 ` Alexander Graf @ 2010-08-04 18:19 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 18:19 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, Richard W.M.Jones, qemu-devel, Alexander Graf, Gerd Hoffmann On 08/04/2010 09:16 PM, Anthony Liguori wrote: >> Why not go with 9p? That would save off even more time, as you don't >> have to generate an iso. You could just copy all the relevant >> executables into tmpfs and boot from there using your kernel and a >> very small (pre-built) initrd. > > You can't boot from 9p. > As Alex said, you boot from a non-100MB initrd (or cdrom) and mount the 9pfs. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 18:13 ` Alexander Graf 2010-08-04 18:16 ` Anthony Liguori @ 2010-08-04 18:18 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 18:18 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M.Jones, Gerd Hoffmann On 08/04/2010 09:13 PM, Alexander Graf wrote: > >> It's not trivial mind you, and won't happen straightaway. Part of it >> is that it requires reworking the appliance builder (a matter of just >> coding really). The less trivial part is that we have to 'hide' the >> CD device throughout the publically available interfaces. Then of >> course, a lot of testing. > Why not go with 9p? That would save off even more time, as you don't have to generate an iso. You could just copy all the relevant executables into tmpfs and boot from there using your kernel and a very small (pre-built) initrd. Yes - and you don't need to copy, just hardlink if your /tmp and /usr are on the same filesystem. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
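Alex's tmpfs-plus-9p variant, with Avi's hardlink refinement, might look something like the sketch below. The qemu -virtfs syntax shown is from later qemu releases than the ones discussed here, and all paths are hypothetical.

```shell
# Stage the appliance without copying: a hardlink moves no data when
# source and destination live on the same filesystem (Avi's point).
staging=$(mktemp -d /tmp/appliance.XXXXXX)
src=$(mktemp /tmp/guestfsd.XXXXXX)          # stand-in for a real binary
ln "$src" "$staging/guestfsd" 2>/dev/null || cp "$src" "$staging/guestfsd"

# Both names refer to the same inode when the hardlink succeeded:
[ "$src" -ef "$staging/guestfsd" ] && echo "hardlinked, nothing copied"

# The directory would then be exported over virtio-9p, e.g. (flags per
# later qemu versions; not executed here):
#   qemu-system-x86_64 ... \
#     -virtfs local,path="$staging",mount_tag=appliance,security_model=none
# and mounted from a tiny pre-built initrd inside the guest:
#   mount -t 9p -o trans=virtio appliance /sysroot
```

This avoids both the large initrd transfer and the ISO generation step, at the cost of requiring 9p support in the guest kernel and in qemu.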
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:30 ` Avi Kivity 2010-08-04 16:36 ` Avi Kivity @ 2010-08-04 16:42 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:42 UTC (permalink / raw) To: Avi Kivity Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 11:30 AM, Avi Kivity wrote: > On 08/04/2010 04:52 PM, Anthony Liguori wrote: >>>> >>> This is not like DMA event if done in chunks and chunks can be pretty >>> big. The code that dials with copying may temporary unmap some pci >>> devices to have more space there. >> >> >> That's a bit complicated because SeaBIOS is managing the PCI devices >> whereas the kernel code is running as an option rom. I don't know >> the BIOS PCI interfaces well so I don't know how doable this is. >> >> Maybe we're just being too fancy here. >> >> We could rewrite -kernel/-append/-initrd to just generate a floppy >> image in RAM, and just boot from floppy. > > How could this work? the RAM belongs to SeaBIOS immediately after > reset, it would just scribble over it. Or worse, not scribble on it > until some date in the future. I mean host RAM, not guest RAM. Regards, Anthony Liguori > > -kernel data has to find its way to memory after the bios gives > control to some optionrom. An alternative would be to embed knowledge > of -kernel in seabios, but I don't think it's a good one. > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:07 ` Gleb Natapov 2010-08-04 13:15 ` Anthony Liguori @ 2010-08-04 13:22 ` Richard W.M. Jones 2010-08-04 13:29 ` Gleb Natapov 1 sibling, 1 reply; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 13:22 UTC (permalink / raw) To: Gleb Natapov; +Cc: qemu-devel, kvm, Avi Kivity, Gerd Hoffmann On Wed, Aug 04, 2010 at 04:07:09PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > > On 08/04/2010 03:17 AM, Avi Kivity wrote: > > >For playing games, there are three options: > > >- existing fwcfg > > >- fwcfg+dma > > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > > >and have the BIOS copy them > > > > > >Existing fwcfg is the least amount of work and probably > > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > > >High memory flash is the most hardware-like solution, pretty easy > > >from a qemu point of view but requires more work. > > > > The only trouble I see is that high memory isn't always available. > > If it's a 32-bit PC and you've exhausted RAM space, then you're only > > left with the PCI hole and it's not clear to me if you can really > > pull out 100mb of space there as an option ROM without breaking > > something. > > > We can map it on demand. Guest tells qemu to map rom "A" to address X by > writing into some io port. Guest copies rom. Guest tells qemu to unmap > it. Better then DMA interface IMHO. I think this is a fine idea. Do you want me to try to implement something like this? (I'm on holiday this week and next week at the KVM Forum, so it won't be for a while ...) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. 
http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:22 ` Richard W.M. Jones @ 2010-08-04 13:29 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 13:29 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, kvm, Avi Kivity, Gerd Hoffmann On Wed, Aug 04, 2010 at 02:22:29PM +0100, Richard W.M. Jones wrote: > > On Wed, Aug 04, 2010 at 04:07:09PM +0300, Gleb Natapov wrote: > > On Wed, Aug 04, 2010 at 08:04:09AM -0500, Anthony Liguori wrote: > > > On 08/04/2010 03:17 AM, Avi Kivity wrote: > > > >For playing games, there are three options: > > > >- existing fwcfg > > > >- fwcfg+dma > > > >- put roms in 4GB-2MB (or whatever we decide the flash size is) > > > >and have the BIOS copy them > > > > > > > >Existing fwcfg is the least amount of work and probably > > > >satisfactory for isapc. fwcfg+dma is IMO going off a tangent. > > > >High memory flash is the most hardware-like solution, pretty easy > > > >from a qemu point of view but requires more work. > > > > > > The only trouble I see is that high memory isn't always available. > > > If it's a 32-bit PC and you've exhausted RAM space, then you're only > > > left with the PCI hole and it's not clear to me if you can really > > > pull out 100mb of space there as an option ROM without breaking > > > something. > > > > > We can map it on demand. Guest tells qemu to map rom "A" to address X by > > writing into some io port. Guest copies rom. Guest tells qemu to unmap > > it. Better then DMA interface IMHO. > > I think this is a fine idea. Do you want me to try to implement > something like this? (I'm on holiday this week and next week at > the KVM Forum, so it won't be for a while ...) > I wouldn't do that without principal agreement from Avi and Anthony :) -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 13:04 ` Anthony Liguori 2010-08-04 13:07 ` Gleb Natapov @ 2010-08-04 16:25 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:25 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Gerd Hoffmann, Gleb Natapov, Richard W.M. Jones On 08/04/2010 04:04 PM, Anthony Liguori wrote: > On 08/04/2010 03:17 AM, Avi Kivity wrote: >> For playing games, there are three options: >> - existing fwcfg >> - fwcfg+dma >> - put roms in 4GB-2MB (or whatever we decide the flash size is) and >> have the BIOS copy them >> >> Existing fwcfg is the least amount of work and probably satisfactory >> for isapc. fwcfg+dma is IMO going off a tangent. High memory flash >> is the most hardware-like solution, pretty easy from a qemu point of >> view but requires more work. > > The only trouble I see is that high memory isn't always available. If > it's a 32-bit PC and you've exhausted RAM space, then you're only left > with the PCI hole and it's not clear to me if you can really pull out > 100mb of space there as an option ROM without breaking something. > 100MB is out of the question, certainly. I'm talking about your isapc problem, not about a cdrom replacement. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:24 ` Avi Kivity 2010-08-03 19:38 ` Anthony Liguori 2010-08-03 21:20 ` Gerd Hoffmann @ 2010-08-03 22:06 ` Richard W.M. Jones 2010-08-04 5:54 ` Avi Kivity 2 siblings, 1 reply; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 22:06 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Gleb Natapov, qemu-devel On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: > Why do we need to transfer roms? These are devices on the memory > bus or pci bus, it just needs to be there at the right address. > Boot splash should just be another rom as it would be on a real > system. Just like the initrd? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 22:06 ` Richard W.M. Jones @ 2010-08-04 5:54 ` Avi Kivity 2010-08-04 9:24 ` Richard W.M. Jones 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:54 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Gleb Natapov, qemu-devel On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: >> Why do we need to transfer roms? These are devices on the memory >> bus or pci bus, it just needs to be there at the right address. >> Boot splash should just be another rom as it would be on a real >> system. > Just like the initrd? There isn't enough address space for a 100MB initrd in ROM. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 5:54 ` Avi Kivity @ 2010-08-04 9:24 ` Richard W.M. Jones 2010-08-04 9:27 ` Gleb Natapov ` (2 more replies) 0 siblings, 3 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 9:24 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Gleb Natapov, qemu-devel On Wed, Aug 04, 2010 at 08:54:35AM +0300, Avi Kivity wrote: > On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: > >On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: > >>Why do we need to transfer roms? These are devices on the memory > >>bus or pci bus, it just needs to be there at the right address. > >>Boot splash should just be another rom as it would be on a real > >>system. > >Just like the initrd? > > There isn't enough address space for a 100MB initrd in ROM. Because of limits of the original PC, sure, where you had to fit everything in 0xa0000-0xfffff or whatever it was. But this isn't a real PC. You can map the read-only memory anywhere you want. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:24 ` Richard W.M. Jones @ 2010-08-04 9:27 ` Gleb Natapov 2010-08-04 9:52 ` Avi Kivity 2010-08-04 12:59 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 9:27 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, qemu-devel On Wed, Aug 04, 2010 at 10:24:28AM +0100, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 08:54:35AM +0300, Avi Kivity wrote: > > On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: > > >On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: > > >>Why do we need to transfer roms? These are devices on the memory > > >>bus or pci bus, it just needs to be there at the right address. > > >>Boot splash should just be another rom as it would be on a real > > >>system. > > >Just like the initrd? > > > > There isn't enough address space for a 100MB initrd in ROM. > > Because of limits of the original PC, sure, where you had to fit > everything in 0xa0000-0xfffff or whatever it was. > > But this isn't a real PC. > In what way is it not? > You can map the read-only memory anywhere you want. > You can't. Guests expect certain memory layouts. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:24 ` Richard W.M. Jones 2010-08-04 9:27 ` Gleb Natapov @ 2010-08-04 9:52 ` Avi Kivity 2010-08-04 11:33 ` Richard W.M. Jones 2010-08-04 12:59 ` Anthony Liguori 2 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 9:52 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Gleb Natapov, qemu-devel On 08/04/2010 12:24 PM, Richard W.M. Jones wrote: >>> >>> Just like the initrd? >> There isn't enough address space for a 100MB initrd in ROM. > Because of limits of the original PC, sure, where you had to fit > everything in 0xa0000-0xfffff or whatever it was. > > But this isn't a real PC. > > You can map the read-only memory anywhere you want. I wasn't talking about the 1MB limit, rather the 4GB limit. Of that, 3-3.5GB are reserved for RAM, 0.5-1GB for PCI. Putting large amounts of ROM in that space will cost us PCI space. 100 MB initrds are a bad idea for multiple reasons. Demand paging is there for a reason. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:52 ` Avi Kivity @ 2010-08-04 11:33 ` Richard W.M. Jones 2010-08-04 11:36 ` Avi Kivity 2010-08-04 12:07 ` Gleb Natapov 0 siblings, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-04 11:33 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Gleb Natapov, qemu-devel On Wed, Aug 04, 2010 at 12:52:23PM +0300, Avi Kivity wrote: > On 08/04/2010 12:24 PM, Richard W.M. Jones wrote: > >>> > >>>Just like the initrd? > >>There isn't enough address space for a 100MB initrd in ROM. > >Because of limits of the original PC, sure, where you had to fit > >everything in 0xa0000-0xfffff or whatever it was. > > > >But this isn't a real PC. > > > >You can map the read-only memory anywhere you want. > > I wasn't talking about the 1MB limit, rather the 4GB limit. Of > that, 3-3.5GB are reserved for RAM, 0.5-1GB for PCI. Putting large > amounts of ROM in that space will cost us PCI space. I'm only allocating 500MB of RAM, so there's easily enough space to put a large ROM, with tons of room for growth (of both RAM and ROM). Yes, even real hardware has done this. The Weitek math copro mapped itself in at physical memory addresses c0000000 (a 32 MB window IIRC). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 11:33 ` Richard W.M. Jones @ 2010-08-04 11:36 ` Avi Kivity 2010-08-04 12:07 ` Gleb Natapov 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 11:36 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Gleb Natapov, qemu-devel On 08/04/2010 02:33 PM, Richard W.M. Jones wrote: > > I'm only allocating 500MB of RAM, so there's easily enough space to > put a large ROM, with tons of room for growth (of both RAM and ROM). > Yes, even real hardware has done this. The Weitek math copro mapped > itself in at physical memory addresses c0000000 (a 32 MB window IIRC). I'm sure it will work for your use case, but it becomes a feature that only works if you have a guest with a small amount of memory and few pci devices. With a larger guest it fails. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 11:33 ` Richard W.M. Jones 2010-08-04 11:36 ` Avi Kivity @ 2010-08-04 12:07 ` Gleb Natapov 1 sibling, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 12:07 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, qemu-devel On Wed, Aug 04, 2010 at 12:33:18PM +0100, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 12:52:23PM +0300, Avi Kivity wrote: > > On 08/04/2010 12:24 PM, Richard W.M. Jones wrote: > > >>> > > >>>Just like the initrd? > > >>There isn't enough address space for a 100MB initrd in ROM. > > >Because of limits of the original PC, sure, where you had to fit > > >everything in 0xa0000-0xfffff or whatever it was. > > > > > >But this isn't a real PC. > > > > > >You can map the read-only memory anywhere you want. > > > > I wasn't talking about the 1MB limit, rather the 4GB limit. Of > > that, 3-3.5GB are reserved for RAM, 0.5-1GB for PCI. Putting large > > amounts of ROM in that space will cost us PCI space. > > I'm only allocating 500MB of RAM, so there's easily enough space to > put a large ROM, with tons of room for growth (of both RAM and ROM). > Yes, even real hardware has done this. The Weitek math copro mapped > itself in at physical memory addresses c0000000 (a 32 MB window IIRC). > c0000000 is 3G. This is where the PCI window usually starts (configurable in the chipset). I don't see anything unusual in this particular HW. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 9:24 ` Richard W.M. Jones 2010-08-04 9:27 ` Gleb Natapov 2010-08-04 9:52 ` Avi Kivity @ 2010-08-04 12:59 ` Anthony Liguori 2 siblings, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 12:59 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel On 08/04/2010 04:24 AM, Richard W.M. Jones wrote: > On Wed, Aug 04, 2010 at 08:54:35AM +0300, Avi Kivity wrote: > >> On 08/04/2010 01:06 AM, Richard W.M. Jones wrote: >> >>> On Tue, Aug 03, 2010 at 10:24:41PM +0300, Avi Kivity wrote: >>> >>>> Why do we need to transfer roms? These are devices on the memory >>>> bus or pci bus, it just needs to be there at the right address. >>>> Boot splash should just be another rom as it would be on a real >>>> system. >>>> >>> Just like the initrd? >>> >> There isn't enough address space for a 100MB initrd in ROM. >> > Because of limits of the original PC, sure, where you had to fit > everything in 0xa0000-0xfffff or whatever it was. > > But this isn't a real PC. > > You can map the read-only memory anywhere you want. > It's not that simple. Option roms are initialized in 16-bit mode so the physical address space is limited. The address mappings have very well defined semantics. Regards, Anthony Liguori > Rich. > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:15 ` Anthony Liguori 2010-08-03 19:24 ` Avi Kivity @ 2010-08-03 19:26 ` Gleb Natapov 1 sibling, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-03 19:26 UTC (permalink / raw) To: Anthony Liguori; +Cc: qemu-devel, Avi Kivity, kvm, Richard W.M. Jones On Tue, Aug 03, 2010 at 02:15:05PM -0500, Anthony Liguori wrote: > On 08/03/2010 02:05 PM, Gleb Natapov wrote: > >On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >>>If Richard is willing to do the work to make -kernel perform > >>>faster in such a way that it fits into the overall mission of what > >>>we're building, then I see no reason to reject it. The criteria > >>>for evaluating a patch should only depend on how it affects other > >>>areas of qemu and whether it impacts overall usability. > >>That's true, but extending fwcfg doesn't fit into the overall > >>picture well. We have well defined interfaces for pushing data into > >>a guest: virtio-serial (dma upload), virtio-blk (adds demand > >>paging), and virtio-p9fs (no image needed). Adapting libguestfs to > >>use one of these is a better move than adding yet another interface. > >> > >+1. I already proposed that. Nobody objects against fast fast > >communication channel between guest and host. In fact we have one: > >virtio-serial. Of course it is much easier to hack dma semantic into > >fw_cfg interface than add virtio-serial to seabios, but it doesn't make > >it right. Does virtio-serial has to be exposed as PCI to a guest or can > >we expose it as ISA device too in case someone want to use -kernel option > >but do not see additional PCI device in a guest? > > fw_cfg has to be available pretty early on so relying on a PCI > device isn't reasonable. Having dual interfaces seems wasteful. 
> fw_cfg wasn't meant to be used for bulk transfers (seabios doesn't even use string pio to access it, which makes load time 50 times slower than what Richard reports). It was meant to be easy to use at very early stages of booting. Kernel/initrd are loaded at a very late stage of booting, at which point PCI is fully initialized. > We're already doing bulk data transfer over fw_cfg as we need to do > it to transfer roms and potentially a boot splash. Even outside of > loading an initrd, the performance is going to start to matter with > a large number of devices. > Most roms are loaded from ROM PCI BARs, so this leaves us with the boot splash, but the boot splash image should be relatively small, and if the user wants it he does not care much about boot time anyway, since the BIOS needs to pause to show the boot splash. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity ` (2 preceding siblings ...) 2010-08-03 19:05 ` Gleb Natapov @ 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-03 19:17 ` Gleb Natapov ` (3 more replies) 2010-08-04 14:51 ` David S. Ahern 4 siblings, 4 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 19:13 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, qemu-devel, kvm On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > should discourage people from depending on this interface for > production use. I really don't get this whole thing where we must slavishly emulate an exact PC ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones @ 2010-08-03 19:17 ` Gleb Natapov 2010-08-03 19:19 ` Anthony Liguori ` (2 subsequent siblings) 3 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-03 19:17 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, qemu-devel On Tue, Aug 03, 2010 at 08:13:46PM +0100, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > > libguestfs does not depend on an x86 architectural feature. > > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > > should discourage people from depending on this interface for > > production use. > > I really don't get this whole thing where we must slavishly > emulate an exact PC ... > Maybe because you don't have to deal with the consequences of not doing so? -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-03 19:17 ` Gleb Natapov @ 2010-08-03 19:19 ` Anthony Liguori 2010-08-03 19:22 ` Avi Kivity 2010-08-04 8:21 ` Avi Kivity 3 siblings, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 19:19 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel On 08/03/2010 02:13 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >> should discourage people from depending on this interface for >> production use. >> > I really don't get this whole thing where we must slavishly > emulate an exact PC ... > History has shown that when we deviate, we usually get it wrong and it becomes very painful to fix. Regards, Anthony Liguori > Rich. > > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones 2010-08-03 19:17 ` Gleb Natapov 2010-08-03 19:19 ` Anthony Liguori @ 2010-08-03 19:22 ` Avi Kivity 2010-08-03 20:00 ` Richard W.M. Jones 2010-08-04 8:21 ` Avi Kivity 3 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-03 19:22 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: Gleb Natapov, qemu-devel, kvm On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >> should discourage people from depending on this interface for >> production use. > I really don't get this whole thing where we must slavishly > emulate an exact PC ... This has two motivations: - documented interfaces: we suck at documentation. We seldom document. Even when we do document something, the documentation is often inaccurate, misleading, and incomplete. While an "exact PC" unfortunately doesn't exist, it's a lot closer to reality than, say, an "exact Linux syscall interface". If we adopt an existing interface, we already have the documentation, and if there's a conflict between the documentation and our implementation, it's clear who wins (well, not always). - preexisting guests: if we design a new interface, we get to update all guests; and there are many of them. Whereas an "exact PC" will be seen by the guest vendors as well, who will then add whatever support is necessary. Obviously we break this when we have to, but when we don't have to, we shouldn't. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:22 ` Avi Kivity @ 2010-08-03 20:00 ` Richard W.M. Jones 2010-08-03 20:49 ` Anthony Liguori 2010-08-04 1:17 ` Jamie Lokier 0 siblings, 2 replies; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 20:00 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, qemu-devel, kvm On Tue, Aug 03, 2010 at 10:22:22PM +0300, Avi Kivity wrote: > On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: > >On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: > >>libguestfs does not depend on an x86 architectural feature. > >>qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We > >>should discourage people from depending on this interface for > >>production use. > >I really don't get this whole thing where we must slavishly > >emulate an exact PC ... > > This has two motivations: > > - documented interfaces: we suck at documentation. We seldom > document. Even when we do document something, the documentation is > often inaccurate, misleading, and incomplete. While an "exact PC" > unfortunately doesn't exist, it's a lot closer to reality than, say, > an "exact Linux syscall interface". If we adopt an existing > interface, we already have the documentation, and if there's a > conflict between the documentation and our implementation, it's > clear who wins (well, not always). > > - preexisting guests: if we design a new interface, we get to update > all guests; and there are many of them. Whereas an "exact PC" will > be seen by the guest vendors as well who will then add whatever > support is necessary. On the other hand we end up with stuff like only being able to add 29 virtio-blk devices to a single guest. As best as I can tell, this comes from PCI, and this limit required a bunch of hacks when implementing virt-df. These are reasonable motivations, but I think they are partially about us: We could document things better and make things future-proof. 
I'm surprised by how lacking the doc requirements are for qemu (compared to, hmm, libguestfs for example). We could demand that OSes write device drivers for more qemu devices -- already OS vendors write thousands of device drivers for all sorts of obscure devices, so this isn't really much of a demand for them. In fact, they're already doing it. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:00 ` Richard W.M. Jones @ 2010-08-03 20:49 ` Anthony Liguori 2010-08-03 21:13 ` Paolo Bonzini 2010-08-04 5:56 ` Avi Kivity 2010-08-04 1:17 ` Jamie Lokier 1 sibling, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 20:49 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel On 08/03/2010 03:00 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 10:22:22PM +0300, Avi Kivity wrote: > >> On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: >> >>> On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: >>> >>>> libguestfs does not depend on an x86 architectural feature. >>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >>>> should discourage people from depending on this interface for >>>> production use. >>>> >>> I really don't get this whole thing where we must slavishly >>> emulate an exact PC ... >>> >> This has two motivations: >> >> - documented interfaces: we suck at documentation. We seldom >> document. Even when we do document something, the documentation is >> often inaccurate, misleading, and incomplete. While an "exact PC" >> unfortunately doesn't exist, it's a lot closer to reality than, say, >> an "exact Linux syscall interface". If we adopt an existing >> interface, we already have the documentation, and if there's a >> conflict between the documentation and our implementation, it's >> clear who wins (well, not always). >> >> - preexisting guests: if we design a new interface, we get to update >> all guests; and there are many of them. Whereas an "exact PC" will >> be seen by the guest vendors as well who will then add whatever >> support is necessary. >> > On the other hand we end up with stuff like only being able to add 29 > virtio-blk devices to a single guest. 
As best as I can tell, this > comes from PCI No, this comes from us being too clever for our own good and not following the way hardware does it. All modern systems keep disks on their own dedicated bus. In virtio-blk, we have a 1-1 relationship between disks and PCI devices. That's a perfect example of what happens when we try to "improve" things. > , and this limit required a bunch of hacks when > implementing virt-df. > > These are reasonable motivations, but I think they are partially about > us: > > We could document things better and make things future-proof. I'm > surprised by how lacking the doc requirements are for qemu (compared > to, hmm, libguestfs for example). > We enjoy complaining about our lack of documentation more than we like actually writing documentation. > We could demand that OSes write device drivers for more qemu devices > -- already OS vendors write thousands of device drivers for all sorts > of obscure devices, so this isn't really much of a demand for them. > In fact, they're already doing it. > So far, MS hasn't quite gotten the clue yet that they should write device drivers for qemu :-) In fact, no one has. Regards, Anthony Liguori > Rich. > > ^ permalink raw reply [flat|nested] 151+ messages in thread
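The 29-disk figure discussed above falls out of PCI bus geometry on qemu's default i440FX "pc" machine: a PCI bus has 32 device slots, and a few are occupied by built-in functions before any disks are added. A toy sketch of the arithmetic (the specific slot assignments are the conventional ones, but treat them as illustrative):

```python
# Why virtio-blk's 1-1 disk-to-PCI-device mapping caps out near 29 disks.
# Slot assignments below are the usual qemu i440FX defaults, shown for
# illustration only.
PCI_SLOTS_PER_BUS = 32  # PCI allows 32 devices per bus

reserved = {
    0: "i440FX host bridge",
    1: "PIIX3 ISA bridge / IDE",
    2: "VGA",
}

free_slots = PCI_SLOTS_PER_BUS - len(reserved)
# With one virtio-blk disk per PCI device, that leaves ~29 slots for disks
# (fewer still once a NIC, balloon device, etc. take their own slots).
```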
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:49 ` Anthony Liguori @ 2010-08-03 21:13 ` Paolo Bonzini 2010-08-03 21:34 ` Anthony Liguori 2010-08-04 5:56 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Paolo Bonzini @ 2010-08-03 21:13 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/03/2010 10:49 PM, Anthony Liguori wrote: >> On the other hand we end up with stuff like only being able to add 29 >> virtio-blk devices to a single guest. As best as I can tell, this >> comes from PCI > > No, this comes from us being too clever for our own good and not > following the way hardware does it. > > All modern systems keep disks on their own dedicated bus. In > virtio-blk, we have a 1-1 relationship between disks and PCI devices. > That's a perfect example of what happens when we try to "improve" things. Comparing (from personal experience) the complexity of the Windows drivers for Xen and virtio shows that it's not a bad idea at all. Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 21:13 ` Paolo Bonzini @ 2010-08-03 21:34 ` Anthony Liguori 2010-08-04 7:57 ` Paolo Bonzini 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 21:34 UTC (permalink / raw) To: Paolo Bonzini Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/03/2010 04:13 PM, Paolo Bonzini wrote: > On 08/03/2010 10:49 PM, Anthony Liguori wrote: >>> On the other hand we end up with stuff like only being able to add 29 >>> virtio-blk devices to a single guest. As best as I can tell, this >>> comes from PCI >> >> No, this comes from us being too clever for our own good and not >> following the way hardware does it. >> >> All modern systems keep disks on their own dedicated bus. In >> virtio-blk, we have a 1-1 relationship between disks and PCI devices. >> That's a perfect example of what happens when we try to "improve" >> things. > > Comparing (from personal experience) the complexity of the Windows > drivers for Xen and virtio shows that it's not a bad idea at all. Not quite sure what you're suggesting, but I could have been clearer. Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a PCI device, we probably should have just done virtio-scsi. Since most OSes have a SCSI-centric block layer, it would have resulted in much simpler drivers and we could support more than 1 disk per PCI slot. I had thought Christoph was working on such a device at some point in time... Regards, Anthony Liguori > > Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 21:34 ` Anthony Liguori @ 2010-08-04 7:57 ` Paolo Bonzini 2010-08-04 8:19 ` Avi Kivity 2010-08-04 12:53 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Paolo Bonzini @ 2010-08-04 7:57 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/03/2010 11:34 PM, Anthony Liguori wrote: >> >> Comparing (from personal experience) the complexity of the Windows >> drivers for Xen and virtio shows that it's not a bad idea at all. > > Not quite sure what you're suggesting, but I could have been clearer. > Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a > PCI device, we probably should have just done virtio-scsi. If you did virtio-scsi you might as well have ditched virtio-pci altogether and provided a single PCI device just like Xen does. Just make your network device also speak SCSI (which is actually in the spec...), and the same for serial devices. But now your driver has to implement its own hot-plug/hot-unplug mechanism rather than deferring it to the PCI subsystem of the OS (like Xen), greatly adding to the complication. In fact, a SCSI controller's firmware has a lot of other communication channels with the driver besides SCSI commands, and all this would be mapped into additional complexity on both the host side and the guest side. Yet another reminder of Xen. Despite the shortcomings, I think virtio-pci is the best example of balancing PV-specific aspects (do not make things too complicated) and "real world" aspects (do not invent new buses and the like). Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 7:57 ` Paolo Bonzini @ 2010-08-04 8:19 ` Avi Kivity 2010-08-04 12:53 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 8:19 UTC (permalink / raw) To: Paolo Bonzini; +Cc: kvm, Gleb Natapov, Richard W.M. Jones, qemu-devel On 08/04/2010 10:57 AM, Paolo Bonzini wrote: > On 08/03/2010 11:34 PM, Anthony Liguori wrote: >>> >>> Comparing (from personal experience) the complexity of the Windows >>> drivers for Xen and virtio shows that it's not a bad idea at all. >> >> Not quite sure what you're suggesting, but I could have been clearer. >> Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a >> PCI device, we probably should have just done virtio-scsi. > > If you did virtio-scsi you might have as well ditched virtio-pci > altogether and provide a single PCI device just like Xen does. Just > make your network device also speak SCSI (which is actually in the > spec...), and the same for serial devices. > > But now your driver that has to implement its own hot-plug/hot-unplug > mechanism rather than deferring it to the PCI subsystem of the OS > (like Xen), greatly adding to the complication. In fact, a SCSI > controller's firmware has a lot of other communication channels with > the driver besides SCSI commands, and all this would be mapped into > additional complexity on both the host side and the guest side. Yet > another reminder of Xen. > > Despite the shortcomings, I think virtio-pci is the best example of > balancing PV-specific aspects (do not make things too complicated) and > "real world" aspects (do not invent new buses and the like). Making virtio-blk a controller doesn't involve much difficulty. We add LUN to all requests, and send a configuration interrupt (which we already have) when a LUN is added or removed. Add some config space for discovering available LUNs. 
-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
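Avi's sketch above — a LUN on every request, a configuration interrupt on hot-plug, and config space for LUN discovery — can be illustrated as a wire format. The existing virtio-blk request header really is type/ioprio/sector (u32/u32/u64, little-endian); the trailing LUN field and the LUN bitmap below are invented for illustration and are not part of any spec:

```python
import struct

VIRTIO_BLK_T_IN = 0  # read request, value from the virtio-blk spec

def pack_request(req_type, lun, sector):
    # Real header <type:u32 ioprio:u32 sector:u64> plus a hypothetical
    # lun:u32 appended, as Avi suggests. Layout is illustrative only.
    return struct.pack("<IIQI", req_type, 0, sector, lun)

def unpack_request(buf):
    req_type, _ioprio, sector, lun = struct.unpack("<IIQI", buf)
    return req_type, sector, lun

def present_luns(bitmap):
    # Hypothetical config-space field: a bitmap of present LUNs that the
    # driver re-reads when the device raises a configuration interrupt.
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]
```

For example, a read of sector 2048 on LUN 3 round-trips as `unpack_request(pack_request(VIRTIO_BLK_T_IN, lun=3, sector=2048))`.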
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 7:57 ` Paolo Bonzini 2010-08-04 8:19 ` Avi Kivity @ 2010-08-04 12:53 ` Anthony Liguori 2010-08-04 16:44 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 12:53 UTC (permalink / raw) To: Paolo Bonzini Cc: qemu-devel, kvm, Richard W.M. Jones, Gleb Natapov, Avi Kivity On 08/04/2010 02:57 AM, Paolo Bonzini wrote: > On 08/03/2010 11:34 PM, Anthony Liguori wrote: >>> >>> Comparing (from personal experience) the complexity of the Windows >>> drivers for Xen and virtio shows that it's not a bad idea at all. >> >> Not quite sure what you're suggesting, but I could have been clearer. >> Instead of having virtio-blk where a virtio disk has a 1-1 mapping to a >> PCI device, we probably should have just done virtio-scsi. > > If you did virtio-scsi you might have as well ditched virtio-pci > altogether and provide a single PCI device just like Xen does. Just > make your network device also speak SCSI (which is actually in the > spec...), and the same for serial devices. > > But now your driver that has to implement its own hot-plug/hot-unplug > mechanism rather than deferring it to the PCI subsystem of the OS > (like Xen), greatly adding to the complication. In fact, a SCSI > controller's firmware has a lot of other communication channels with > the driver besides SCSI commands, and all this would be mapped into > additional complexity on both the host side and the guest side. Yet > another reminder of Xen. > > Despite the shortcomings, I think virtio-pci is the best example of > balancing PV-specific aspects (do not make things too complicated) and > "real world" aspects (do not invent new buses and the like). So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. Regards, Anthony Liguori > Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 12:53 ` Anthony Liguori @ 2010-08-04 16:44 ` Avi Kivity 2010-08-04 16:46 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:44 UTC (permalink / raw) To: Anthony Liguori Cc: Paolo Bonzini, kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/04/2010 03:53 PM, Anthony Liguori wrote: > > So how do we enable support for more than 20 disks? I think a > virtio-scsi is inevitable.. Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
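The queue-depth point above can be made concrete with a toy model: a fixed pool of outstanding requests spread over per-disk queues leaves each queue shallow, while a single shared queue stays deep, so each completion interrupt can coalesce more requests. All numbers here are illustrative, not measurements:

```python
# Toy model of per-disk queues vs. one aggregated queue. Assumes a fixed
# number of outstanding requests and that one interrupt can complete a
# whole queue's worth of requests -- a simplification for illustration.
def avg_queue_depth(outstanding, num_queues):
    return outstanding / num_queues

disks, outstanding = 20, 64
per_disk_depth = avg_queue_depth(outstanding, disks)  # shallow queues
shared_depth = avg_queue_depth(outstanding, 1)        # one deep queue

# Deeper queues amortize interrupts over more completions.
interrupts_per_request_per_disk = 1 / per_disk_depth
interrupts_per_request_shared = 1 / shared_depth
```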
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:44 ` Avi Kivity @ 2010-08-04 16:46 ` Anthony Liguori 2010-08-04 16:48 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:46 UTC (permalink / raw) To: Avi Kivity Cc: Paolo Bonzini, kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/04/2010 11:44 AM, Avi Kivity wrote: > On 08/04/2010 03:53 PM, Anthony Liguori wrote: >> >> So how do we enable support for more than 20 disks? I think a >> virtio-scsi is inevitable.. > > Not only for large numbers of disks, also for JBOD performance. If > you have one queue per disk you'll have low queue depths and high > interrupt rates. Aggregating many spindles into a single queue is > important for reducing overhead. Right, the only question is, do you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PV tape drives. It also supports SCSI-level pass-through, which is pretty nice for enabling things like NPIV. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:46 ` Anthony Liguori @ 2010-08-04 16:48 ` Alexander Graf 2010-08-04 16:49 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 16:48 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 04.08.2010, at 18:46, Anthony Liguori wrote: > On 08/04/2010 11:44 AM, Avi Kivity wrote: >> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>> >>> So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. >> >> Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. > > Right, the only question is, to you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PC tape drivers. What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:48 ` Alexander Graf @ 2010-08-04 16:49 ` Anthony Liguori 2010-08-04 16:51 ` Alexander Graf 2010-08-04 17:01 ` Paolo Bonzini 0 siblings, 2 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 16:49 UTC (permalink / raw) To: Alexander Graf Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 08/04/2010 11:48 AM, Alexander Graf wrote: > On 04.08.2010, at 18:46, Anthony Liguori wrote: > > >> On 08/04/2010 11:44 AM, Avi Kivity wrote: >> >>> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>> >>>> So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. >>>> >>> Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. >>> >> Right, the only question is, to you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PC tape drivers. >> > What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. > I think the toughest change would be making it appear as a scsi device within the guest. You could do that to virtio-blk but it would be a flag day as reasonably configured guests will break. Having virtio-blk devices show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. Regards, Anthony Liguori > Alex > > ^ permalink raw reply [flat|nested] 151+ messages in thread
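Anthony's /dev/vdX complaint is easy to reproduce: any tool that pattern-matches on the traditional disk names silently misses virtio disks. A minimal sketch (the device-name lists are made up for the example):

```python
import fnmatch

# Naive disk discovery of the kind Anthony describes: it only recognizes
# the traditional /dev/sd* and /dev/hd* names, so virtio's /dev/vdX
# disks are invisible to it.
LEGACY_PATTERNS = ["sd*", "hd*"]

def visible_disks(devnames):
    return [d for d in devnames
            if any(fnmatch.fnmatch(d, p) for p in LEGACY_PATTERNS)]

# A guest with one SCSI disk and two virtio disks: only 'sda' is seen.
seen = visible_disks(["sda", "vda", "vdb"])
```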
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:49 ` Anthony Liguori @ 2010-08-04 16:51 ` Alexander Graf 2010-08-04 17:01 ` Paolo Bonzini 1 sibling, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 16:51 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 04.08.2010, at 18:49, Anthony Liguori wrote: > On 08/04/2010 11:48 AM, Alexander Graf wrote: >> On 04.08.2010, at 18:46, Anthony Liguori wrote: >> >> >>> On 08/04/2010 11:44 AM, Avi Kivity wrote: >>> >>>> On 08/04/2010 03:53 PM, Anthony Liguori wrote: >>>> >>>>> So how do we enable support for more than 20 disks? I think a virtio-scsi is inevitable.. >>>>> >>>> Not only for large numbers of disks, also for JBOD performance. If you have one queue per disk you'll have low queue depths and high interrupt rates. Aggregating many spindles into a single queue is important for reducing overhead. >>>> >>> Right, the only question is, to you inject your own bus or do you just reuse SCSI. On the surface, it seems like reusing SCSI has a significant number of advantages. For instance, without changing the guest's drivers, we can implement PV cdroms or PC tape drivers. >>> >> What exactly would keep us from doing that with virtio-blk? I thought that supports scsi commands already. >> > > I think the toughest change would be making it appear as a scsi device within the guest. You could do that to virtio-blk but it would be a flag day as reasonable configured guests will break. > > Having virtio-blk device show up as /dev/vdX was a big mistake. It's been nothing but a giant PITA. There is an amazing amount of software that only looks at /dev/sd* and /dev/hd*. I completely agree and yes, we should move in that direction IMHO. I don't see why virtio-blk should be any different from megasas for example. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:49 ` Anthony Liguori 2010-08-04 16:51 ` Alexander Graf @ 2010-08-04 17:01 ` Paolo Bonzini 2010-08-04 17:19 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Paolo Bonzini @ 2010-08-04 17:01 UTC (permalink / raw) To: Anthony Liguori Cc: Alexander Graf, Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity On 08/04/2010 06:49 PM, Anthony Liguori wrote: >>> Right, the only question is, to you inject your own bus or do you >>> just reuse SCSI. On the surface, it seems like reusing SCSI has a >>> significant number of advantages. For instance, without changing the >>> guest's drivers, we can implement PV cdroms or PC tape drivers. If you want multiple LUNs per virtio device SCSI is obviously a good choice, but you will need something more (like the config space Avi mentioned). My position is that getting this "something more" right is considerably harder than virtio-blk. Maybe it will be done some day, but I still think that not having virtio-scsi from day 1 was actually a good thing. Even if we can learn from xenbus and all that. >> What exactly would keep us from doing that with virtio-blk? I thought >> that supports scsi commands already. > > I think the toughest change would be making it appear as a scsi device > within the guest. You could do that to virtio-blk but it would be a > flag day as reasonable configured guests will break. > > Having virtio-blk device show up as /dev/vdX was a big mistake. It's > been nothing but a giant PITA. There is an amazing amount of software > that only looks at /dev/sd* and /dev/hd*. That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). Paolo ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:01 ` Paolo Bonzini @ 2010-08-04 17:19 ` Avi Kivity 2010-08-04 17:25 ` Alexander Graf 2010-08-04 17:27 ` Anthony Liguori 0 siblings, 2 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:19 UTC (permalink / raw) To: Paolo Bonzini Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf On 08/04/2010 08:01 PM, Paolo Bonzini wrote: > > That's another story and I totally agree here, but not reusing > /dev/sd* is not intrinsic in the design of virtio-blk (and one thing > that Windows gets right; everything is SCSI, period). > I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:19 ` Avi Kivity @ 2010-08-04 17:25 ` Alexander Graf 2010-08-04 17:27 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 17:25 UTC (permalink / raw) To: Avi Kivity Cc: kvm, Gleb Natapov, Richard W.M. Jones, qemu-devel, Paolo Bonzini On 04.08.2010, at 19:19, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >> >> That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). >> > > I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? Hence the reference to megasas. It implements its own read/write/few other commands and the whole stack of optional commands as SCSI. I think virtio-blk should be the same. SCSI simply because it's there, it's flexible and it's well defined. You get a working spec and a lot of working implementations. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:19 ` Avi Kivity 2010-08-04 17:25 ` Alexander Graf @ 2010-08-04 17:27 ` Anthony Liguori 2010-08-04 17:37 ` Avi Kivity 1 sibling, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:27 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf, Paolo Bonzini On 08/04/2010 12:19 PM, Avi Kivity wrote: > On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >> >> That's another story and I totally agree here, but not reusing >> /dev/sd* is not intrinsic in the design of virtio-blk (and one thing >> that Windows gets right; everything is SCSI, period). >> > > I don't really get why everything must be SCSI. Everything must > support read, write, a few other commands, and a large set of optional > commands. But why map them all to SCSI? What's the magic? Because that's what real hardware does, with only a few rare exceptions. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:27 ` Anthony Liguori @ 2010-08-04 17:37 ` Avi Kivity 2010-08-04 17:53 ` Anthony Liguori 0 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-04 17:37 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf, Paolo Bonzini On 08/04/2010 08:27 PM, Anthony Liguori wrote: > On 08/04/2010 12:19 PM, Avi Kivity wrote: >> On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >>> >>> That's another story and I totally agree here, but not reusing >>> /dev/sd* is not intrinsic in the design of virtio-blk (and one thing >>> that Windows gets right; everything is SCSI, period). >>> >> >> I don't really get why everything must be SCSI. Everything must >> support read, write, a few other commands, and a large set of >> optional commands. But why map them all to SCSI? What's the magic? > > Because that's what real hardware with only a few rare exceptions. > I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:37 ` Avi Kivity @ 2010-08-04 17:53 ` Anthony Liguori 2010-08-04 18:05 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 17:53 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, kvm, Richard W.M. Jones, qemu-devel, Alexander Graf, Paolo Bonzini On 08/04/2010 12:37 PM, Avi Kivity wrote: > On 08/04/2010 08:27 PM, Anthony Liguori wrote: >> On 08/04/2010 12:19 PM, Avi Kivity wrote: >>> On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >>>> >>>> That's another story and I totally agree here, but not reusing >>>> /dev/sd* is not intrinsic in the design of virtio-blk (and one >>>> thing that Windows gets right; everything is SCSI, period). >>>> >>> >>> I don't really get why everything must be SCSI. Everything must >>> support read, write, a few other commands, and a large set of >>> optional commands. But why map them all to SCSI? What's the magic? >> >> Because that's what real hardware with only a few rare exceptions. >> > > I thought that IDE was emulated as SCSI even when it wasn't. But I > guess now with SATA you're right. IDE -> EIDE -> ATA -> SATA ATA can encapsulate SCSI commands via ATAPI which gives you the ability to have ATA based CD-ROMs among other things. I don't believe that SATA actually uses SCSI commands for read/write operations but I think Linux exposes SATA drivers as SCSI anyway. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 17:53 ` Anthony Liguori @ 2010-08-04 18:05 ` Alexander Graf 0 siblings, 0 replies; 151+ messages in thread From: Alexander Graf @ 2010-08-04 18:05 UTC (permalink / raw) To: Anthony Liguori Cc: Gleb Natapov, kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, Paolo Bonzini On 04.08.2010, at 19:53, Anthony Liguori wrote: > On 08/04/2010 12:37 PM, Avi Kivity wrote: >> On 08/04/2010 08:27 PM, Anthony Liguori wrote: >>> On 08/04/2010 12:19 PM, Avi Kivity wrote: >>>> On 08/04/2010 08:01 PM, Paolo Bonzini wrote: >>>>> >>>>> That's another story and I totally agree here, but not reusing /dev/sd* is not intrinsic in the design of virtio-blk (and one thing that Windows gets right; everything is SCSI, period). >>>>> >>>> >>>> I don't really get why everything must be SCSI. Everything must support read, write, a few other commands, and a large set of optional commands. But why map them all to SCSI? What's the magic? >>> >>> Because that's what real hardware with only a few rare exceptions. >>> >> >> I thought that IDE was emulated as SCSI even when it wasn't. But I guess now with SATA you're right. > > IDE -> EIDE -> ATA -> SATA > > ATA can encapsulate SCSI commands via ATAPI which gives you the ability to have ATA based CD-ROMs among other things. > > I don't believe that SATA actually uses SCSI commands for read/write operations It doesn't. In fact, it's basically just a wrapper around the normal ATA commands - even for read/write. Plus some additional SATA only commands for parallel read/write. > but I think Linux exposes SATA drivers as SCSI anyway. Yup. That's what libata does. Even works with PATA drives. But this is a purely Linux internal thing. Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:49 ` Anthony Liguori 2010-08-03 21:13 ` Paolo Bonzini @ 2010-08-04 5:56 ` Avi Kivity 1 sibling, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 5:56 UTC (permalink / raw) To: Anthony Liguori; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 11:49 PM, Anthony Liguori wrote: > >> We could demand that OSes write device drivers for more qemu devices >> -- already OS vendors write thousands of device drivers for all sorts >> of obscure devices, so this isn't really much of a demand for them. >> In fact, they're already doing it. > > So far, MS hasn't quite gotten the clue yet that they should write > device drivers for qemu :-) To be fair, we haven't actually demanded that they do. > In fact, noone has. Strangely, the reverse has happened - I think virtualbox has written virtio device models for their VMM. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 20:00 ` Richard W.M. Jones 2010-08-03 20:49 ` Anthony Liguori @ 2010-08-04 1:17 ` Jamie Lokier 1 sibling, 0 replies; 151+ messages in thread From: Jamie Lokier @ 2010-08-04 1:17 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel Richard W.M. Jones wrote: > We could demand that OSes write device drivers for more qemu devices > -- already OS vendors write thousands of device drivers for all sorts > of obscure devices, so this isn't really much of a demand for them. > In fact, they're already doing it. Result: Most OSes not working with qemu? Actually we seem to be going that way. Recent qemus don't work with older versions of Windows any more, so we have to use different versions of qemu for different guests. -- Jamie ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 19:13 ` Richard W.M. Jones ` (2 preceding siblings ...) 2010-08-03 19:22 ` Avi Kivity @ 2010-08-04 8:21 ` Avi Kivity 3 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 8:21 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: Gleb Natapov, qemu-devel, kvm On 08/03/2010 10:13 PM, Richard W.M. Jones wrote: > On Tue, Aug 03, 2010 at 09:43:39PM +0300, Avi Kivity wrote: >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We >> should discourage people from depending on this interface for >> production use. > I really don't get this whole thing where we must slavishly > emulate an exact PC ... An additional point in favour is that we have a method of resolving design arguments. No need to think, we have the spec in front of us. The arguments then devolve into interpretation of the spec. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 18:43 ` Avi Kivity ` (3 preceding siblings ...) 2010-08-03 19:13 ` Richard W.M. Jones @ 2010-08-04 14:51 ` David S. Ahern 2010-08-04 14:57 ` Anthony Liguori 4 siblings, 1 reply; 151+ messages in thread From: David S. Ahern @ 2010-08-04 14:51 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm, Richard W.M. Jones On 08/03/10 12:43, Avi Kivity wrote: > libguestfs does not depend on an x86 architectural feature. > qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > discourage people from depending on this interface for production use. That is a feature of qemu - and an important one to me as well. Why should it be discouraged? You end up at the same place -- a running kernel and in-ram filesystem; why require going through a bootloader just because the hardware case needs it? David ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:51 ` David S. Ahern @ 2010-08-04 14:57 ` Anthony Liguori 2010-08-04 15:25 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Anthony Liguori @ 2010-08-04 14:57 UTC (permalink / raw) To: David S. Ahern Cc: qemu-devel, Gleb Natapov, Avi Kivity, kvm, Richard W.M. Jones On 08/04/2010 09:51 AM, David S. Ahern wrote: > > On 08/03/10 12:43, Avi Kivity wrote: > >> libguestfs does not depend on an x86 architectural feature. >> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >> discourage people from depending on this interface for production use. >> > That is a feature of qemu - and an important one to me as well. Why > should it be discouraged? You end up at the same place -- a running > kernel and in-ram filesystem; why require going through a bootloader > just because the hardware case needs it? > It's smoke and mirrors. We're still providing a boot loader; it's just a tiny one that we've written solely for this purpose. And it works fine for production use. The question is whether we ought to be aggressively optimizing it for large initrd sizes. To be honest, after a lot of discussion of possibilities, I've come to the conclusion that it's just not worth it. There are better ways, like using string I/O and optimizing the PIO path in the kernel. That should cut down the 1s slowdown with a 100MB initrd by a bit. But honestly, shaving a couple hundred ms further off the initrd load is just not worth it under the current model. If this is important to someone, we ought to look at refactoring the loader completely to be disk-based, which is a higher-performance interface. Regards, Anthony Liguori > David > ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 14:57 ` Anthony Liguori @ 2010-08-04 15:25 ` Gleb Natapov 2010-08-04 15:31 ` Alexander Graf 2010-08-04 23:17 ` Kevin O'Connor 0 siblings, 2 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:25 UTC (permalink / raw) To: Anthony Liguori Cc: qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern, kvm On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > On 08/04/2010 09:51 AM, David S. Ahern wrote: > > > >On 08/03/10 12:43, Avi Kivity wrote: > >>libguestfs does not depend on an x86 architectural feature. > >>qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>discourage people from depending on this interface for production use. > >That is a feature of qemu - and an important one to me as well. Why > >should it be discouraged? You end up at the same place -- a running > >kernel and in-ram filesystem; why require going through a bootloader > >just because the hardware case needs it? > > It's smoke and mirrors. We're still providing a boot loader it's > just a little tiny one that we've written soley for this purpose. > > And it works fine for production use. The question is whether we > ought to be aggressively optimizing it for large initrd sizes. To > be honest, after a lot of discussion of possibilities, I've come to > the conclusion that it's just not worth it. > > There are better ways like using string I/O and optimizing the PIO > path in the kernel. That should cut down the 1s slow down with a > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > further off the initrd load is just not worth it using the current > model. > The slow down is not 1s any more. String PIO emulation had many bugs that were fixed in 2.6.35. I verified how much time it took to load 100M via fw_cfg interface on older kernel and on 2.6.35. On older kernels on my machine it took ~2-3 second on 2.6.35 it took 26s. 
Some optimizations that were already committed make it 20s. I have a code prototype that makes it 11s. I don't see how we can get below that, and surely not back to ~2-3 sec. > If this is important to someone, we ought to look at refactoring the > loader completely to be disk based which is a higher performance > interface. > > Regards, > > Anthony Liguori > > >David > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:25 ` Gleb Natapov @ 2010-08-04 15:31 ` Alexander Graf 2010-08-04 15:48 ` Gleb Natapov 2010-08-04 23:17 ` Kevin O'Connor 1 sibling, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 15:31 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On 04.08.2010, at 17:25, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: >> On 08/04/2010 09:51 AM, David S. Ahern wrote: >>> >>> On 08/03/10 12:43, Avi Kivity wrote: >>>> libguestfs does not depend on an x86 architectural feature. >>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >>>> discourage people from depending on this interface for production use. >>> That is a feature of qemu - and an important one to me as well. Why >>> should it be discouraged? You end up at the same place -- a running >>> kernel and in-ram filesystem; why require going through a bootloader >>> just because the hardware case needs it? >> >> It's smoke and mirrors. We're still providing a boot loader it's >> just a little tiny one that we've written soley for this purpose. >> >> And it works fine for production use. The question is whether we >> ought to be aggressively optimizing it for large initrd sizes. To >> be honest, after a lot of discussion of possibilities, I've come to >> the conclusion that it's just not worth it. >> >> There are better ways like using string I/O and optimizing the PIO >> path in the kernel. That should cut down the 1s slow down with a >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms >> further off the initrd load is just not worth it using the current >> model. >> > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. 
On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:31 ` Alexander Graf @ 2010-08-04 15:48 ` Gleb Natapov 2010-08-04 15:59 ` Alexander Graf 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 15:48 UTC (permalink / raw) To: Alexander Graf Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:25, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > >> On 08/04/2010 09:51 AM, David S. Ahern wrote: > >>> > >>> On 08/03/10 12:43, Avi Kivity wrote: > >>>> libguestfs does not depend on an x86 architectural feature. > >>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>>> discourage people from depending on this interface for production use. > >>> That is a feature of qemu - and an important one to me as well. Why > >>> should it be discouraged? You end up at the same place -- a running > >>> kernel and in-ram filesystem; why require going through a bootloader > >>> just because the hardware case needs it? > >> > >> It's smoke and mirrors. We're still providing a boot loader it's > >> just a little tiny one that we've written soley for this purpose. > >> > >> And it works fine for production use. The question is whether we > >> ought to be aggressively optimizing it for large initrd sizes. To > >> be honest, after a lot of discussion of possibilities, I've come to > >> the conclusion that it's just not worth it. > >> > >> There are better ways like using string I/O and optimizing the PIO > >> path in the kernel. That should cut down the 1s slow down with a > >> 100MB initrd by a bit. But honestly, shaving a couple hundred ms > >> further off the initrd load is just not worth it using the current > >> model. > >> > > The slow down is not 1s any more. String PIO emulation had many bugs > > that were fixed in 2.6.35. 
I verified how much time it took to load 100M > > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > > that was already committed make it 20s. I have some code prototype that > > makes it 11s. I don't see how we can get below that, surely not back to > > ~2-3sec. > > What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? > > The big part of the slowdown right now is that the write into memory is done one byte at a time: for each byte we call kvm_write_guest() and kvm_mmu_pte_write(). The second call is needed in case the memory the instruction is writing to is shadowed; previously we didn't check for that at all. This can be mitigated by introducing a write cache, doing combined writes into memory, and unshadowing the page if there is more than one write into it. This optimization saves ~10 secs. Currently string emulation enters the guest from time to time to check if event injection is needed, and reads from userspace are done in 1K chunks, not 4K as before, but when I made the reads 4K and disabled guest reentry I didn't see any speed improvement worth talking about. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
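A minimal sketch of the write-combining idea Gleb describes, with a plain array standing in for guest physical memory and a counter standing in for the per-write kvm_write_guest()/kvm_mmu_pte_write() cost. The names and structure are hypothetical, not KVM's actual code — the point is only that contiguous bytes from a string instruction get committed in one combined write:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical write cache: instead of committing (and shadow-checking)
 * every byte of a string write separately, contiguous bytes are buffered
 * and flushed with one combined write.  'ram' stands in for guest
 * memory; 'commits' counts the expensive write+shadow-check operations
 * the real code would perform. */
struct write_cache {
    uint8_t *ram;        /* stand-in for guest physical memory */
    uint64_t gpa;        /* guest address of the first buffered byte */
    size_t   len;        /* bytes currently buffered */
    unsigned commits;    /* how many combined writes were issued */
    uint8_t  buf[4096];
};

static void cache_flush(struct write_cache *c)
{
    if (c->len) {
        memcpy(c->ram + c->gpa, c->buf, c->len); /* one write, not c->len */
        c->commits++;                            /* one shadow check, too */
        c->len = 0;
    }
}

static void cache_write_byte(struct write_cache *c, uint64_t gpa, uint8_t val)
{
    /* flush when the new byte is not contiguous or the buffer is full */
    if (c->len == sizeof(c->buf) || (c->len && gpa != c->gpa + c->len))
        cache_flush(c);
    if (c->len == 0)
        c->gpa = gpa;
    c->buf[c->len++] = val;
}
```

With this shape, a rep outsb/insb run of N contiguous bytes costs one commit per 4K instead of N commits, which matches the ~10s saving mentioned above.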
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:48 ` Gleb Natapov @ 2010-08-04 15:59 ` Alexander Graf 2010-08-04 16:08 ` Gleb Natapov 0 siblings, 1 reply; 151+ messages in thread From: Alexander Graf @ 2010-08-04 15:59 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On 04.08.2010, at 17:48, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: >> >> On 04.08.2010, at 17:25, Gleb Natapov wrote: >> >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: >>>> On 08/04/2010 09:51 AM, David S. Ahern wrote: >>>>> >>>>> On 08/03/10 12:43, Avi Kivity wrote: >>>>>> libguestfs does not depend on an x86 architectural feature. >>>>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should >>>>>> discourage people from depending on this interface for production use. >>>>> That is a feature of qemu - and an important one to me as well. Why >>>>> should it be discouraged? You end up at the same place -- a running >>>>> kernel and in-ram filesystem; why require going through a bootloader >>>>> just because the hardware case needs it? >>>> >>>> It's smoke and mirrors. We're still providing a boot loader it's >>>> just a little tiny one that we've written soley for this purpose. >>>> >>>> And it works fine for production use. The question is whether we >>>> ought to be aggressively optimizing it for large initrd sizes. To >>>> be honest, after a lot of discussion of possibilities, I've come to >>>> the conclusion that it's just not worth it. >>>> >>>> There are better ways like using string I/O and optimizing the PIO >>>> path in the kernel. That should cut down the 1s slow down with a >>>> 100MB initrd by a bit. But honestly, shaving a couple hundred ms >>>> further off the initrd load is just not worth it using the current >>>> model. >>>> >>> The slow down is not 1s any more. 
String PIO emulation had many bugs >>> that were fixed in 2.6.35. I verified how much time it took to load 100M >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on >>> my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations >>> that was already committed make it 20s. I have some code prototype that >>> makes it 11s. I don't see how we can get below that, surely not back to >>> ~2-3sec. >> >> What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? >> >> > The big part of slowdown right now is that write into memory is done > for each byte. It means for each byte we call kvm_write_guest() and > kvm_mmu_pte_write(). The second call is needed in case memory, instruction > is trying to write to, is shadowed. Previously we didn't checked for > that at all. This can be mitigated by introducing write cache and do > combined writes into the memory and unshadow the page if there is more > then one write into it. This optimization saves ~10secs. Currently string Ok, so you tackled that bit already. > emulation enter guest from time to time to check if event injection is > needed and read from userspace is done in 1K chunks, not 4K like it was, > but when I made reads to be 4K and disabled guest reentry I haven't seen > any speed improvements worth talking about. So what are we wasting those 10 seconds on then? Does perf tell you anything useful? Alex ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:59 ` Alexander Graf @ 2010-08-04 16:08 ` Gleb Natapov 2010-08-04 16:48 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Gleb Natapov @ 2010-08-04 16:08 UTC (permalink / raw) To: Alexander Graf Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 05:59:40PM +0200, Alexander Graf wrote: > > On 04.08.2010, at 17:48, Gleb Natapov wrote: > > > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote: > >> > >> On 04.08.2010, at 17:25, Gleb Natapov wrote: > >> > >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > >>>> On 08/04/2010 09:51 AM, David S. Ahern wrote: > >>>>> > >>>>> On 08/03/10 12:43, Avi Kivity wrote: > >>>>>> libguestfs does not depend on an x86 architectural feature. > >>>>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should > >>>>>> discourage people from depending on this interface for production use. > >>>>> That is a feature of qemu - and an important one to me as well. Why > >>>>> should it be discouraged? You end up at the same place -- a running > >>>>> kernel and in-ram filesystem; why require going through a bootloader > >>>>> just because the hardware case needs it? > >>>> > >>>> It's smoke and mirrors. We're still providing a boot loader it's > >>>> just a little tiny one that we've written soley for this purpose. > >>>> > >>>> And it works fine for production use. The question is whether we > >>>> ought to be aggressively optimizing it for large initrd sizes. To > >>>> be honest, after a lot of discussion of possibilities, I've come to > >>>> the conclusion that it's just not worth it. > >>>> > >>>> There are better ways like using string I/O and optimizing the PIO > >>>> path in the kernel. That should cut down the 1s slow down with a > >>>> 100MB initrd by a bit. 
But honestly, shaving a couple hundred ms > >>>> further off the initrd load is just not worth it using the current > >>>> model. > >>>> > >>> The slow down is not 1s any more. String PIO emulation had many bugs > >>> that were fixed in 2.6.35. I verified how much time it took to load 100M > >>> via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > >>> my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > >>> that was already committed make it 20s. I have some code prototype that > >>> makes it 11s. I don't see how we can get below that, surely not back to > >>> ~2-3sec. > >> > >> What exactly is the reason for the slowdown? It can't be only boundary and permission checks, right? > >> > >> > > The big part of slowdown right now is that write into memory is done > > for each byte. It means for each byte we call kvm_write_guest() and > > kvm_mmu_pte_write(). The second call is needed in case memory, instruction > > is trying to write to, is shadowed. Previously we didn't checked for > > that at all. This can be mitigated by introducing write cache and do > > combined writes into the memory and unshadow the page if there is more > > then one write into it. This optimization saves ~10secs. Currently string > > Ok, so you tackled that bit already. > > > emulation enter guest from time to time to check if event injection is > > needed and read from userspace is done in 1K chunks, not 4K like it was, > > but when I made reads to be 4K and disabled guest reentry I haven't seen > > any speed improvements worth talking about. > > So what are we wasting those 10 seconds on then? Does perf tell you anything useful? > Not 10, but 7-8 seconds. After applying the cache fix, nothing definite as far as I remember (I last ran it almost 2 weeks ago; I need to rerun). The code always goes through the emulator now and checks direction flags to update SI/DI accordingly. The emulator is a big switch, and it calls various callbacks that may also slow things down. 
-- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 16:08 ` Gleb Natapov @ 2010-08-04 16:48 ` Avi Kivity 0 siblings, 0 replies; 151+ messages in thread From: Avi Kivity @ 2010-08-04 16:48 UTC (permalink / raw) To: Gleb Natapov Cc: Alexander Graf, kvm, qemu-devel, Richard W.M. Jones, David S. Ahern On 08/04/2010 07:08 PM, Gleb Natapov wrote: > > After applying cache fix nothing definite as far as I remember (I ran it last time > almost 2 week ago, need to rerun). Code always go through emulator now > and check direction flags to update SI/DI accordingly. Emulator is a big > switch and it calls various callbacks that may also slow things down. > We can have it set up a fast path. Similar to how real hardware optimizes 'rep movs' to copy complete cachelines. The emulator does all the checks, sets up a callback to be called on completion or when an interrupt is made pending, and lets x86.c do all the work. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 151+ messages in thread
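A toy illustration of the fast path Avi suggests, under the assumption that the emulator has already performed all the architectural checks (permissions, segment limits, DF clear, source and destination in ordinary non-overlapping RAM). The names are hypothetical, and the copy deliberately ignores the page-boundary re-validation and interrupt windows real code would need:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical REP MOVSB fast path: after validation, hand the whole
 * run to one bulk copy instead of looping once through the emulator's
 * big switch per byte.  'mem' stands in for already-translated guest
 * memory; real code would re-validate at page boundaries and bail out
 * to the slow path when an interrupt becomes pending. */
struct cpu_regs { uint64_t rsi, rdi, rcx; };

static void rep_movsb_fast(struct cpu_regs *r, uint8_t *mem)
{
    uint64_t n = r->rcx;
    memcpy(mem + r->rdi, mem + r->rsi, n); /* one copy for the whole run */
    r->rsi += n;                           /* architectural side effects */
    r->rdi += n;
    r->rcx = 0;                            /* REP stops when RCX hits 0 */
}
```

This is the software analogue of hardware's cacheline-at-a-time 'rep movs' optimization: the per-iteration bookkeeping is folded into a single batch.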
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 15:25 ` Gleb Natapov 2010-08-04 15:31 ` Alexander Graf @ 2010-08-04 23:17 ` Kevin O'Connor 2010-08-05 5:26 ` Gleb Natapov 1 sibling, 1 reply; 151+ messages in thread From: Kevin O'Connor @ 2010-08-04 23:17 UTC (permalink / raw) To: Gleb Natapov Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote: > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > > There are better ways like using string I/O and optimizing the PIO > > path in the kernel. That should cut down the 1s slow down with a > > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > > further off the initrd load is just not worth it using the current > > model. > > > The slow down is not 1s any more. String PIO emulation had many bugs > that were fixed in 2.6.35. I verified how much time it took to load 100M > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > that was already committed make it 20s. I have some code prototype that > makes it 11s. I don't see how we can get below that, surely not back to > ~2-3sec. I guess this slowness is primarily for kvm. I just ran some tests on the latest qemu (with TCG). I pulled in a 400Meg file over fw_cfg using the SeaBIOS interface - it takes 9.8 seconds (pretty consistently). Oddly, if I change SeaBIOS to use insb (string pio) it takes 11.5 seconds (again, pretty consistently). These times were measured on the host - they don't include the extra time it takes qemu to start up (during which it reads the file into its memory). -Kevin ^ permalink raw reply [flat|nested] 151+ messages in thread
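For reference, the byte-at-a-time transfer being timed here looks roughly like the loop below. The port numbers (0x510 selector, 0x511 data) are the documented x86 fw_cfg I/O ports; the toy device model is a hypothetical stand-in for QEMU's fw_cfg so the loop can be shown self-contained — real firmware would issue outw/inb (or insb for the string-I/O variant) on those ports:

```c
#include <stdint.h>
#include <stddef.h>

/* x86 fw_cfg ports: write a 16-bit item selector to 0x510, then read
 * the item's bytes sequentially from 0x511. */
#define FW_CFG_PORT_CTL  0x510   /* selector port (outw) */
#define FW_CFG_PORT_DATA 0x511   /* data port (inb/insb) */

/* Toy stand-in for the device side, so the loop runs outside a guest. */
struct fw_cfg_dev {
    const uint8_t *blob;   /* contents of the currently selected item */
    size_t blob_len;
    size_t cursor;         /* read position, reset when item selected */
};

static void fw_cfg_select(struct fw_cfg_dev *d)   /* models outw(0x510) */
{
    d->cursor = 0;
}

static uint8_t fw_cfg_data(struct fw_cfg_dev *d)  /* models inb(0x511) */
{
    return d->cursor < d->blob_len ? d->blob[d->cursor++] : 0;
}

static void fw_cfg_read(struct fw_cfg_dev *d, void *buf, size_t len)
{
    uint8_t *p = buf;
    fw_cfg_select(d);
    for (size_t i = 0; i < len; i++)
        p[i] = fw_cfg_data(d);   /* one port access -- one exit -- per byte */
}
```

Under TCG each port read is just a helper call, so per-byte inb and insb cost about the same, as Kevin measures; under KVM each one is a full VM exit plus emulation, which is where the kernel-version-dependent cost shows up.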
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-04 23:17 ` Kevin O'Connor @ 2010-08-05 5:26 ` Gleb Natapov 0 siblings, 0 replies; 151+ messages in thread From: Gleb Natapov @ 2010-08-05 5:26 UTC (permalink / raw) To: Kevin O'Connor Cc: kvm, qemu-devel, Richard W.M. Jones, Avi Kivity, David S. Ahern On Wed, Aug 04, 2010 at 07:17:30PM -0400, Kevin O'Connor wrote: > On Wed, Aug 04, 2010 at 06:25:52PM +0300, Gleb Natapov wrote: > > On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote: > > > There are better ways like using string I/O and optimizing the PIO > > > path in the kernel. That should cut down the 1s slow down with a > > > 100MB initrd by a bit. But honestly, shaving a couple hundred ms > > > further off the initrd load is just not worth it using the current > > > model. > > > > > The slow down is not 1s any more. String PIO emulation had many bugs > > that were fixed in 2.6.35. I verified how much time it took to load 100M > > via fw_cfg interface on older kernel and on 2.6.35. On older kernels on > > my machine it took ~2-3 second on 2.6.35 it took 26s. Some optimizations > > that was already committed make it 20s. I have some code prototype that > > makes it 11s. I don't see how we can get below that, surely not back to > > ~2-3sec. > > I guess this slowness is primarily for kvm. I just ran some tests on > the latest qemu (with TCG). I pulled in a 400Meg file over fw_cfg > using the SeaBIOS interface - it takes 9.8 seconds (pretty > consistently). Oddly, if I change SeaBIOS to use insb (string pio) it > takes 11.5 seconds (again, pretty consistently). These times were > measured on the host - they don't include the extra time it takes qemu > to start up (during which it reads the file into its memory). > Yes only KVM is affected, nothing has changed in qemu itself. -- Gleb. ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 16:50 ` Avi Kivity 2010-08-03 16:53 ` Anthony Liguori @ 2010-08-03 16:56 ` Anthony Liguori 1 sibling, 0 replies; 151+ messages in thread From: Anthony Liguori @ 2010-08-03 16:56 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Richard W.M. Jones, Gleb Natapov, qemu-devel On 08/03/2010 11:50 AM, Avi Kivity wrote: > On 08/03/2010 07:46 PM, Anthony Liguori wrote: >>> It doesn't appear to support live migration, or hiding the feature >>> for -M older. >>> >>> It's not a good path to follow. Tomorrow we'll need to load 300MB >>> initrds and we'll have to rework this yet again. Meanwhile the >>> kernel and virtio support demand loading of any image size you'd >>> want to use. >> >> >> firmware is totally broken with respect to -M older FWIW. >> > > Well, then this is adding to the brokenness. > > fwcfg dma is going to have exactly one user, libguestfs. Much better > to have libguestfs move to some other interface and improve are > users-to-interfaces ratio. BTW, the brokenness is that regardless of -M older, we always use the newest firmware. Because we always use the newest firmware, fwcfg is not a backwards-compatible interface. Migration totally screws this up. While we migrate roms (and correctly now thanks to Alex's patches), we size the allocation based on the newest firmware size. That means if we ever decreased the size of a rom, we'd see total failure (even if we had a compatible fwcfg interface). Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 16:44 ` Avi Kivity 2010-08-03 16:46 ` Anthony Liguori @ 2010-08-03 16:48 ` Avi Kivity 2010-08-03 17:00 ` Richard W.M. Jones 2010-08-03 16:56 ` Richard W.M. Jones 2 siblings, 1 reply; 151+ messages in thread From: Avi Kivity @ 2010-08-03 16:48 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm On 08/03/2010 07:44 PM, Avi Kivity wrote: > > It's not a good path to follow. Tomorrow we'll need to load 300MB > initrds and we'll have to rework this yet again. Meanwhile the kernel > and virtio support demand loading of any image size you'd want to use. > Even better would be to use virtio-9p. You don't even need an image in this case. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35? 2010-08-03 16:48 ` Avi Kivity @ 2010-08-03 17:00 ` Richard W.M. Jones 2010-08-03 17:05 ` Avi Kivity 0 siblings, 1 reply; 151+ messages in thread From: Richard W.M. Jones @ 2010-08-03 17:00 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel On Tue, Aug 03, 2010 at 07:48:17PM +0300, Avi Kivity wrote: > On 08/03/2010 07:44 PM, Avi Kivity wrote: > > > >It's not a good path to follow. Tomorrow we'll need to load 300MB > >initrds and we'll have to rework this yet again. Meanwhile the > >kernel and virtio support demand loading of any image size you'd > >want to use. > > > > Even better would be to use virtio-9p. You don't even need an image > in this case. We don't want to expose the whole host filesystem, just selected files, and we want to use our own configuration files (basically that's what is in the skeleton part that we do ship). Of course, if we can use virtio-9p, then excellent. Is there good documentation about virtio-9p? What I can find is fragmentary or based on reading qemu -help ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 17:00 ` Richard W.M. Jones
@ 2010-08-03 17:05 ` Avi Kivity
  0 siblings, 0 replies; 151+ messages in thread
From: Avi Kivity @ 2010-08-03 17:05 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel

On 08/03/2010 08:00 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 07:48:17PM +0300, Avi Kivity wrote:
>> On 08/03/2010 07:44 PM, Avi Kivity wrote:
>>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>>> initrds and we'll have to rework this yet again. Meanwhile the
>>> kernel and virtio support demand loading of any image size you'd
>>> want to use.
>>>
>> Even better would be to use virtio-9p. You don't even need an image
>> in this case.
> We don't want to expose the whole host filesystem, just selected
> files, and we want to use our own configuration files (basically
> that's what is in the skeleton part that we do ship).

True. The guest might landmine its disks with something that the
libguestfs kernel would step on and be exploited. You might hardlink
the needed files into a private directory tree.

> Of course, if we can use virtio-9p, then excellent. Is there good
> documentation about virtio-9p? What I can find is fragmentary or
> based on reading qemu -help ...

Not to my knowledge.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 151+ messages in thread
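[The hardlink idea above can be sketched in a few lines. This is an illustration, not libguestfs code; the function name `build_export_tree` is invented. Note that hard links require source and destination to be on the same filesystem, and they share an inode, so no data is copied.]

```python
import os

def build_export_tree(files, export_dir):
    """Hard-link a fixed list of host files into a private directory,
    so that only those files are visible through a 9p (or similar)
    export, rather than exposing the whole host filesystem."""
    os.makedirs(export_dir, exist_ok=True)
    linked = []
    for path in files:
        dest = os.path.join(export_dir, os.path.basename(path))
        os.link(path, dest)   # hard link: same inode, no data copied
        linked.append(dest)
    return linked
```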
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 16:44 ` Avi Kivity
  2010-08-03 16:46 ` Anthony Liguori
  2010-08-03 16:48 ` Avi Kivity
@ 2010-08-03 16:56 ` Richard W.M. Jones
  2010-08-03 17:08 ` Avi Kivity
  2 siblings, 1 reply; 151+ messages in thread
From: Richard W.M. Jones @ 2010-08-03 16:56 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Gleb Natapov, kvm

On Tue, Aug 03, 2010 at 07:44:49PM +0300, Avi Kivity wrote:
> On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
> > I have posted a small patch which makes this 650x faster without
> > appreciable complication.
>
> It doesn't appear to support live migration, or hiding the feature
> for -M older.

AFAICT live migration should still work (even assuming someone live
migrates a domain during early boot, which seems pretty unlikely ...)

Maybe you mean live migration of the dma_* global variables? I can
fix that.

> It's not a good path to follow. Tomorrow we'll need to load 300MB
> initrds and we'll have to rework this yet again.

Not a very good straw man ... The patch would take ~300ms instead
of ~115ms, versus something like 2 mins 40 seconds with the current
method.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 16:56 ` Richard W.M. Jones
@ 2010-08-03 17:08 ` Avi Kivity
  0 siblings, 0 replies; 151+ messages in thread
From: Avi Kivity @ 2010-08-03 17:08 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: qemu-devel, Gleb Natapov, kvm

On 08/03/2010 07:56 PM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 07:44:49PM +0300, Avi Kivity wrote:
>> On 08/03/2010 07:28 PM, Richard W.M. Jones wrote:
>>> I have posted a small patch which makes this 650x faster without
>>> appreciable complication.
>> It doesn't appear to support live migration, or hiding the feature
>> for -M older.
> AFAICT live migration should still work (even assuming someone live
> migrates a domain during early boot, which seems pretty unlikely ...)

Live migration is sometimes performed automatically by management
tools, which have no idea (nor do they care) what the guest is doing.

> Maybe you mean live migration of the dma_* global variables? I can
> fix that.

Yes.

>> It's not a good path to follow. Tomorrow we'll need to load 300MB
>> initrds and we'll have to rework this yet again.
> Not a very good straw man ... The patch would take ~300ms instead
> of ~115ms, versus something like 2 mins 40 seconds with the current
> method.

It's still 300ms extra time, with a 900MB footprint.

btw, a DMA interface which blocks the guest and/or qemu for 115ms is
not something we want to introduce to qemu. dma is hard, doing
something simple means it won't work very well.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 14:53 ` Richard W.M. Jones
  2010-08-03 16:10 ` Avi Kivity
@ 2010-08-03 16:39 ` Anthony Liguori
  2010-08-03 16:43 ` Richard W.M. Jones
  1 sibling, 1 reply; 151+ messages in thread
From: Anthony Liguori @ 2010-08-03 16:39 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: kvm, Avi Kivity, Gleb Natapov, qemu-devel

On 08/03/2010 09:53 AM, Richard W.M. Jones wrote:
> On Tue, Aug 03, 2010 at 05:38:25PM +0300, Avi Kivity wrote:
>> The time will only continue to grow as you add features and as the
>> distro bloats naturally.
>>
>> Much better to create it once and only update it if some dependent
>> file changes (basically the current on-the-fly code + save a list of
>> file timestamps).
>
> This applies to both cases, the initrd could also be saved, so:
>
>>> Total saving: 115ms.
>> 815 ms by my arithmetic.
>
> no, not true, 115ms.
>
>> You also save 3*N-2*P memory where N is the size of your initrd and
>> P is the actual amount used by the guest.
>
> Can you explain this?
>
>> Loading a file into memory is plenty fast if you use the standard
>> interfaces. -kernel -initrd is a specialized interface.
>
> Why bother with any command line options at all? After all, they keep
> changing and causing problems for qemu's users ... Apparently we're
> all doing stuff "wrong", in ways that are never explained by the
> developers.

Let's be fair. I think we've all agreed to adjust the fw_cfg interface
to implement DMA. The only requirement was that the DMA operation not
be triggered from a single port I/O but rather based on a polling
operation which better fits the way real hardware works.

Is this a regression? Probably. But performance regressions that
result from correctness fixes don't get reverted. We have to find an
approach to improve performance without impacting correctness.

That said, the general view of -kernel/-append is that these are
developer options and we don't really look at it as a performance
critical interface. We could do a better job of communicating this to
users but that's true of most of the features we support.

Regards,

Anthony Liguori

> Rich.

^ permalink raw reply [flat|nested] 151+ messages in thread
* Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?
  2010-08-03 16:39 ` Anthony Liguori
@ 2010-08-03 16:43 ` Richard W.M. Jones
  0 siblings, 0 replies; 151+ messages in thread
From: Richard W.M. Jones @ 2010-08-03 16:43 UTC (permalink / raw)
To: Anthony Liguori; +Cc: qemu-devel

On Tue, Aug 03, 2010 at 11:39:43AM -0500, Anthony Liguori wrote:
> Let's be fair. I think we've all agreed to adjust the fw_cfg
> interface to implement DMA. The only requirement was that the DMA
> operation not be triggered from a single port I/O but rather based
> on a polling operation which better fits the way real hardware
> works.

The patch I posted requires that the caller poll a register, so
hopefully this requirement is satisfied.

The other requirement was that the interface be discoverable, which is
also something in the latest version of the patch that I just posted.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

^ permalink raw reply [flat|nested] 151+ messages in thread
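[The "poll a register instead of blocking on a single port write" style discussed above can be illustrated with a toy model. This is not the actual fw_cfg DMA patch or its register layout; the class, status values, and timings below are all invented for illustration. The device performs the transfer asynchronously while the guest side spins on a status field, mirroring how real DMA engines expose completion.]

```python
import threading
import time

class ToyDmaDevice:
    """Toy device model: the guest programs a transfer, then polls a
    status register until the device (running asynchronously) marks it
    done. Status values are invented for illustration."""
    IDLE, BUSY, DONE = 0, 1, 2

    def __init__(self, backing):
        self.backing = backing      # bytes the device will "DMA" in
        self.status = self.IDLE     # stand-in for a status register
        self.guest_mem = None

    def start_transfer(self):
        self.status = self.BUSY
        threading.Thread(target=self._do_dma, daemon=True).start()

    def _do_dma(self):
        time.sleep(0.01)            # pretend the copy takes a while
        self.guest_mem = bytes(self.backing)
        self.status = self.DONE     # completion visible via polling

def guest_poll(dev, timeout=2.0):
    """Guest side: kick off the transfer, then poll the status register
    rather than expecting a single port I/O to block until completion."""
    dev.start_transfer()
    deadline = time.monotonic() + timeout
    while dev.status != ToyDmaDevice.DONE:
        if time.monotonic() > deadline:
            raise TimeoutError("DMA did not complete")
        time.sleep(0.001)
    return dev.guest_mem
```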