From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=60549 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1OalkU-0001XX-Rh for qemu-devel@nongnu.org; Mon, 19 Jul 2010 04:30:56 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) id 1OalkT-0004Kc-7S for qemu-devel@nongnu.org; Mon, 19 Jul 2010 04:30:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:2050) by eggs.gnu.org with esmtp (Exim 4.69) id 1OalkS-0004KR-SU for qemu-devel@nongnu.org; Mon, 19 Jul 2010 04:30:53 -0400
Date: Mon, 19 Jul 2010 11:30:50 +0300
From: Gleb Natapov
Subject: Re: [Qemu-devel] Question about qemu firmware configuration (fw_cfg) device
Message-ID: <20100719083050.GG4689@redhat.com>
References: <20100719062356.GU4689@redhat.com>
 <20100719072802.GO13194@amd.home.annexia.org>
 <20100719073312.GY4689@redhat.com>
 <4E9BBBA5-F2D1-4485-AFD3-8D6FDE3A3CCC@suse.de>
 <20100719075110.GB4689@redhat.com>
 <77A267F6-3646-4F22-B837-E1E7DBA06950@suse.de>
 <20100719080142.GE4689@redhat.com>
 <20100719081954.GF4689@redhat.com>
 <43B9EAA8-E3F5-4903-896C-DEBD90E06162@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <43B9EAA8-E3F5-4903-896C-DEBD90E06162@suse.de>
List-Id: qemu-devel.nongnu.org
To: Alexander Graf
Cc: "Richard W.M. Jones", qemu-devel@nongnu.org

On Mon, Jul 19, 2010 at 10:24:46AM +0200, Alexander Graf wrote:
>
> On 19.07.2010, at 10:19, Gleb Natapov wrote:
>
> > On Mon, Jul 19, 2010 at 10:08:57AM +0200, Alexander Graf wrote:
> >>
> >> On 19.07.2010, at 10:01, Gleb Natapov wrote:
> >>
> >>> On Mon, Jul 19, 2010 at 09:57:02AM +0200, Alexander Graf wrote:
> >>>>
> >>>> On 19.07.2010, at 09:51, Gleb Natapov wrote:
> >>>>
> >>>>> On Mon, Jul 19, 2010 at 09:40:18AM +0200, Alexander Graf wrote:
> >>>>>>
> >>>>>> On 19.07.2010, at 09:33, Gleb Natapov wrote:
> >>>>>>
> >>>>>>> On Mon, Jul 19, 2010 at 08:28:02AM +0100, Richard W.M. Jones wrote:
> >>>>>>>> On Mon, Jul 19, 2010 at 09:23:56AM +0300, Gleb Natapov wrote:
> >>>>>>>>> That what I am warring about too. If we are adding device we have to be
> >>>>>>>>> sure such device can actually exist on real hw too otherwise we may have
> >>>>>>>>> problems later.
> >>>>>>>>
> >>>>>>>> I don't understand why the constraints of real h/w have anything to do
> >>>>>>>> with this. Can you explain?
> >>>>>>>>
> >>>>>>> Each time we do something not architectural it cause us troubles later.
> >>>>>>> So constraints of real h/w is our constrains to.
> >>>>>>>
> >>>>>>>>> Also 1 second on 100M file does not look like huge gain to me.
> >>>>>>>>
> >>>>>>>> Every second counts. We're trying to get libguestfs boot times down
> >>>>>>>> from 8-12 seconds to 4-5 seconds. For many cases it's an interactive
> >>>>>>>> program.
> >>>>>>>>
> >>>>>>> So what about making initrd smaller? I remember managing two
> >>>>>>> distribution in 64M flash in embedded project.
> >>>>>>
> >>>>>> Having a huge initrd basically helps in reusing a lot of existing code. We do the same - in general the initrd is just a subset of the applications of the host OS. And if you start putting perl or the likes into it, it becomes big.
> >>>>>>
> >>>>> Why not provide small disk/cdrom with all those utilities installed?
> >>>>
> >>>> Because - if the loading is done fast - this way everything's in RAM instantly. And you still have all devices available for use inside the system - that makes enumeration a lot easier. There are several reasons why and I don't think we should force different ways on people just because one component of our system is ineffective.
> >>>>
> >>> Loading huge initrd on real HW takes noticeably longer time that small
> >>> one, so I would say that it is your design that is to blame here, not
> >>> KVM.
> >>
> >> I disagree. Virtualization enables new use cases. The -initrd parameter is a very good example for that. It's something that you simply couldn't do on real hw.
> >>
> > How is it different from starting kernel/initrd from usb flash drive?
>
> The kernel and initrd are read directly from the host fs. It's more like a 9p grub boot.
>
There is no "host" on real HW :) But conceptually it's almost the same.
A 9p grub boot would also be nice. Hmm, I think PXE is the closest thing
to the -kernel/-initrd option on real HW.

> >
> >>>
> >>>>>
> >>>>>> I guess the best thing for now really is to try and see which code paths insb goes along. It should really be coalesced.
> >>>>>>
> >>>>> It is coalesced to a certain extent (reenter guest every 1024 bytes,
> >>>>> read from userspace page at a time). You need to continue injecting
> >>>>> interrupt into a guest during long string operation and checking
> >>>>> exception condition on a page boundaries.
> >>>>
> >>>> That still sounds slow. So yeah, adding DMA is probably the right way to go. But then again - if we model it after real hw it would be asynchronous, giving us an interrupt, causing even more headache. Ugh.
> >>>>
> >>>> Can't we just ignore real hw constraints here and have it available in guest ram once one particular PIO is done? No bus master, no interrupts, but full speed and simplicity/atomicity which also helps migration.
> >>>>
> >>> We shouldn't add devices that work not like real HW to speed up some
> >>> pathological cases (and are slow on real HW too).
> >>
> >> Just because you don't use them doesn't mean they're pathological, really. We simply chose a bad interface for transferring reasonable big chunks of data and we need to fix that. If you want to look at it from a different perspective, it's a regression. Older qemu versions did map the kernel and initrd directly into guest ram, so now we're slower than back then.
> >>
> > I use them hundred time each day (at least -kernel part). If the
> > interface is slow for your use case I have no problem with introducing
> > new one, but the one that make sense in x86 architecture. I do not agree
> > this is regression BTW. You can't compare buggy way of doing things and
> > non-buggy way and say that bug fixing is a regression.
> >
> > What about adding new PCI card that holds kernel initrd in ROM bar?
>
> Yes and no. It sounds nice at first, but doesn't quite fit. There are two issues:
>
> 1) We need a new PCI ID
We have our range. We can allocate from there.

> 2) There can be a lot of initrd binaries with multiboot. We only have a limited amount of BARs
>
Is it supported now with the fw_cfg interface? My main concern with this
approach is the huge BAR size, which may take a lot of space from the PCI
MMIO range if the guest OS decides to configure it.

--
			Gleb.