From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1MxHTR-0000z2-4q for qemu-devel@nongnu.org; Mon, 12 Oct 2009 05:45:49 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1MxHTM-0000wb-JD for qemu-devel@nongnu.org; Mon, 12 Oct 2009 05:45:48 -0400 Received: from [199.232.76.173] (port=47101 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1MxHTM-0000wG-3P for qemu-devel@nongnu.org; Mon, 12 Oct 2009 05:45:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:29999) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1MxHTL-0001YZ-JA for qemu-devel@nongnu.org; Mon, 12 Oct 2009 05:45:43 -0400 Date: Mon, 12 Oct 2009 11:43:35 +0200 From: "Michael S. Tsirkin" Message-ID: <20091012094335.GC10741@redhat.com> References: <1255287547-28329-1-git-send-email-gleb@redhat.com> <1255287547-28329-4-git-send-email-gleb@redhat.com> <20091011214820.GB6411@redhat.com> <20091012064433.GR16702@redhat.com> <20091012071052.GA10741@redhat.com> <20091012072202.GU16702@redhat.com> <20091012081314.GB10741@redhat.com> <20091012084858.GY16702@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20091012084858.GY16702@redhat.com> Subject: [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest. List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Gleb Natapov Cc: kevin@koconnor.net, qemu-devel@nongnu.org On Mon, Oct 12, 2009 at 10:48:58AM +0200, Gleb Natapov wrote: > On Mon, Oct 12, 2009 at 10:13:14AM +0200, Michael S. Tsirkin wrote: > > > > > > > > > > > > This wastes memory for non-assigned devices. I think it's better, and > > > > > > cleaner, to make qemu increase the BAR size up to 4K for assigned > > > > > > devices if it wants page size alignment. > > > > > > > > > > > We have three and a half devices in QEUM so I don't think memory is a > > > > > big concern. Regardless, if you think that fiddle with assigned devices > > > > > responses is better idea go ahead and send patches. > > > > > > > > Even if you fiddle with BIOS, guest is allowed to reassign BARs, > > > > breaking your assumptions. > > > Good point. So the fact that this patched helped its creator shows that > > > linux doesn't do this. > > > > Try hot-plugging the device instead of have it present on boot. > > Patching BIOS won't help then, will it? So my question is, if we need > > to handle this in qemu, is it worth it to do it in kvm as well? > > > It depend how linux assign mmio address to hot pluggable devices. How > can you be sure a device driver continue working if you'll misrepresent > BAR size BTW? Yes, this adds yet another way for device to discover it's running in a VM, so this might break some drivers. If we see many of these in practice, we can try adding a PCI-to-PCI bridge with some dummy devices behind it to the picture, to increase the chances to get a dedicated memory page. > > > > > As it stands this > > > > > patch is in kvm's bios and is required for assigned devices to work > > > > > for some devices, so moving to seabios without this patch will introduce > > > > > a regression. > > > > > > > > I have a question here: if kvm maps a full physical page > > > > into guest memory, while device only uses part of the page, > > > > won't that mean that guest is granted access outside the > > > > device, which it should not have? > > > And how is real HW different? It maps a full physical page into OS > > > memory even if BAR is smaller then page and grants OS access to > > > unassigned mmio region. Access unassigned mmio region shouldn't cause > > > any trouble, doesn't it? > > > > Unassigned - typically no, but there can be another device there, or a RAM > > page. It is different on real hardware where OS has access to all RAM and all > > devices, anyway. > > > > Here's an example from my laptop: > > > > 00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07) > > Subsystem: Lenovo Device 20e6 > > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- > Latency: 0 > > Interrupt: pin A routed to IRQ 11 > > Region 0: Memory at fc226800 (64-bit, non-prefetchable) [size=16] > > Capabilities: > > > > ... > > > > 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0]) > > Subsystem: Lenovo Device 20f8 > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > > Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- > Latency: 0 > > Interrupt: pin B routed to IRQ 28 > > Region 0: I/O ports at 1c48 [size=8] > > Region 1: I/O ports at 183c [size=4] > > Region 2: I/O ports at 1c40 [size=8] > > Region 3: I/O ports at 1838 [size=4] > > Region 4: I/O ports at 1c20 [size=32] > > Region 5: Memory at fc226000 (32-bit, non-prefetchable) [size=2K] > > Capabilities: > > Kernel driver in use: ahci > > > > In this setup, if you assign a page at address fc226000, for SATA, > > I think that guest will be able to control Communication controller as well. > Who configures BARs for assigned device guest or host? Host. > If host you can't safely passthrough one of those devices. Why not? > But passthrough is not secure anyway since guest can DMA all over host > memory. That's why we only enable it with I/O mmu, right? > > > > > > Maybe the solution is to disable bypass for sub-page BARs and to > > > > handle them in qemu, where we don't have alignment restrictions? > > > > > > > Making fast path go through qemu for assigned devices? May be remove > > > this pass through crap from kvm to save us all from this misery then? > > > > Another option is for KVM to check these scenarious and deny assignment if > > there's such an overlap. > One more constrain for device assignment. Simple real life scenarios > don't work for our users as it is. Adding more constrains will not help. For linux host, you can force resource alignment using a kernel parameter. What do you suggest? Ignore this issue? > > > > > > > > > > > > > > > --- > > > > > > > src/pciinit.c | 7 +++++++ > > > > > > > 1 files changed, 7 insertions(+), 0 deletions(-) > > > > > > > > > > > > > > diff --git a/src/pciinit.c b/src/pciinit.c > > > > > > > index 29b3901..53fbfcf 100644 > > > > > > > --- a/src/pciinit.c > > > > > > > +++ b/src/pciinit.c > > > > > > > @@ -10,6 +10,7 @@ > > > > > > > #include "biosvar.h" // GET_EBDA > > > > > > > #include "pci_ids.h" // PCI_VENDOR_ID_INTEL > > > > > > > #include "pci_regs.h" // PCI_COMMAND > > > > > > > +#include "paravirt.h" > > > > > > > > > > > > > > #define PCI_ROM_SLOT 6 > > > > > > > #define PCI_NUM_REGIONS 7 > > > > > > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf) > > > > > > > *paddr = ALIGN(*paddr, size); > > > > > > > pci_set_io_region_addr(bdf, i, *paddr); > > > > > > > *paddr += size; > > > > > > > + if (kvm_para_available()) { > > > > > > > + /* make memory address page aligned */ > > > > > > > + /* needed for device assignment on kvm */ > > > > > > > + if (!(val & PCI_BASE_ADDRESS_SPACE_IO)) > > > > > > > + *paddr = (*paddr + 0xfff) & 0xfffff000; > > > > > > > + } > > > > > > > } > > > > > > > } > > > > > > > break; > > > > > > > -- > > > > > > > 1.6.3.3 > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Gleb. > > > > > > -- > > > Gleb. > > -- > Gleb.