qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: kevin@koconnor.net, qemu-devel@nongnu.org
Subject: [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
Date: Mon, 12 Oct 2009 11:43:35 +0200
Message-ID: <20091012094335.GC10741@redhat.com>
In-Reply-To: <20091012084858.GY16702@redhat.com>

On Mon, Oct 12, 2009 at 10:48:58AM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 10:13:14AM +0200, Michael S. Tsirkin wrote:
> > > > > > 
> > > > > > This wastes memory for non-assigned devices.  I think it's better, and
> > > > > > cleaner, to make qemu increase the BAR size up to 4K for assigned
> > > > > > devices if it wants page size alignment.
> > > > > > 
> > > > > We have three and a half devices in QEMU so I don't think memory is a
> > > > > big concern. Regardless, if you think that fiddling with assigned devices'
> > > > > responses is a better idea, go ahead and send patches.
> > > > 
> > > > Even if you fiddle with BIOS, guest is allowed to reassign BARs,
> > > > breaking your assumptions.
> > > Good point. So the fact that this patch helped its creator shows that
> > > Linux doesn't do this.
> > 
> > Try hot-plugging the device instead of having it present on boot.
> > Patching BIOS won't help then, will it?  So my question is, if we need
> > to handle this in qemu, is it worth it to do it in kvm as well?
> > 
> It depends on how Linux assigns MMIO addresses to hot-pluggable devices. How
> can you be sure a device driver continues working if you misrepresent the
> BAR size, BTW?

Yes, this adds yet another way for a device to discover it's running in a
VM, so this might break some drivers.  If we see many of these in
practice, we can try adding a PCI-to-PCI bridge with some dummy devices
behind it to the picture, to increase the chances of getting a dedicated
memory page.

> > > > > As it stands this
> > > > > patch is in kvm's bios and is required for assigned devices to work
> > > > > for some devices, so moving to seabios without this patch will introduce
> > > > > a regression.
> > > > 
> > > > I have a question here: if kvm maps a full physical page
> > > > into guest memory, while device only uses part of the page,
> > > > won't that mean that guest is granted access outside the
> > > > device, which it should not have?
> > > And how is real HW different? It maps a full physical page into OS
> > > memory even if the BAR is smaller than a page and grants the OS access to
> > > the unassigned MMIO region. Accessing an unassigned MMIO region shouldn't
> > > cause any trouble, should it?
> > 
> > Unassigned - typically no, but there can be another device there, or a RAM
> > page.  It is different on real hardware where OS has access to all RAM and all
> > devices, anyway.
> > 
> > Here's an example from my laptop:
> > 
> > 00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
> >         Subsystem: Lenovo Device 20e6
> >         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
> >         Latency: 0
> >         Interrupt: pin A routed to IRQ 11
> >         Region 0: Memory at fc226800 (64-bit, non-prefetchable) [size=16]
> >         Capabilities: <access denied>
> > 
> > ...
> > 
> > 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
> >         Subsystem: Lenovo Device 20f8
> >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Latency: 0
> >         Interrupt: pin B routed to IRQ 28
> >         Region 0: I/O ports at 1c48 [size=8]
> >         Region 1: I/O ports at 183c [size=4]
> >         Region 2: I/O ports at 1c40 [size=8]
> >         Region 3: I/O ports at 1838 [size=4]
> >         Region 4: I/O ports at 1c20 [size=32]
> >         Region 5: Memory at fc226000 (32-bit, non-prefetchable) [size=2K]
> >         Capabilities: <access denied>
> >         Kernel driver in use: ahci
> > 
> > In this setup, if you assign a page at address fc226000, for SATA,
> > I think that guest will be able to control Communication controller as well.
> Who configures BARs for an assigned device: guest or host?

Host.

> If host, you can't safely pass through one of those devices.

Why not?

> But passthrough is not secure anyway since guest can DMA all over host
> memory.


That's why we only enable it with I/O mmu, right?

> > 
> > > > Maybe the solution is to disable bypass for sub-page BARs and to
> > > > handle them in qemu, where we don't have alignment restrictions?
> > > > 
> > > Making the fast path go through qemu for assigned devices? Maybe remove
> > > this pass-through crap from kvm to save us all from this misery then?
> > 
> > Another option is for KVM to check these scenarios and deny assignment if
> > there's such an overlap.
> One more constraint for device assignment. Simple real-life scenarios
> don't work for our users as it is. Adding more constraints will not help.

For a Linux host, you can force resource alignment using a kernel
parameter. What do you suggest? Ignore this issue?
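
To make the overlap concern concrete, here is a minimal sketch of the
kind of check discussed above. It is not KVM code: it assumes 4K host
pages, and struct mem_bar and bars_share_page() are hypothetical names
invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL /* assumption: 4K host pages */

/* Hypothetical description of one memory BAR as the host sees it. */
struct mem_bar {
    uint64_t addr; /* host physical base address */
    uint64_t size; /* BAR size in bytes, must be non-zero */
};

/* First byte of the page containing addr. */
static uint64_t page_floor(uint64_t addr)
{
    return addr & ~(PAGE_SIZE - 1);
}

/* One past the last byte of the last page the BAR touches. */
static uint64_t page_span_end(const struct mem_bar *bar)
{
    return page_floor(bar->addr + bar->size - 1) + PAGE_SIZE;
}

/*
 * True if mapping whole pages for 'assigned' into a guest would also
 * expose some part of 'other'.
 */
static bool bars_share_page(const struct mem_bar *assigned,
                            const struct mem_bar *other)
{
    uint64_t start = page_floor(assigned->addr);
    uint64_t end = page_span_end(assigned);

    return other->addr < end && other->addr + other->size > start;
}
```

With the values from the lspci output quoted above (SATA region 5 at
fc226000, size 2K; MEI region 0 at fc226800, size 16), this check
reports that the two BARs share a page.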

> > 
> > > > > > 
> > > > > > > ---
> > > > > > >  src/pciinit.c |    7 +++++++
> > > > > > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > > > > index 29b3901..53fbfcf 100644
> > > > > > > --- a/src/pciinit.c
> > > > > > > +++ b/src/pciinit.c
> > > > > > > @@ -10,6 +10,7 @@
> > > > > > >  #include "biosvar.h" // GET_EBDA
> > > > > > >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> > > > > > >  #include "pci_regs.h" // PCI_COMMAND
> > > > > > > +#include "paravirt.h"
> > > > > > >  
> > > > > > >  #define PCI_ROM_SLOT 6
> > > > > > >  #define PCI_NUM_REGIONS 7
> > > > > > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> > > > > > >                  *paddr = ALIGN(*paddr, size);
> > > > > > >                  pci_set_io_region_addr(bdf, i, *paddr);
> > > > > > >                  *paddr += size;
> > > > > > > +                if (kvm_para_available()) {
> > > > > > > +                    /* make memory address page aligned */
> > > > > > > +                    /* needed for device assignment on kvm */
> > > > > > > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > > > > > > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > > > > > > +               }
> > > > > > >              }
> > > > > > >          }
> > > > > > >          break;
> > > > > > > -- 
> > > > > > > 1.6.3.3
> > > > > > > 
> > > > > > > 
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.
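
For reference, the rounding step in the quoted patch,
(*paddr + 0xfff) & 0xfffff000, can be sketched in isolation as below.
This assumes 4K guest pages and 32-bit addresses, as the patch does;
page_align_up() is a hypothetical helper name, not a SeaBIOS function.

```c
#include <stdint.h>

#define GUEST_PAGE_SIZE 0x1000u /* assumption: 4K guest pages */

/*
 * Round addr up to the next page boundary. An address that is already
 * page aligned is returned unchanged.
 */
static uint32_t page_align_up(uint32_t addr)
{
    return (addr + (GUEST_PAGE_SIZE - 1)) & ~(GUEST_PAGE_SIZE - 1);
}
```

Note that this only moves the address at which the next BAR will be
placed. It does not change the size a BAR reports, and it does not stop
the guest from reassigning BARs later.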
