All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Thomas Huth <thuth@redhat.com>
Cc: lvivier@redhat.com, aik@ozlabs.ru, gwshan@linux.vnet.ibm.com,
	qemu-devel@nongnu.org, alex.williamson@redhat.com,
	qemu-ppc@nongnu.org, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
Date: Wed, 23 Sep 2015 21:07:06 +1000	[thread overview]
Message-ID: <20150923110706.GA15944@voom.fritz.box> (raw)
In-Reply-To: <56027AA6.8090504@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5440 bytes --]

On Wed, Sep 23, 2015 at 12:10:46PM +0200, Thomas Huth wrote:
> On 17/09/15 15:09, David Gibson wrote:
> > The current vfio core code assumes that the host IOMMU is capable of
> > mapping any IOVA the guest wants to use to where we need.  However, real
> > IOMMUs generally only support translating a certain range of IOVAs (the
> > "DMA window") not a full 64-bit address space.
> > 
> > The common x86 IOMMUs support a wide enough range that guests are very
> > unlikely to go beyond it in practice, however the IOMMU used on IBM Power
> > machines - in the default configuration - supports only a much more limited
> > IOVA range, usually 0..2GiB.
> > 
> > If the guest attempts to set up an IOVA range that the host IOMMU can't
> > map, qemu won't report an error until it actually attempts to map a bad
> > IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> > visible IOMMU) then this will show up very quickly.  If there is a guest
> > visible IOMMU, however, the problem might not show up until much later when
> > the guest actually attempt to DMA with an IOVA the host can't handle.
> > 
> > This patch adds a test so that we will detect earlier if the guest is
> > attempting to use IOVA ranges that the host IOMMU won't be able to deal
> > with.
> > 
> > For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
> > incorrect, but no worse than what we have already.  We can't do better for
> > now because the Type1 kernel interface doesn't tell us what IOVA range the
> > IOMMU actually supports.
> > 
> > For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> > IOVA range and validate guest IOVA ranges against it, and this patch does
> > so.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
> >  include/hw/vfio/vfio-common.h |  6 ++++++
> >  2 files changed, 45 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 9953b9c..c37f1a1 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >      if (int128_ge(int128_make64(iova), llend)) {
> >          return;
> >      }
> > +    end = int128_get64(llend);
> > +
> > +    if ((iova < container->iommu_data.min_iova)
> > +        || ((end - 1) > container->iommu_data.max_iova)) {
> 
> (Too much paranthesis for my taste ;-))

Yes, well, we've already established our tastes differ on that point.

> > +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> > +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> > +                     container, iova, end - 1);
> > +        ret = -EFAULT; /* FIXME: better choice here? */
> 
> Maybe -EINVAL? ... but -EFAULT also sounds ok for me.

I try to avoid EINVAL unless it's clearly the only right choice.  So
many things use it that it tends to be very unhelpful when you get one.

> > +        goto fail;
> > +    }
> ...
> > @@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >              ret = -errno;
> >              goto free_container_exit;
> >          }
> > +
> > +        /*
> > +         * FIXME: This only considers the host IOMMU' 32-bit window.
> > +         * At some point we need to add support for the optional
> > +         * 64-bit window and dynamic windows
> > +         */
> > +        info.argsz = sizeof(info);
> > +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > +        if (ret) {
> > +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> 
> Isn't that %m a glibc extension only? ... Well, this code likely only
> runs on Linux with a glibc, so it likely doesn't matter, I guess...

Yes, it is, but it's already used extensively within qemu.

> > +            ret = -errno;
> > +            goto free_container_exit;
> > +        }
> > +        container->iommu_data.min_iova = info.dma32_window_start;
> > +        container->iommu_data.max_iova = container->iommu_data.min_iova
> > +            + info.dma32_window_size - 1;
> >      } else {
> >          error_report("vfio: No available IOMMU models");
> >          ret = -EINVAL;
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index aff18cd..88ec213 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -71,6 +71,12 @@ typedef struct VFIOContainer {
> >          MemoryListener listener;
> >          int error;
> >          bool initialized;
> > +        /*
> > +         * FIXME: This assumes the host IOMMU can support only a
> > +         * single contiguous IOVA window.  We may need to generalize
> > +         * that in future
> > +         */
> > +        hwaddr min_iova, max_iova;
> 
> Should that maybe be dma_addr_t instead of hwaddr ?

Ah, yes it probably should.

> >      } iommu_data;
> >      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> >      QLIST_HEAD(, VFIOGroup) group_list;
> > 
> 
>  Thomas
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2015-09-23 11:08 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
2015-09-18  6:15   ` Alexey Kardashevskiy
2015-09-23 10:31   ` Thomas Huth
2015-09-23 23:14     ` David Gibson
2015-09-23 13:18   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path David Gibson
2015-09-23  9:13   ` Thomas Huth
2015-09-23 13:31   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
2015-09-18  6:38   ` Alexey Kardashevskiy
2015-09-23 10:10   ` Thomas Huth
2015-09-23 11:07     ` David Gibson [this message]
2015-09-23 23:43       ` David Gibson
2015-09-23 14:26   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes David Gibson
2015-09-23 10:29   ` Thomas Huth
2015-09-23 14:30   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications David Gibson
2015-09-23 10:40   ` Thomas Huth
2015-09-23 16:35     ` Laurent Vivier
2015-09-23 23:47     ` David Gibson
2015-09-23 17:04   ` Laurent Vivier
2015-09-23 23:50     ` David Gibson
2015-09-24  7:09       ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
2015-09-17 16:54   ` Alex Williamson
2015-09-17 23:31     ` David Gibson
2015-09-23 11:02       ` Thomas Huth
2015-09-23 23:50         ` David Gibson
2015-09-23 18:44   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured David Gibson
2015-09-23 11:08   ` Thomas Huth
2015-09-23 23:56     ` David Gibson
2015-09-23 18:55   ` Laurent Vivier
2015-09-23 23:54     ` David Gibson
2015-09-24  6:59       ` Laurent Vivier
2015-10-03  0:25         ` Alexey Kardashevskiy
2015-10-05 14:13           ` Paolo Bonzini
2015-10-06  3:25             ` David Gibson
2015-10-06  4:18               ` David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter David Gibson
2015-09-17 16:54   ` Alex Williamson
2015-09-17 23:34     ` David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO David Gibson
2015-09-17 16:54   ` Alex Williamson
2015-09-23 11:24     ` Thomas Huth
2015-09-24  0:35       ` David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 10/10] spapr_pci: Allow VFIO devices to work on the normal PCI host bridge David Gibson
2015-09-17 16:54 ` [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge Alex Williamson
2015-09-23 11:26 ` Thomas Huth
2015-09-23 16:46 ` Laurent Vivier
2015-09-24  1:02   ` David Gibson
2015-09-24  7:02     ` Laurent Vivier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150923110706.GA15944@voom.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.