From: Avi Kivity <avi@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	liu ping fan <qemulist@gmail.com>,
	qemu-devel@nongnu.org, Blue Swirl <blauwirbel@gmail.com>,
	Anthony Liguori <anthony@codemonkey.ws>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC v1 7/7] vhost: abort if an emulated iommu is used
Date: Mon, 15 Oct 2012 12:24:45 +0200
Message-ID: <507BE46D.9080303@redhat.com>
In-Reply-To: <1349984318.2321.56.camel@ul30vt.home>

On 10/11/2012 09:38 PM, Alex Williamson wrote:
> On Thu, 2012-10-11 at 17:48 +0200, Avi Kivity wrote:
>> On 10/11/2012 05:34 PM, Michael S. Tsirkin wrote:
>> > On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
>> >> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>> >> 
>> >> >> No, qemu should configure virtio devices to bypass the iommu, even if it
>> >> >> is on.
>> >> > 
>> >> > Okay so there will be some API that virtio devices should call
>> >> > to achieve this?
>> >> 
>> >> The iommu should probably call pci_device_bypasses_iommu() to check for
>> >> such devices.
>> > 
>> > So maybe this patch should depend on the introduction of such
>> > an API.
>> 
>> I've dropped it for now.
>> 
>> In fact, virtio/vhost are safe since they use cpu_physical_memory_rw()
>> and the memory listener watches address_space_memory, no iommu there.
>> vfio needs to change to listen to pci_dev->bus_master_as, and need
>> special handling for iommu regions (abort for now, type 2 iommu later).
> 
> I don't see how we can ever support an assigned device with the
> translate function.  

We cannot.

> Don't we want a flat address space at run time
> anyway?  

Not if we want vfio-in-the-guest (for nested virt or OS bypass).

> IOMMU drivers go to pains to make IOTLB updates efficient and
> drivers optimize for long running translations, but here we impose a
> penalty on every access.  I think we'd be more efficient and better able
> to support assigned devices if the per device/bus address space was
> updated and flattened when it changes.  

A flattened address space cannot be efficiently implemented with a
->translate() callback.  Describing the transformed address space
requires walking all the iommu page tables, which is a problem for
three reasons: the tables can change very frequently for some use
cases; the io page tables can be built after the iommu is configured
but before dma is initiated, so you have no hook from which to call
->translate(); and the representation of the address space can be huge.

> Being able to implement an XOR
> IOMMU is impressive, but is it practical?  

The XOR IOMMU is just a way for me to test and demonstrate the API.
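
For illustration only, a minimal sketch of what such a test translation
could look like.  This is not the code from the series; iova_t, XorIommu
and xor_iommu_translate() are hypothetical names, and the real
->translate() signature may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus address; not qemu's actual type. */
typedef uint64_t iova_t;

/* Hypothetical test IOMMU: its whole state is one constant mask. */
typedef struct XorIommu {
    iova_t mask;
} XorIommu;

/* Translate a device-visible address by flipping the masked bits. */
static iova_t xor_iommu_translate(const XorIommu *iommu, iova_t addr)
{
    assert(iommu != NULL);
    return addr ^ iommu->mask;
}
```

Since XOR is its own inverse, translating twice returns the original
address, which makes the behaviour easy to check in a test.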

> We could be doing much more
> practical things like nested device assignment with a flatten
> translation ;)  Thanks,

No, a flattened translation is impractical, at least when driven from qemu.

My plans wrt vfio/kvm here are to have memory_region_init_iommu()
provide, in addition to ->translate(), a declarative description of the
translation function.  In practical terms, this means that the API will
receive the name of the spec that the iommu implements:

  MemoryRegionIOMMUOps amd_iommu_v2_ops = {
      .translate = amd_iommu_v2_translate,
      .translation_type = IOMMU_AMD_V2,
  };

qemu-side vfio would then match ->translation_type with what the kernel
provides, and configure the kernel for this type of translation.  As
some v2 hardware supports two levels of translations, all vfio has to do
is to set up the lower translation level to match the guest->host
translation (which it does already), and to set up the upper translation
level to follow the guest configuration.  From then on the hardware does
the rest.
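
A rough sketch of the matching step, assuming the kernel can report the
list of translation formats it supports.  The function
vfio_host_supports(), the IOMMU_XOR_TEST value and the way the list is
obtained are all hypothetical; only IOMMU_AMD_V2 comes from the example
above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum IommuTranslationType {
    IOMMU_XOR_TEST,   /* hypothetical: emulation-only, no hardware match */
    IOMMU_AMD_V2,     /* the spec named in ->translation_type */
} IommuTranslationType;

/*
 * Return true if the format the emulated iommu declares is among the
 * formats the host reports.  host_types is assumed to come from the
 * kernel; how it is queried is outside this sketch.
 */
static bool vfio_host_supports(const IommuTranslationType *host_types,
                               size_t n, IommuTranslationType wanted)
{
    assert(host_types != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (host_types[i] == wanted) {
            return true;
        }
    }
    return false;
}
```

If the check fails, qemu-side vfio would presumably refuse the
configuration, as the abort patch does today.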

If the hardware supports only one translation level, we may still be
able to implement nested iommu using the same techniques we use for the
processor page tables - shadowing.  kvm would write-protect the iommu
page tables and pass any updates to vfio, which would update the shadow
io page tables that implement the ngpa->gpa->hpa translation.  However,
given the complexity and performance problems on one side, and the size
of the niche that nested device assignment serves on the other, we'll
probably limit ourselves to hardware that supports two levels of
translation.  If nested virtualization really takes off, we can use
shadowing to provide the guest with emulated hardware that supports two
translation levels (the solution above uses host hardware with two
levels to expose guest hardware with one level).
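
To make the shadowing idea concrete, here is a toy sketch that composes
two flat, single-level tables indexed by page frame number.  Real io
page tables are multi-level and the write-protect trap path is omitted;
all names (table_lookup, shadow_update, shadow_rebuild, INVALID_PFN) are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_PFN UINT64_MAX

/* Look up one frame in a flat table; out-of-range means unmapped. */
static uint64_t table_lookup(const uint64_t *tbl, size_t n, uint64_t pfn)
{
    return pfn < n ? tbl[pfn] : INVALID_PFN;
}

/*
 * Called when a trapped guest write sets entry idx of its iommu table
 * (ngpa -> gpa) to new_gpa_pfn: store the composed ngpa -> hpa result.
 */
static void shadow_update(uint64_t *shadow, size_t idx, uint64_t new_gpa_pfn,
                          const uint64_t *host_tbl, size_t host_n)
{
    assert(shadow != NULL);
    shadow[idx] = table_lookup(host_tbl, host_n, new_gpa_pfn);
}

/* Build the whole shadow table from the guest and host tables. */
static void shadow_rebuild(uint64_t *shadow,
                           const uint64_t *guest_tbl, size_t guest_n,
                           const uint64_t *host_tbl, size_t host_n)
{
    for (size_t i = 0; i < guest_n; i++) {
        shadow_update(shadow, i, guest_tbl[i], host_tbl, host_n);
    }
}
```

Every guest update costs a trap plus a shadow_update(), which is where
the performance problems mentioned above come from.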

-- 
error compiling committee.c: too many arguments to function
