qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Knut Omang <knut.omang@oracle.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jan Kiszka" <jan.kiszka@siemens.com>,
	"Alexander Graf" <agraf@suse.de>,
	qemu-devel@nongnu.org, "Andreas Färber" <andreas.faerber@web.de>,
	qemu-ppc@nongnu.org, "Le Tan" <tamlokveer@gmail.com>,
	marcel@redhat.com, "Paolo Bonzini" <pbonzini@redhat.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v2 0/2] intel_iommu: Add support for translation for devices behind bridges
Date: Sat, 12 Sep 2015 20:37:18 +0200	[thread overview]
Message-ID: <1442083038.8395.292.camel@oracle.com> (raw)
In-Reply-To: <1441258016.3260.126.camel@oracle.com>

On Thu, 2015-09-03 at 07:26 +0200, Knut Omang wrote:
> On Thu, 2015-09-03 at 08:33 +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2015-09-02 at 10:12 -0600, Alex Williamson wrote:
> > 
> > > There are very specific rules for translating requester IDs across
> > > bridges.  Bus numbers can change during enumeration, devfn cannot.
> 
> Thanks for clarifying that point, Alex, I realize I was a bit imprecise
> in my last mail,
> 
> > > devfn can however be masked by topology changes from PCIe to PCI.  If 
> > > we pretend that the IOMMU can distinguish requester IDs where it 
> > > can't on real hardware, we're going to break the guest.  Thanks,
> > 
> > Note that whether a PCI / PCI-X bridge will mask devfn, bus# or both or
> > even mask it partially (number of bits) or replace some transfers with
> > its own RID ... depends on a given bridge implementation.
> > 
> > Another thing is while I agree that the bus number is problematic,
> > since it changes, it is still what the HW actually uses to match the
> > requester in practice, at least on PHB and I would think on Intel.
> > 
> > The problem is more fundamental. qemu is trying to bind devices to
> > address spaces in a fixed way at device creation time, while this is
> > lazily resolved in HW at the point of the DMA occurring.
> 
> So let me try to sum up my understanding in context of the patch in
> terms of these two approaches,
> 
> > One way to fix it is to effectively have an address space per device,
> > and have the iommu translate function figure out the binding
> > dynamically and flush things if it detects a change. But that is tricky
> > for vfio and it means invalidations will have to iterate all address
> > spaces.
> 
> So my patch is along these lines by actually moving the address space
> pointer into the device struct.
> The benefit is that:
>   * The data structure for the DMA address space can be reused across 
>     IOMMUs, and the address spaces can be set up before bus numbers are
>     
>     assigned, and the implementation is fairly simple.
>   * The IOMMU does not have to be notified of bus changes, except for
>     invalidation purposes (but wouldn't a new enumeration cause a full 
>     IOMMU invalidate anyway?)
> 
> The drawbacks are:
>   * The IOMMUs get to know explicitly about devices behind a bridge, 
>     which logically deviates from how hardware works and 
>     complicates future attempts to implement bridges that 
>     translate RIDs.
>   * Each device can have only one DMA address space mapping associated 
>     with it (I suppose it might be possible to have a topology that 
>     would allow multiple paths to a device, but do we care at this 
>     stage?)
> 
> > The other option is to create Address Spaces on the fly as we lookup
> > domains, and bind them to devices lazily, but again, we need to deal
> > with changes/invalidations and that can be nasty with VFIO.
> 
> We could get here without changing the interfaces, by refining the
> current implementation to just cache bus pointers at setup, then lazily
> add address spaces for each device. This approach would yield IOMMU
> device specific implementations, but would still in practice associate
> devices with address spaces.

As the thread went silent after our conclusions, I have made a second
implementation for the Intel IOMMU according to this alternate scheme,
It keeps the current API and handles the bus number resolution lazily
within the IOMMU implementation, I will post the (single) patch as v3
of this. 

Hopefully this is acceptable and can be leveraged to do a similar
rework, or be abstracted as generic functionality (?) for the other
architectures,..

Thanks,

Knut

  reply	other threads:[~2015-09-12 18:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-01 18:48 [Qemu-devel] [PATCH v2 0/2] intel_iommu: Add support for translation for devices behind bridges Knut Omang
2015-09-01 18:48 ` [Qemu-devel] [PATCH v2 1/2] iommu: Replace bus+devfn arguments with PCIDevice* in PCIIOMMUFunc Knut Omang
2015-09-01 18:48 ` [Qemu-devel] [PATCH v2 2/2] intel_iommu: Add support for translation for devices behind bridges Knut Omang
2015-09-02 12:30 ` [Qemu-devel] [PATCH v2 0/2] " Marcel Apfelbaum
2015-09-02 13:10   ` Knut Omang
2015-09-02 16:12     ` Alex Williamson
2015-09-02 22:33       ` Benjamin Herrenschmidt
2015-09-03  5:26         ` Knut Omang
2015-09-12 18:37           ` Knut Omang [this message]
2015-09-12 23:12             ` Benjamin Herrenschmidt
2015-09-13  7:04               ` Knut Omang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1442083038.8395.292.camel@oracle.com \
    --to=knut.omang@oracle.com \
    --cc=agraf@suse.de \
    --cc=alex.williamson@redhat.com \
    --cc=andreas.faerber@web.de \
    --cc=benh@kernel.crashing.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=ehabkost@redhat.com \
    --cc=jan.kiszka@siemens.com \
    --cc=marcel@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=tamlokveer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).