All of lore.kernel.org
 help / color / mirror / Atom feed
From: Knut Omang <knut.omang@oracle.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jan Kiszka" <jan.kiszka@siemens.com>,
	"Alexander Graf" <agraf@suse.de>,
	qemu-devel@nongnu.org, "Andreas Färber" <andreas.faerber@web.de>,
	qemu-ppc@nongnu.org, "Le Tan" <tamlokveer@gmail.com>,
	marcel@redhat.com, "Paolo Bonzini" <pbonzini@redhat.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v2 0/2] intel_iommu: Add support for translation for devices behind bridges
Date: Sat, 12 Sep 2015 20:37:18 +0200	[thread overview]
Message-ID: <1442083038.8395.292.camel@oracle.com> (raw)
In-Reply-To: <1441258016.3260.126.camel@oracle.com>

On Thu, 2015-09-03 at 07:26 +0200, Knut Omang wrote:
> On Thu, 2015-09-03 at 08:33 +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2015-09-02 at 10:12 -0600, Alex Williamson wrote:
> > 
> > > There are very specific rules for translating requester IDs across
> > > bridges.  Bus numbers can change during enumeration, devfn cannot.
> 
> Thanks for clarifying that point, Alex, I realize I was a bit imprecise
> in my last mail,
> 
> > > devfn can however be masked by topology changes from PCIe to PCI.  If 
> > > we pretend that the IOMMU can distinguish requester IDs where it 
> > > can't on real hardware, we're going to break the guest.  Thanks,
> > 
> > Note that whether a PCI / PCI-X bridge will mask devfn, bus# or both or
> > even mask it partially (number of bits) or replace some transfers with
> > its own RID ... depends on a given bridge implementation.
> > 
> > Another thing is while I agree that the bus number is problematic,
> > since it changes, it is still what the HW actually uses to match the
> > requester in practice, at least on PHB and I would think on Intel.
> > 
> > The problem is more fundamental. qemu is trying to bind devices to
> > address spaces in a fixed way at device creation time, while this is
> > lazily resolved in HW at the point of the DMA occurring.
> 
> So let me try to sum up my understanding in context of the patch in
> terms of these two approaches,
> 
> > One way to fix it is to effectively have an address space per device,
> > and have the iommu translate function figure out the binding
> > dynamically and flush things if it detects a change. But that is tricky
> > for vfio and it means invalidations will have to iterate all address
> > spaces.
> 
> So my patch is along these lines by actually moving the address space
> pointer into the device struct.
> The benefit is that:
>   * The data structure for the DMA address space can be reused across 
>     IOMMUs, and the address spaces can be set up before bus numbers are
>     
>     assigned, and the implementation is fairly simple.
>   * The IOMMU does not have to be notified of bus changes, except for
>     invalidation purposes (but wouldn't a new enumeration cause a full 
>     IOMMU invalidate anyway?)
> 
> The drawbacks are:
>   * The IOMMUs get to know explicitly about devices behind a bridge, 
>     which logically deviates from how hardware works and 
>     complicates future attempts to implement bridges that 
>     translate RIDs.
>   * Each device can have only one DMA address space mapping associated 
>     with it (I suppose it might be possible to have a topology that 
>     would allow multiple paths to a device, but do we care at this 
>     stage?)
> 
> > The other option is to create Address Spaces on the fly as we lookup
> > domains, and bind them to devices lazily, but again, we need to deal
> > with changes/invalidations and that can be nasty with VFIO.
> 
> We could get here without changing the interfaces, by refining the
> current implementation to just cache bus pointers at setup, then lazily
> add address spaces for each device. This approach would yield IOMMU
> device specific implementations, but would still in practice associate
> devices with address spaces.

As the thread went silent after our conclusions, I have made a second
implementation for the Intel IOMMU according to this alternate scheme,
It keeps the current API and handles the bus number resolution lazily
within the IOMMU implementation, I will post the (single) patch as v3
of this. 

Hopefully this is acceptable and can be leveraged to do a similar
rework, or be abstracted as generic functionality (?) for the other
architectures,..

Thanks,

Knut

  reply	other threads:[~2015-09-12 18:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-01 18:48 [Qemu-devel] [PATCH v2 0/2] intel_iommu: Add support for translation for devices behind bridges Knut Omang
2015-09-01 18:48 ` [Qemu-devel] [PATCH v2 1/2] iommu: Replace bus+devfn arguments with PCIDevice* in PCIIOMMUFunc Knut Omang
2015-09-01 18:48 ` [Qemu-devel] [PATCH v2 2/2] intel_iommu: Add support for translation for devices behind bridges Knut Omang
2015-09-02 12:30 ` [Qemu-devel] [PATCH v2 0/2] " Marcel Apfelbaum
2015-09-02 13:10   ` Knut Omang
2015-09-02 16:12     ` Alex Williamson
2015-09-02 22:33       ` Benjamin Herrenschmidt
2015-09-03  5:26         ` Knut Omang
2015-09-12 18:37           ` Knut Omang [this message]
2015-09-12 23:12             ` Benjamin Herrenschmidt
2015-09-13  7:04               ` Knut Omang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1442083038.8395.292.camel@oracle.com \
    --to=knut.omang@oracle.com \
    --cc=agraf@suse.de \
    --cc=alex.williamson@redhat.com \
    --cc=andreas.faerber@web.de \
    --cc=benh@kernel.crashing.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=ehabkost@redhat.com \
    --cc=jan.kiszka@siemens.com \
    --cc=marcel@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=tamlokveer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.