From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34485) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZapgP-0002ND-5C for qemu-devel@nongnu.org; Sat, 12 Sep 2015 14:37:54 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZapgO-0000uV-2G for qemu-devel@nongnu.org; Sat, 12 Sep 2015 14:37:53 -0400
Message-ID: <1442083038.8395.292.camel@oracle.com>
From: Knut Omang
Date: Sat, 12 Sep 2015 20:37:18 +0200
In-Reply-To: <1441258016.3260.126.camel@oracle.com>
References: <1441133302-18950-1-git-send-email-knut.omang@oracle.com>
 <55E6EBE1.5070709@gmail.com>
 <1441199414.25731.115.camel@oracle.com>
 <1441210345.20355.295.camel@redhat.com>
 <1441233210.2668.90.camel@kernel.crashing.org>
 <1441258016.3260.126.camel@oracle.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 0/2] intel_iommu: Add support for translation for devices behind bridges
To: Benjamin Herrenschmidt , Alex Williamson
Cc: Eduardo Habkost , "Michael S. Tsirkin" , Jan Kiszka , Alexander Graf , qemu-devel@nongnu.org, Andreas Färber , qemu-ppc@nongnu.org, Le Tan , marcel@redhat.com, Paolo Bonzini , David Gibson , Richard Henderson

On Thu, 2015-09-03 at 07:26 +0200, Knut Omang wrote:
> On Thu, 2015-09-03 at 08:33 +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2015-09-02 at 10:12 -0600, Alex Williamson wrote:
> >
> > > There are very specific rules for translating requester IDs across
> > > bridges. Bus numbers can change during enumeration, devfn cannot.
>
> Thanks for clarifying that point, Alex, I realize I was a bit imprecise
> in my last mail,
>
> > > devfn can however be masked by topology changes from PCIe to PCI.
> > > If we pretend that the IOMMU can distinguish requester IDs where it
> > > can't on real hardware, we're going to break the guest. Thanks,
> >
> > Note that whether a PCI / PCI-X bridge will mask devfn, bus# or both or
> > even mask it partially (number of bits) or replace some transfers with
> > its own RID ... depends on a given bridge implementation.
> >
> > Another thing is while I agree that the bus number is problematic,
> > since it changes, it is still what the HW actually uses to match the
> > requester in practice, at least on PHB and I would think on Intel.
> >
> > The problem is more fundamental. qemu is trying to bind devices to
> > address spaces in a fixed way at device creation time, while this is
> > lazily resolved in HW at the point of the DMA occurring.
>
> So let me try to sum up my understanding in context of the patch in
> terms of these two approaches,
>
> > One way to fix it is to effectively have an address space per device,
> > and have the iommu translate function figure out the binding
> > dynamically and flush things if it detects a change. But that is tricky
> > for vfio and it means invalidations will have to iterate all address
> > spaces.
>
> So my patch is along these lines by actually moving the address space
> pointer into the device struct.
> The benefit is that:
> * The data structure for the DMA address space can be reused across
>   IOMMUs, and the address spaces can be set up before bus numbers are
>   assigned, and the implementation is fairly simple.
> * The IOMMU does not have to be notified of bus changes, except for
>   invalidation purposes (but wouldn't a new enumeration cause a full
>   IOMMU invalidate anyway?)
>
> The drawbacks are:
> * The IOMMUs get to know explicitly about devices behind a bridge,
>   which logically deviates from how hardware works and
>   complicates future attempts to implement bridges that
>   translate RIDs.
> * Each device can have only one DMA address space mapping associated
>   with it (I suppose it might be possible to have a topology that
>   would allow multiple paths to a device, but do we care at this
>   stage?)
>
> > The other option is to create Address Spaces on the fly as we lookup
> > domains, and bind them to devices lazily, but again, we need to deal
> > with changes/invalidations and that can be nasty with VFIO.
>
> We could get here without changing the interfaces, by refining the
> current implementation to just cache bus pointers at setup, then lazily
> add address spaces for each device. This approach would yield IOMMU
> device specific implementations, but would still in practice associate
> devices with address spaces.

As the thread went silent after our conclusions, I have made a second
implementation for the Intel IOMMU according to this alternate scheme.
It keeps the current API and handles the bus number resolution lazily
within the IOMMU implementation. I will post the (single) patch as v3
of this series.

Hopefully this is acceptable and can be leveraged to do a similar
rework, or be abstracted as generic functionality (?), for the other
architectures.

Thanks,
Knut
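
For readers following the thread, the lazy scheme described above (cache
bus pointers at setup, add per-device address spaces on demand, resolve
the bus number only when it is needed) can be sketched roughly as below.
This is a minimal illustration in plain C under stated assumptions, not
the patch itself: the types Bus, AddrSpace, BusEntry and Iommu and the
functions iommu_find_add_as() and as_requester_id() are hypothetical
names invented for this sketch and are not QEMU APIs. Invalidation and
cleanup are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEVFN_MAX 256

/* Hypothetical stand-in for a PCI bus; not a QEMU type. */
typedef struct Bus {
    uint8_t bus_num;            /* may be reprogrammed during enumeration */
} Bus;

/* Hypothetical per-device DMA address space. */
typedef struct AddrSpace {
    Bus *bus;                   /* bus pointer, stable across renumbering */
    uint8_t devfn;              /* does not change during enumeration */
} AddrSpace;

/* One entry per bus seen so far, keyed by bus pointer, not bus number. */
typedef struct BusEntry {
    Bus *bus;
    AddrSpace *as[DEVFN_MAX];   /* filled in lazily, one slot per devfn */
    struct BusEntry *next;
} BusEntry;

typedef struct Iommu {
    BusEntry *buses;
} Iommu;

/*
 * Return the address space for (bus, devfn), creating the bus entry and
 * the address space on first use. Returns NULL on allocation failure.
 */
AddrSpace *iommu_find_add_as(Iommu *s, Bus *bus, uint8_t devfn)
{
    BusEntry *e;

    assert(s != NULL && bus != NULL);

    for (e = s->buses; e != NULL; e = e->next) {
        if (e->bus == bus) {
            break;
        }
    }
    if (e == NULL) {
        e = calloc(1, sizeof(*e));
        if (e == NULL) {
            return NULL;
        }
        e->bus = bus;
        e->next = s->buses;
        s->buses = e;
    }
    if (e->as[devfn] == NULL) {
        AddrSpace *as = calloc(1, sizeof(*as));
        if (as == NULL) {
            return NULL;
        }
        as->bus = bus;
        as->devfn = devfn;
        e->as[devfn] = as;
    }
    return e->as[devfn];
}

/* Requester ID built from the bus number as it is at the time of the call. */
uint16_t as_requester_id(const AddrSpace *as)
{
    return (uint16_t)((as->bus->bus_num << 8) | as->devfn);
}
```

Because the lookup is keyed by the bus pointer, an address space created
before enumeration stays valid after the guest renumbers the bus, and
as_requester_id() picks up the new bus number on the next call. A real
implementation would still have to invalidate any cached translations
when that happens.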