qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Don Dutile <ddutile@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: B07421@freescale.com, kvm@vger.kernel.org,
	gregkh@linuxfoundation.org, aik@ozlabs.ru,
	linux-pci@vger.kernel.org, agraf@suse.de, qemu-devel@nongnu.org,
	chrisw@sous-sol.org, B08248@freescale.com,
	iommu@lists.linux-foundation.org, avi@redhat.com,
	Bjorn Helgaas <bhelgaas@google.com>,
	david@gibson.dropbear.id.au, dwmw2@infradead.org,
	linux-kernel@vger.kernel.org, benve@cisco.com
Subject: Re: [Qemu-devel] [PATCH 05/13] pci: New pci_acs_enabled()
Date: Wed, 16 May 2012 09:29:51 -0400	[thread overview]
Message-ID: <4FB3ABCF.4030708@redhat.com> (raw)
In-Reply-To: <1337116157.6954.212.camel@bling.home>

On 05/15/2012 05:09 PM, Alex Williamson wrote:
> On Tue, 2012-05-15 at 13:56 -0600, Bjorn Helgaas wrote:
>> On Mon, May 14, 2012 at 4:49 PM, Alex Williamson
>> <alex.williamson@redhat.com>  wrote:
>>> On Mon, 2012-05-14 at 16:02 -0600, Bjorn Helgaas wrote:
>>>> On Fri, May 11, 2012 at 4:56 PM, Alex Williamson
>>>> <alex.williamson@redhat.com>  wrote:
>>>>> In a PCIe environment, transactions aren't always required to
>>>>> reach the root bus before being re-routed.  Peer-to-peer DMA
>>>>> may actually not be seen by the IOMMU in these cases.  For
>>>>> IOMMU groups, we want to provide IOMMU drivers a way to detect
>>>>> these restrictions.  Provided with a PCI device, pci_acs_enabled
>>>>> returns the furthest downstream device with a complete PCI ACS
>>>>> chain.  This information can then be used in grouping to create
>>>>> fully isolated groups.  ACS chain logic extracted from libvirt.
>>>>
>>>> The name "pci_acs_enabled()" sounds like it returns a boolean, but it doesn't.
>>>
>>> Right, maybe this should be:
>>>
>>> struct pci_dev *pci_find_upstream_acs(struct pci_dev *pdev);
>>>
+1; there is a global in the PCI code, pci_acs_enable,
and a function pci_enable_acs(), which the above name certainly
confuses.  I recommend  pci_find_top_acs_bridge()
would be most descriptive.

>>>> I'm not sure what "a complete PCI ACS chain" means.
>>>>
>>>> The function starts from "dev" and searches *upstream*, so I'm
>>>> guessing it returns the root of a subtree that must be contained in a
>>>> group.
>>>
>>> Any intermediate switch between an endpoint and the root bus can
>>> redirect a dma access without iommu translation,
>>
>> Is this "redirection" just the normal PCI bridge forwarding that
>> allows peer-to-peer transactions, i.e., the rule (from P2P bridge
>> spec, rev 1.2, sec 4.1) that the bridge apertures define address
>> ranges that are forwarded from primary to secondary interface, and the
>> inverse ranges are forwarded from secondary to primary?  For example,
>> here:
>>
>>                     ^
>>                     |
>>            +--------+-------+
>>            |                |
>>     +------+-----+    +-----++-----+
>>     | Downstream |    | Downstream |
>>     |    Port    |    |    Port    |
>>     |   06:05.0  |    |   06:06.0  |
>>     +------+-----+    +------+-----+
>>            |                 |
>>       +----v----+       +----v----+
>>       | Endpoint|       | Endpoint|
>>       | 07:00.0 |       | 08:00.0 |
>>       +---------+       +---------+
>>
>> that rule is all that's needed for a transaction from 07:00.0 to be
>> forwarded from upstream to the internal switch bus 06, then claimed by
>> 06:06.0 and forwarded downstream to 08:00.0.  This is plain old PCI,
>> nothing specific to PCIe.
>
> Right, I think the main PCI difference is the point-to-point nature of
> PCIe vs legacy PCI bus.  On a legacy PCI bus there's no way to prevent
> devices talking to each other, but on PCIe the transaction makes a
> U-turn at some point and heads out another downstream port.  ACS allows
> us to prevent that from happening.
>
detail: PCIe up/downstream routing is really done by an internal switch;
         ACS forces the legacy, PCI base-limit address routing and *forces*
         the switch to always route the transaction from a downstream port
         to the upstream port.

>> I don't understand ACS very well, but it looks like it basically
>> provides ways to prevent that peer-to-peer forwarding, so transactions
>> would be sent upstream toward the root (and specifically, the IOMMU)
>> instead of being directly claimed by 06:06.0.
>
> Yep, that's my meager understanding as well.
>
+1

>>> so we're looking for
>>> the furthest upstream device for which acs is enabled all the way up to
>>> the root bus.
>>
>> Correct me if this is wrong: To force device A's DMAs to be processed
>> by an IOMMU, ACS must be enabled on the root port and every downstream
>> port along the path to A.
>
> Yes, modulo this comment in libvirt source:
>
>      /* if we have no parent, and this is the root bus, ACS doesn't come
>       * into play since devices on the root bus can't P2P without going
>       * through the root IOMMU.
>       */
>
Correct. PCIe spec says roots must support ACS. I believe all the
root bridges that have an IOMMU have ACS wired in/on.

> So we assume that a redirect at the point of the iommu will factor in
> iommu translation.
>
>> If so, I think you're trying to find out the closest upstream device X
>> such that everything leading to X has ACS enabled.  Every device below
>> X can DMA freely to other devices below X, so they would all have to
>> be in the same isolated group.
>
> Yes
>
>> I tried to work through some examples to develop some intuition about this:
>
> (inserting fixed url)
>> http://www.asciiflow.com/#3736558963405980039
>
>> pci_acs_enabled(00:00.0) = 00:00.0 (on root bus (but doesn't it matter
>> if 00:00.0 is PCIe or if RP has ACS?))
>
> Hmm, the latter is the assumption above.  For the former, I think
> libvirt was probably assuming that PCI devices must have a PCIe device
> upstream from them because x86 doesn't have assignment friendly IOMMUs
> except on PCIe.  I'll need to work on making that more generic.
>
>> pci_acs_enabled(00:01.0) = 00:01.0 (on root bus)
>> pci_acs_enabled(01:00.0) = 01:00.0 (acs_dev = 00:01.0, 01:00.0 is not
>> PCIe; seems wrong)
>
> Oops, I'm calling pci_find_upstream_pcie_bridge() first on any of my
> input devices, so this was passing for me.  I'll need to incorporate
> that generically.
>
>> pci_acs_enabled(00:02.0) = 00:02.0 (on root bus; seems wrong if RP
>> doesn't have ACS)
>
> Yeah, let me validate the libvirt assumption.  I see ACS on my root
> port, so maybe they're just assuming it's always enabled or that the
> precedence favors IOMMU translation.  I'm also starting to think that we
> might want "from" and "to" struct pci_dev parameters to make it more
> flexible where the iommu lives in the system.
>
see comment above wrt root ports that have IOMMUs in them.

>> pci_acs_enabled(02:00.0) = 00:02.0 (acs_dev = 00:02.0, 02:00.0 has no ACS cap)
>> pci_acs_enabled(03:00.0) = 00:02.0 (acs_dev = 00:02.0)
>> pci_acs_enabled(02:01.0) = 02:01.0 (acs_dev = 00:02.0, 02:01.0 has ACS enabled)
>> pci_acs_enabled(04:00.0) = 04:00.0 (acs_dev = 02:01.0, 04:00.0 is not
>> a bridge; seems wrong if 04:00 is a multi-function device)
>
> AIUI, ACS is not an endpoint property, so this is what should happen.  I
> don't think multifunction plays a role other than how much do we trust
> the implementation to not allow back channels between functions (the
> answer should probably be not at all).
>
correct. ACS is a *bridge* property.
The unknown wrt multifunction devices is that such devices *could* be implemented
by a hidden (not responding to PCI cfg accesses from downstream port) PCI bridge
btwn the functions within a device.
Such a bridge could allow peer-to-peer xactions and there is no way for OS's to
force ACS.  So, one has to ask the hw vendors if such a hidden device exists
in the implementation, and whether peer-to-peer is enabled/allowed -- a hidden PCI
bridge/PCIe-switch could just be hardwired to push all IO to upstream port,
and allow parent bridge re-route it back down if peer-to-peer is desired.
Debate exists whether multifunction devices are 'secure' b/c of this unknown.
Maybe a PCIe (min., SRIOV) spec change is needed in this area to
determine this status about a device (via pci cfg/cap space).

>> pci_acs_enabled(02:02.0) = 02:02.0 (acs_dev = 00:02.0, 02:02.0 has ACS enabled)
>> pci_acs_enabled(05:00.0) = 05:00.0 (acs_dev = 02:02.0, 05:00.0 is not a bridge)
>>
>> But it didn't really help.  I still can't develop a mental picture of
>> what this function does.
>
> It helped me :)  These are good examples, I'll work on fixing it for
> them.  Thanks,
>
> Alex
>
>
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2012-05-16 13:30 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-11 22:55 [Qemu-devel] [PATCH 00/13] IOMMU Groups + VFIO Alex Williamson
2012-05-11 22:55 ` [Qemu-devel] [PATCH 01/13] driver core: Add iommu_group tracking to struct device Alex Williamson
2012-05-11 23:38   ` Greg KH
2012-05-11 23:58     ` Alex Williamson
2012-05-12  0:00       ` Greg KH
2012-05-11 22:55 ` [Qemu-devel] [PATCH 02/13] iommu: IOMMU Groups Alex Williamson
2012-05-11 23:39   ` Greg KH
2012-05-11 23:58     ` Alex Williamson
2012-05-14  1:16   ` David Gibson
2012-05-14 17:11     ` Alex Williamson
2012-05-15  2:03       ` David Gibson
2012-05-15  6:34         ` Alex Williamson
2012-05-17  3:29           ` David Gibson
2012-05-11 22:55 ` [Qemu-devel] [PATCH 03/13] iommu: IOMMU groups for VT-d and AMD-Vi Alex Williamson
2012-05-17  3:37   ` David Gibson
2012-05-11 22:55 ` [Qemu-devel] [PATCH 04/13] pci: New pci_dma_quirk() Alex Williamson
2012-05-17  3:39   ` David Gibson
2012-05-17  4:06     ` Alex Williamson
2012-05-17  7:19   ` Anonymous
2012-05-17 15:22     ` Alex Williamson
2012-05-11 22:56 ` [Qemu-devel] [PATCH 05/13] pci: New pci_acs_enabled() Alex Williamson
2012-05-14 22:02   ` Bjorn Helgaas
2012-05-14 22:49     ` Alex Williamson
2012-05-15 19:56       ` Bjorn Helgaas
2012-05-15 20:05         ` Bjorn Helgaas
2012-05-15 21:09         ` Alex Williamson
2012-05-16 13:29           ` Don Dutile [this message]
2012-05-16 16:21             ` Alex Williamson
2012-05-16 19:36               ` Alex Williamson
2012-05-18 23:00               ` [Qemu-devel] RESEND3: " Don Dutile
2012-05-19  2:47                 ` [Qemu-devel] " Alex Williamson
2012-05-21 13:31                   ` Don Dutile
2012-05-21 14:59                     ` Alex Williamson
2012-05-21 18:14                       ` Don Dutile
2012-05-11 22:56 ` [Qemu-devel] [PATCH 06/13] iommu: Make use of DMA quirking and ACS enabled check for groups Alex Williamson
2012-05-11 22:56 ` [Qemu-devel] [PATCH 07/13] vfio: VFIO core Alex Williamson
2012-05-11 22:56 ` [Qemu-devel] [PATCH 08/13] vfio: Add documentation Alex Williamson
2012-05-11 22:56 ` [Qemu-devel] [PATCH 09/13] vfio: x86 IOMMU implementation Alex Williamson
2012-05-11 22:56 ` [Qemu-devel] [PATCH 10/13] pci: export pci_user functions for use by other drivers Alex Williamson
2012-05-14 21:20   ` Bjorn Helgaas
2012-05-11 22:56 ` [Qemu-devel] [PATCH 11/13] pci: Create common pcibios_err_to_errno Alex Williamson
2012-05-21 17:55   ` Konrad Rzeszutek Wilk
2012-05-11 22:56 ` [Qemu-devel] [PATCH 12/13] pci: Misc pci_reg additions Alex Williamson
2012-05-11 22:56 ` [Qemu-devel] [PATCH 13/13] vfio: Add PCI device driver Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FB3ABCF.4030708@redhat.com \
    --to=ddutile@redhat.com \
    --cc=B07421@freescale.com \
    --cc=B08248@freescale.com \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=avi@redhat.com \
    --cc=benve@cisco.com \
    --cc=bhelgaas@google.com \
    --cc=chrisw@sous-sol.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=dwmw2@infradead.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).