From: Bjorn Helgaas <helgaas@kernel.org>
To: "Lawrynowicz, Jacek" <jacek.lawrynowicz@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
Alex Williamson <alex.williamson@redhat.com>,
Joerg Roedel <jroedel@suse.de>,
David Woodhouse <dwmw2@infradead.org>,
"iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>
Subject: Re: [PATCH v4 3/6] PCI: Add support for multiple DMA aliases
Date: Mon, 29 Feb 2016 16:44:17 -0600
Message-ID: <20160229224417.GD3653@localhost>
In-Reply-To: <36D38C1F74839847A52A484C31F3E51A62180790@irsmsx105.ger.corp.intel.com>

On Thu, Feb 25, 2016 at 03:41:51PM +0000, Lawrynowicz, Jacek wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> > Sent: Thursday, February 25, 2016 3:39 PM
> > To: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Lawrynowicz, Jacek <jacek.lawrynowicz@intel.com>;
> > linux-pci@vger.kernel.org; Alex Williamson <alex.williamson@redhat.com>;
> > Joerg Roedel <jroedel@suse.de>; David Woodhouse <dwmw2@infradead.org>;
> > iommu@lists.linux-foundation.org
> > Subject: Re: [PATCH v4 3/6] PCI: Add support for multiple DMA aliases
> >
> > On Wed, Feb 24, 2016 at 01:44:06PM -0600, Bjorn Helgaas wrote:
> > > From: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com>
> > >
> > > <Insert changelog here>
> >
> > (Sorry, I should have copied this changelog in the patch; I copied
> > this manually from your v3 posting):
> >
> > > This patch solves IOMMU support issues with PCIe non-transparent bridges
> > > that use Requester ID look-up tables (LUT), e.g. PEX8733. Before exiting
> > > the bridge, a packet's RID is rewritten according to the LUT programmed by
> > > a driver. Modified packets are then passed to a destination bus and
> > > processed upstream. The problem is that such packets seem to come from
> > > non-existent nodes that are hidden behind the NTB and are not discoverable
> > > by a destination node, so the IOMMU discards them. Adding a DMA alias for a
> > > given LUT entry allows the IOMMU to create a proper mapping that enables
> > > inter-node communication.
> >
> > A specific example here would help me understand. Here's how I
> > understand this (correct me if I'm wrong): We're talking about a DMA
> > packet being forwarded upstream from an NTB. The NTB uses the LUT to
> > rewrite the RID in the DMA packet. The new RID from the LUT is
> > unknown to the IOMMU, so it discards the DMA packet.
>
> Yes, this is exactly the problem.
>
> > > The current DMA alias implementation supports only a single alias, so it's
> > > not possible to connect more than two nodes when the IOMMU is enabled. This
> > > implementation enables all possible aliases on a given bus (256), which
> > > are stored in a bitset. The alias devfn is directly translated to a bit
> > > number. The bitset is not allocated for devices that have no need for
> > > DMA aliases.
> >
> > I think "two nodes" is referring to two PCIe devices on the other side
> > of the NTB. You want DMA packets from those devices to have different
> > RIDs so the IOMMU can distinguish them.
>
> Right.
>
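
To make the devfn-to-bit mapping in the changelog concrete, here is a
minimal userspace sketch of the bitset scheme. It is not the patch's
code: struct fake_dev, add_dma_alias(), and the open-coded bit
operations are hypothetical stand-ins for struct pci_dev and the kernel
bitmap helpers. Only the dma_alias_mask field and the
dma_alias_is_enabled() check mirror names that appear in the diff below.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DEVFN	256	/* 32 devices x 8 functions on one bus */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define ALIAS_WORDS	((MAX_DEVFN + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the alias-related part of struct pci_dev */
struct fake_dev {
	unsigned long *dma_alias_mask;	/* NULL until the first alias is added */
};

/* Allocate the bitset lazily, then set the bit numbered by devfn */
static int add_dma_alias(struct fake_dev *dev, uint8_t devfn)
{
	if (!dev->dma_alias_mask) {
		dev->dma_alias_mask = calloc(ALIAS_WORDS, sizeof(unsigned long));
		if (!dev->dma_alias_mask)
			return -1;
	}
	dev->dma_alias_mask[devfn / BITS_PER_WORD] |=
		1UL << (devfn % BITS_PER_WORD);
	return 0;
}

/* A device with no bitset has no aliases at all */
static bool dma_alias_is_enabled(const struct fake_dev *dev, uint8_t devfn)
{
	return dev->dma_alias_mask &&
	       ((dev->dma_alias_mask[devfn / BITS_PER_WORD] >>
		 (devfn % BITS_PER_WORD)) & 1UL);
}
```

Devices that never call add_dma_alias() keep a NULL mask, which is the
"not allocated for devices that have no need" property from the
changelog.
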
> > The LUT entries basically create aliases of the NTB (one alias for
> > each device beyond the NTB). Your quirk uses pci_add_dma_alias(), and
> > the aliases are all on the same bus as the NTB itself.
> >
> > The quirk adds PCI_DEVFN(0x10, 0x0), PCI_DEVFN(0x11, 0x0), and
> > PCI_DEVFN(0x12, 0x0). Shouldn't there be some connection between this
> > and the LUT programming? I assume the LUT is programmed to correspond
> > to those aliases. Does this mean you're limited to three devices
> > beyond the NTB?
>
> Yes, there is an indirect connection between the LUT and the devfns used in
> the quirk.
> The dev part is an offset in the LUT and the function is taken from the
> device behind the NTB.
> So the driver can only change the dev part by using different LUT offsets.
> We don't plan to modify this quirk. The number of PCIe devices beyond a single
> x200 card NTB will not change.
> Two are used by the x200 CPU (host bridge & root port) and one is used by the
> x200 DMA engine.
> I'm not sure introducing some dependencies to make sure the offsets are set
> correctly is really worth it.

I'd like at least a comment that points to the specific x200 code that
must coordinate with this.

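For reference, the relationship described above (dev part = LUT offset,
function = function of the device behind the NTB) can be sketched in
plain C. This is only an illustration under that assumption:
lut_alias_devfn() is a hypothetical helper, not something in the quirk
or the x200 driver. The macro uses the same encoding as the kernel's
PCI_DEVFN(), a 5-bit device number and a 3-bit function number.

```c
#include <stdint.h>

/* Same encoding as the kernel's PCI_DEVFN(): 5-bit device, 3-bit function */
#ifndef PCI_DEVFN
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#endif

/*
 * Hypothetical helper: build the alias devfn for one LUT entry, assuming
 * the LUT offset lands in the device field and the function number comes
 * from the device behind the NTB.
 */
static uint8_t lut_alias_devfn(unsigned int lut_offset, unsigned int func)
{
	return (uint8_t)PCI_DEVFN(lut_offset, func);
}
```

With LUT offsets 0x10, 0x11, and 0x12 and function 0, this yields the
three devfns the quirk adds.
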
> So regarding the improvements in the patch description, you want me to update
> and repost it?

Yes, please.

> BTW I posted the x200 DMA driver (the client for this change) on the DMA list:
> https://lkml.org/lkml/2016/2/9/287
> I'm working on integrating review comments and hope to get it included in 4.6.

What about my questions on the code itself, below?

> > > ---
> > > drivers/iommu/iommu.c | 17 ++++++++++-------
> > > drivers/pci/pci.c | 11 +++++++++--
> > > drivers/pci/probe.c | 1 +
> > > drivers/pci/search.c | 14 +++++++++-----
> > > include/linux/pci.h | 4 +---
> > > 5 files changed, 30 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 0e3b009..a214e19 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -659,9 +659,15 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
> > >  	return NULL;
> > >  }
> > >
> > > +static bool dma_alias_is_enabled(struct pci_dev *dev, u8 devfn)
> > > +{
> > > +	return dev->dma_alias_mask &&
> > > +	       test_bit(devfn, dev->dma_alias_mask);
> > > +}
> > > +
> > >  /*
> > > - * Look for aliases to or from the given device for exisiting groups. The
> > > - * dma_alias_devfn only supports aliases on the same bus, therefore the search
> > > + * Look for aliases to or from the given device for existing groups. DMA
> > > + * aliases are only supported on the same bus, therefore the search
> >
> > I'm trying to reconcile this statement that "DMA aliases are only
> > supported on the same bus" (which was there even before this patch)
> > with the fact that pci_for_each_dma_alias() does not have that
> > limitation.
> >
> > >   * space is quite small (especially since we're really only looking at pcie
> > >   * device, and therefore only expect multiple slots on the root complex or
> > >   * downstream switch ports). It's conceivable though that a pair of
> > > @@ -686,11 +692,8 @@ static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
> > >  			continue;
> > >
> > >  		/* We alias them or they alias us */
> > > -		if (((pdev->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> > > -		     pdev->dma_alias_devfn == tmp->devfn) ||
> > > -		    ((tmp->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
> > > -		     tmp->dma_alias_devfn == pdev->devfn)) {
> > > -
> > > +		if (dma_alias_is_enabled(pdev, tmp->devfn) ||
> > > +		    dma_alias_is_enabled(tmp, pdev->devfn)) {
> > >  			group = get_pci_alias_group(tmp, devfns);
> >
> > We basically have this:
> >
> >   for_each_pci_dev(tmp) {
> >     if (<pdev and tmp are DMA aliases>)
> >       group = get_pci_alias_group();
> >     ...
> >   }
> >
> > The DMA alias stuff relies on PCI internals, so it doesn't
> > seem quite right to use things like PCI_DEV_FLAGS_DMA_ALIAS_DEVFN and
> > dma_alias_devfn here in the IOMMU code.
> >
> > I'm trying to figure out why we don't do something like the following
> > instead:
> >
> >   callback(struct pci_dev *pdev, u16 alias, void *opaque)
> >   {
> >     struct iommu_group *group;
> >
> >     group = get_pci_alias_group();
> >     if (group)
> >       return group;
> >
> >     return 0;
> >   }
> >
> >   pci_for_each_dma_alias(pdev, callback, ...);
> >
> > Is the existing code some sort of optimization, e.g., checking
> > PCI_DEV_FLAGS_DMA_ALIAS_DEVFN is cheaper than using
> > pci_for_each_dma_alias()?
> >
> > It seems like this won't work for some very unlikely but theoretically
> > possible topologies, e.g.,
> >
> >   PCIe Root Complex/IOMMU
> >     PCIe switch A
> >       PCIe to conventional PCI bridge
> >         PCI to PCIe Root Complex
> >           PCIe NTB
> >
> > Here, I think the IOMMU will only see RIDs from PCIe switch A, but the
> > current code only looks at DMA aliases that are on the same bus as the
> > PCIe NTB. Wouldn't using pci_for_each_dma_alias() handle this
> > correctly?
> >
> > >  			if (group) {
> > >  				pci_dev_put(tmp);