From: Alex Williamson <alex.williamson@redhat.com>
To: Andrew Cooks <acooks@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
"open list:PCI SUBSYSTEM" <linux-pci@vger.kernel.org>,
Joerg Roedel <joro@8bytes.org>,
"open list:INTEL IOMMU (VT-d)" <iommu@lists.linux-foundation.org>,
Don Dutile <ddutile@redhat.com>,
David Woodhouse <dwmw2@infradead.org>
Subject: Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface
Date: Wed, 24 Jul 2013 09:50:40 -0600 [thread overview]
Message-ID: <1374681040.5208.12.camel@ul30vt.home> (raw)
In-Reply-To: <CAJtEV7b3JR9J7UN_gzL6deD1JDBTfhQjGQzd_TRMLcWX2aoUfg@mail.gmail.com>
On Wed, 2013-07-24 at 23:03 +0800, Andrew Cooks wrote:
> On Wed, Jul 24, 2013 at 7:21 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >> > This provides interfaces for drivers to discover the visible PCIe
> >> > requester ID for a device, for things like IOMMU setup, and iterate
> >>
> >> IDs (plural)
> >
> > How does a device can't have multiple requester IDs? Reading below, I'm
> > not sure we're on the same page for the purpose of this patch.
> >
> >> > over the device chain from requestee to requester, including DMA
> >> > quirks at each step.
> >>
> >> "requestee" doesn't make sense to me. The "-ee" suffix added to a verb
> >> normally makes a noun that refers to the object of the action. So
> >> "requestee" sounds like it means something like "target" or "responder,"
> >> but that's not what you mean here.
> >
> > Hmm, ok. I figured a request-er makes a request on behalf of a
> > request-ee. Suggestions?
> >
> >> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >> > ---
> >> > drivers/pci/search.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > include/linux/pci.h | 7 ++
> >> > 2 files changed, 205 insertions(+)
> >> >
> >> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> >> > index d0627fa..4759c02 100644
> >> > --- a/drivers/pci/search.c
> >> > +++ b/drivers/pci/search.c
> >> > @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
> >> > EXPORT_SYMBOL_GPL(pci_bus_sem);
> >> >
> >> > /*
> >> > + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
> >> > + * @dev: device to test
> >> > + */
> >> > +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
> >> > +{
> >> > + /*
> >> > + * XXX There's no indicator of the bus type, conventional PCI vs
> >> > + * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
> >> > + * requester ID is a native PCIe based system (such as VT-d or
> >> > + * AMD-Vi). It's common that PCIe root complex devices do not
> >> > + * include a PCIe capability, but we can assume they are PCIe
> >> > + * devices based on their topology.
> >> > + */
> >> > + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
> >> > + return true;
> >> > +
> >> > + /*
> >> > + * PCI-X devices have a requester ID, but the bridge may still take
> >> > + * ownership of transactions and create a requester ID. We therefore
> >> > + * assume that the PCI-X requester ID is not the same one used on PCIe.
> >> > + */
> >> > +
> >> > +#ifdef CONFIG_PCI_QUIRKS
> >> > + /*
> >> > + * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
> >> > + * If the device is a bridge, look to the next device upstream of it.
> >> > + * If that device is PCIe and not a PCIe-to-PCI bridge, then by
> >> > + * deduction, the device must be PCIe and therefore has a requester ID.
> >> > + */
> >> > + if (dev->subordinate) {
> >> > + struct pci_dev *parent = dev->bus->self;
> >> > +
> >> > + if (pci_is_pcie(parent) &&
> >> > + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> >> > + return true;
> >> > + }
> >> > +#endif
> >> > +
> >> > + return false;
> >> > +}
> >> > +
> >> > +/*
> >> > + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
> >> > + * @dev: requester device
> >> > + * @bridge: upstream bridge (or NULL for root bus)
> >> > + */
> >> > +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
> >> > + struct pci_dev *bridge)
> >> > +{
> >> > + /*
> >> > + * The entire path must be tested, if any step does not have a
> >> > + * requester ID, the chain is broken. This allows us to support
> >> > + * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
> >> > + */
> >> > + while (dev != bridge) {
> >> > + if (!pci_has_pcie_requester_id(dev))
> >> > + return false;
> >> > +
> >> > + if (pci_is_root_bus(dev->bus))
> >> > + return !bridge; /* false if we don't hit @bridge */
> >> > +
> >> > + dev = dev->bus->self;
> >> > + }
> >> > +
> >> > + return true;
> >> > +}
> >> > +
> >> > +/*
> >> > + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
> >> > + * a different requester ID than a standard PCIe-to-PCI bridge. Instead
> >> > + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a
> >>
> >> s/the/they/
> >>
> >> Did you learn about this empirically? Intel spec? I wonder if there's
> >> some way to derive this from the PCIe specs.
> >
> > It's in the current intel-iommu logic, pretty much anywhere it uses
> > pci_find_upstream_pcie_bridge() it follows that up with a pci_is_pcie()
> > check. If it's PCIe, it uses a traditional PCIe-to-PCI bridge ID. If
> > it's a legacy PCI bridge it uses the bridge ID itself.
> >
> > static int
> > domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
> > int translation)
> > {
> > ...
> > /* dependent device mapping */
> > tmp = pci_find_upstream_pcie_bridge(pdev);
> > if (!tmp)
> > return 0;
> > ...
> > if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
> > return domain_context_mapping_one(domain,
> > pci_domain_nr(tmp->subordinate),
> > tmp->subordinate->number, 0,
> > translation);
> > else /* this is a legacy PCI bridge */
> > return domain_context_mapping_one(domain,
> > pci_domain_nr(tmp->bus),
> > tmp->bus->number,
> > tmp->devfn,
> > translation);
> >
> > The 82801 reference is from looking at when this would actually happen
> > on one of my own systems.
> >
> >
> >> > + * standard PCIe endpoint. This function detects them.
> >> > + *
> >> > + * XXX Is this Intel vendor ID specific?
> >> > + */
> >> > +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
> >> > +{
> >> > + if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
> >> > + return true;
> >> > +
> >> > + return false;
> >> > +}
> >> > +
> >> > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) | (dev)->devfn)
> >> > +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8)
> >> > +
> >> > +/*
> >> > + * pci_get_visible_pcie_requester - Get requester and requester ID for
> >> > + * @requestee below @bridge
> >> > + * @requestee: requester device
> >> > + * @bridge: upstream bridge (or NULL for root bus)
> >> > + * @requester_id: location to store requester ID or NULL
> >> > + */
> >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> > + struct pci_dev *bridge,
> >> > + u16 *requester_id)
> >>
> >> I'm not sure it makes sense to return a struct pci_dev here because
> >> there's no requirement that a requester ID correspond to an actual
> >> pci_dev.
> >
> > That's why this function is named get_.._requester instead of requester
> > ID. I believe there still has to be a struct pci_dev that does the
> > request, but the requester ID for that device may not actually match.
> > So I return both. In a PCIe-to-PCI bridge case, the pci_dev is the
> > bridge, but the requester ID is either the bridge bus|devfn or
> > subordinate|0 depending on the topology. If we want to support "ghost
> > functions", we can return the real pci_dev and a ghost requester ID.
> >
> > I think if we used just a requester ID, it ends up being extremely
> > difficult to pass that into anything else since we then have to search
> > again for where that requester ID is rooted.
> >
> >> > +{
> >> > + struct pci_dev *requester = requestee;
> >> > +
> >> > + while (requester != bridge) {
> >> > + requester = pci_get_dma_source(requester);
> >> > + pci_dev_put(requester); /* XXX skip ref cnt */
> >> > +
> >> > + if (pci_has_visible_pcie_requester_id(requester, bridge))
> >>
> >> If we acquire the "requester" pointer via a ref-counting interface,
> >> it's illegal to use the pointer after dropping the reference, isn't it?
> >> Maybe that's what you mean by the "XXX" comment.
> >
> >
> > Yes, I was just following your RFC lead and didn't want to deal with
> > reference counting until this approach had enough traction.
> >
> >
> >> > + break;
> >> > +
> >> > + if (pci_is_root_bus(requester->bus))
> >> > + return NULL; /* @bridge not parent to @requestee */
> >> > +
> >> > + requester = requester->bus->self;
> >> > + }
> >> > +
> >> > + if (requester_id) {
> >> > + if (requester->bus != requestee->bus &&
> >> > + !pci_bridge_uses_endpoint_requester(requester))
> >> > + *requester_id = PCI_BRIDGE_REQUESTER_ID(requester);
> >> > + else
> >> > + *requester_id = PCI_REQUESTER_ID(requester);
> >> > + }
> >> > +
> >> > + return requester;
> >> > +}
> >> > +
> >> > +static int pci_do_requester_callback(struct pci_dev **dev,
> >> > + int (*fn)(struct pci_dev *,
> >> > + u16 id, void *),
> >> > + void *data)
> >> > +{
> >> > + struct pci_dev *dma_dev;
> >> > + int ret;
> >> > +
> >> > + ret = fn(*dev, PCI_REQUESTER_ID(*dev), data);
> >> > + if (ret)
> >> > + return ret;
> >> > +
> >> > + dma_dev = pci_get_dma_source(*dev);
> >> > + pci_dev_put(dma_dev); /* XXX skip ref cnt */
> >> > + if (dma_dev == *dev)
> >>
> >> Same ref count question as above.
> >>
> >> > + return 0;
> >> > +
> >> > + ret = fn(dma_dev, PCI_REQUESTER_ID(dma_dev), data);
> >> > + if (ret)
> >> > + return ret;
> >> > +
> >> > + *dev = dma_dev;
> >> > + return 0;
> >> > +}
> >> > +
> >> > +/*
> >> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >> > + * from @requestee to the PCIe requester ID visible
> >> > + * to @bridge.
> >>
> >> Transactions from a device may appear with one of several requester IDs,
> >> but there's not necessarily an actual pci_dev for each ID, so I think the
> >> caller reads better if it's "...for_each_requester_id()"
> >
> > Wouldn't you expect to pass a requester ID into a function with that
> > name? I'm pretty sure I had it named that at one point but thought the
> > parameters made more sense this way. I'll see if I can think of a
> > better name.
> >
> >> The "Call X on each devices and DMA source from Y to the requester ID"
> >> part doesn't quite make a sentence.
> >
> >
> > Ok
> >
> >> > + * @requestee: Starting device
> >> > + * @bridge: upstream bridge (or NULL for root bus)
> >>
> >> You should say something about the significance of @bridge. I think the
> >> point is to call @fn for every possible requester ID @bridge could see for
> >> transactions from @requestee. This is a way to learn the requester IDs an
> >> IOMMU at @bridge needs to be prepared for.
> >
> > I can add something. @bridge is supposed to be for bridge-based IOMMUs.
> > Essentially it's just a stopping point. There might be PCI topology
> > upstream of @bridge, but if you only want the requester ID seen by the
> > bridge, you don't care.
> >
> >> > + * @fn: callback function
> >> > + * @data: data to pass to callback
> >> > + */
> >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> >> > + int (*fn)(struct pci_dev *, u16 id, void *),
> >> > + void *data)
> >> > +{
> >> > + struct pci_dev *requester;
> >> > + struct pci_dev *dev = requestee;
> >> > + int ret = 0;
> >> > +
> >> > + requester = pci_get_visible_pcie_requester(requestee, bridge, NULL);
> >> > + if (!requester)
> >> > + return -EINVAL;
> >> > +
> >> > + do {
> >> > + ret = pci_do_requester_callback(&dev, fn, data);
> >> > + if (ret)
> >> > + return ret;
> >> > +
> >> > + if (dev == requester)
> >> > + return 0;
> >> > +
> >> > + /*
> >> > + * We always consider root bus devices to have a visible
> >> > + * requester ID, therefore this should never be true.
> >> > + */
> >> > + BUG_ON(pci_is_root_bus(dev->bus));
> >>
> >> What are we going to do if somebody hits this BUG_ON()? If it's impossible
> >> to hit, we should just remove it. If it's possible to hit it in some weird
> >> topology you didn't consider, we should see IOMMU faults for any requester
> >> ID we neglected to map, and that fault would be a better debugging hint
> >> than a BUG_ON() here.
> >
> > It's mostly for readability. I've learned that we never what to look at
> > dev->bus->self without first testing pci_is_root_bus(dev->bus), so if I
> > was reading this code I'd stumble when I got here and spend too long
> > looking around for the assumption that makes it not needed. I suppose I
> > could just make it a comment, but I thought why not make it official w/
> > a BUG.
> >
> >> > +
> >> > + dev = dev->bus->self;
> >> > +
> >> > + } while (dev != requester);
> >> > +
> >> > + /*
> >> > + * If we've made it here, @requester is a bridge upstream from
> >> > + * @requestee.
> >> > + */
> >> > + if (pci_bridge_uses_endpoint_requester(requester))
> >> > + return pci_do_requester_callback(&requester, fn, data);
> >> > +
> >> > + return fn(requester, PCI_BRIDGE_REQUESTER_ID(requester), data);
> >> > +}
> >> > +
> >> > +/*
> >> > * find the upstream PCIe-to-PCI bridge of a PCI device
> >> > * if the device is PCIE, return NULL
> >> > * if the device isn't connected to a PCIe bridge (that is its parent is a
> >> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> > index 3a24e4f..94e81d1 100644
> >> > --- a/include/linux/pci.h
> >> > +++ b/include/linux/pci.h
> >> > @@ -1873,6 +1873,13 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
> >> > }
> >> > #endif
> >> >
> >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >> > + struct pci_dev *bridge,
> >> > + u16 *requester_id);
> >>
> >> The structure of this interface implies that there is only one visible
> >> requester ID, but the whole point of this patch is that a transaction from
> >> @requestee may appear with one of several requester IDs. So which one will
> >> this return?
> >
> > I thought the point of this patch was to have an integrated interface
> > for finding the requester ID and doing something across all devices with
> > that requester ID and thereby remove pci_find_upstream_pcie_bridge(),
> > provide a way to quirk broken PCIe-to-PCI bridge and quirk dma_ops for
> > devices that use the wrong requester ID. In what case does a device
> > appear to have multiple requester IDs? We have cases where devices use
> > the wrong requester ID, but AFAIK they only use a single wrong requester
> > ID. Thanks,
> >
>
> The cases I've found where multiple requester IDs were used are all related
> to Marvell Sata controllers.
>
> Here's a table of affected devices with links to the
> bug reports. In each case both functions 0 and 1 are used.
I'm confused. There's only one requester ID for a given transaction.
Therefore on a per transaction level there's a 1:1 relationship between
a requester ID and a device. The requester ID may be incorrect, but
there cannot be more than one per transaction. Are you suggesting that
a given device (PCI function) will alternate using one requester ID for
some transactions and a different requester ID for others? I'm thinking
of cases like this bz:
https://bugzilla.redhat.com/show_bug.cgi?id=757166#c16
In that case pci_get_visible_pcie_requester() (assuming no bridges are
involved) should return the device and the requester ID that it would
use if it worked correctly. pcie_for_each_requester() should then call
the callback function using both the correct requester ID and the
quirked requester ID. The reason I mentioned that this doesn't
currently handle ghost functions is because it uses the
pci_get_dma_source() quirk, which only handles actual devices. We'd
need to add a pci_get_dma_alias() or something that would give us a list
of ghost requester IDs to handle these broken marvel devices. Thanks,
Alex
> static const struct pci_dev_dma_multi_source_map {
> u16 vendor;
> u16 device;
> } pci_dev_dma_multi_source_map[] = {
> /* Reported by Patrick Bregman
> * https://bugzilla.redhat.com/show_bug.cgi?id=863653 */
> { PCI_VENDOR_ID_MARVELL_EXT, 0x9120},
>
> /* Reported by Paweł Żak, Korneliusz Jarzębski, Daniel Mayer
> * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by
> * Justin Piszcz https://lkml.org/lkml/2012/11/24/94 */
> { PCI_VENDOR_ID_MARVELL_EXT, 0x9123},
>
> /* Reported by Robert Cicconetti
> * https://bugzilla.kernel.org/show_bug.cgi?id=42679 and by
> * Fernando https://bugzilla.redhat.com/show_bug.cgi?id=757166 */
> { PCI_VENDOR_ID_MARVELL_EXT, 0x9128},
>
> /* Reported by Stijn Tintel
> * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
> { PCI_VENDOR_ID_MARVELL_EXT, 0x9130},
>
> /* Reported by Gaudenz Steinlin
> * https://lkml.org/lkml/2013/3/5/288 */
> { PCI_VENDOR_ID_MARVELL_EXT, 0x9143},
>
> /* Reported by Andrew Cooks
> * https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
> { PCI_VENDOR_ID_MARVELL_EXT, 0x9172}
> };
>
>
> > Alex
> >
> >> > +int pcie_for_each_requester(struct pci_dev *requestee, struct pci_dev *bridge,
> >> > + int (*fn)(struct pci_dev *, u16 id, void *),
> >> > + void *data);
> >> > +
> >> > /**
> >> > * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
> >> > * @pdev: the PCI device
> >> >
> >
> >
> >
next prev parent reply other threads:[~2013-07-24 15:51 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-11 21:03 [RFC PATCH v2 0/2] pci/iommu: PCIe requester ID interface Alex Williamson
2013-07-11 21:03 ` [RFC PATCH v2 1/2] pci: Create " Alex Williamson
2013-07-23 22:35 ` Bjorn Helgaas
2013-07-23 23:21 ` Alex Williamson
2013-07-24 15:03 ` Andrew Cooks
2013-07-24 15:50 ` Alex Williamson [this message]
2013-07-24 16:47 ` Bjorn Helgaas
2013-07-24 18:12 ` Alex Williamson
2013-07-24 23:24 ` Bjorn Helgaas
2013-07-25 17:56 ` Alex Williamson
2013-07-26 21:54 ` Bjorn Helgaas
2013-07-29 16:06 ` Alex Williamson
2013-07-29 21:02 ` Bjorn Helgaas
2013-07-29 22:32 ` Alex Williamson
2013-08-01 22:08 ` Bjorn Helgaas
2013-08-02 16:59 ` Alex Williamson
2014-04-03 21:48 ` Bjorn Helgaas
2014-04-04 2:51 ` Alex Williamson
2014-04-04 15:00 ` Bjorn Helgaas
2013-07-29 21:03 ` Don Dutile
2013-07-29 22:55 ` Alex Williamson
2013-07-24 20:42 ` Don Dutile
2013-07-24 21:22 ` Alex Williamson
2013-07-25 18:38 ` Don Dutile
2013-07-25 17:19 ` Bjorn Helgaas
2013-07-25 18:25 ` Don Dutile
2013-07-26 19:48 ` Bjorn Helgaas
2013-07-26 20:04 ` Don Dutile
2013-07-11 21:03 ` [RFC PATCH v2 2/2] iommu/intel: Make use of " Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1374681040.5208.12.camel@ul30vt.home \
--to=alex.williamson@redhat.com \
--cc=acooks@gmail.com \
--cc=bhelgaas@google.com \
--cc=ddutile@redhat.com \
--cc=dwmw2@infradead.org \
--cc=iommu@lists.linux-foundation.org \
--cc=joro@8bytes.org \
--cc=linux-pci@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).