From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Lukas Wunner" <lukas@wunner.de>,
"Bjorn Helgaas" <helgaas@kernel.org>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
"Davidlohr Bueso" <dave@stgolabs.net>,
"Dave Jiang" <dave.jiang@intel.com>,
"Alison Schofield" <alison.schofield@intel.com>,
"Vishal Verma" <vishal.l.verma@intel.com>,
"Ira Weiny" <ira.weiny@intel.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Will Deacon" <will@kernel.org>,
"Mark Rutland" <mark.rutland@arm.com>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
terry.bowman@amd.com,
"Kuppuswamy Sathyanarayanan"
<sathyanarayanan.kuppuswamy@linux.intel.com>,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Subject: Re: [RFC PATCH 0/9] pci: portdrv: Add auxiliary bus and register CXL PMUs (and aer)
Date: Thu, 12 Sep 2024 18:34:29 +0100 [thread overview]
Message-ID: <20240912183429.00006fa7@Huawei.com> (raw)
In-Reply-To: <20240912173720.000077ac@Huawei.com>
On Thu, 12 Sep 2024 17:37:20 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> On Tue, 10 Sep 2024 22:04:18 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > On Tue, Sep 10 2024 at 17:47, Jonathan Cameron wrote:
> > > There are quite a few places that make the assumption that the
> > > preirq_data->parent->chip is the right chip to for example call
> > > irq_set_affinity on.
> > >
> > > So the simple way to make it all work is to just have
> > > a msi_domain_template->msi_domain_ops->prepare_desc
> > > that sets the desc->dev = to the parent device (here the
> > > pci_dev->dev)
> >
> > Creative :)
>
> One bit that is challenging me is that the PCI access functions
> use the pci_dev that is embedded via irq_data->common->msi_desc->dev
> (and hence doesn't give us a obvious path to any hierarchical structures)
> E.g. __pci_write_msi_msg() and which checks the pci_dev is
> powered up.
>
> I'd like to be able to do a call to the parent similar to irq_chip_unmask_parent
> to handle write_msi_msg() for the new device domain.
>
> So how should this be avoided or should msi_desc->dev be the
> PCI device?
I thought about this whilst cycling home. Perhaps my fundamental
question is more basic Which device should
msi_dec->dev be?
* The ultimate requester of the msi - so here the one calling
our new pci_msix_subdev_alloc_at(),
* The one on which the msi_write_msg() should occur. - here
that would be the struct pci_dev given the checks needed on
whether the device is powered up. If this is the case, can
we instead use the irq_data->dev in __pci_write_msi_msg()?
Also, is it valid to use the irq_domain->dev for callbacks such
as msi_prepare_desc on basis that (I think) is the device
that 'hosts' the irq domain, so can we use it to replace the
use of msi_desc->dev in pci_msix_prepare_desc()
If we can that enables our subdev to use a prepare_desc callback
that does something like
struct msi_domain *info;
domain = domain->parent;
info = domain->host_data;
info->ops->prepare_desc(domain, arg, desc);
Thanks for any pointers!
Jonathan
>
> I can see some options but maybe I'm missing the big picture.
> 1) Stop them using that path to get to the device because it
> might not be the right one. Instead use device via irq_data->chip_data
> (which is currently unused for these cases).
> Some slightly fiddly work will be needed to handle changing __pci_write_msi_msg()
> to talk irq_data though.
> 2) Maybe setting the msi descriptor dev to the one we actually write
> on is the right solution? (smells bad to me but I'm still getting
> my head around this stuff).
>
> Any immediate thoughts? Or am I still thinking about this the wrong
> way? (which is likely)
>
> >
> > > At that point everything more or less just works and all the
> > > rest of the callbacks can use generic forms.
> > >
> > > Alternative might be to make calls like the one in
> > > arch/x86/kernel/apic/msi.c msi_set_affinity search
> > > for the first ancestor device that has an irq_set_affinity().
> > > That unfortunately may affect quite a few places.
> >
> > What's the problem? None of the CXL drivers should care about that at
> > all. Delegating other callbacks to the parent domain is what hierarchical
> > domains are about. If there is some helper missing, so we add it.
> >
> > > Anyhow, I'll probably send out the prepare_desc hack set with driver
> > > usage etc after I've written up a proper description of problems
> > > encountered etc so you can see what it all looks like and will be more
> > > palatable in general but in the meantime this is the guts of it of the
> > > variant where the subdev related desc has the dev set to the parent
> > > device.
> >
> > Let me look at this pile then :)
> >
> > > Note for the avoidance of doubt, I don't much like this
> > > solution but maybe it will grow on me ;)
> >
> > Maybe, but I doubt it will survive my abstraction taste check.
> >
> > > +static const struct msi_parent_ops pci_msix_parent_ops = {
> > > + .supported_flags = MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN,
> > > + .prefix = "PCI-MSIX-PROXY-",
> > > + .init_dev_msi_info = msi_parent_init_dev_msi_info,
> >
> > The obligatory link to:
> >
> > https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#struct-declarations-and-initializers
> >
>
> Thanks. Will fix up before I get anywhere near a real posting!
>
> > > +};
> > > +
> > > +int pci_msi_enable_parent_domain(struct pci_dev *pdev)
> > > +{
> > > + struct irq_domain *msix_dom;
> > > + /* TGLX: Validate has v2 parent domain */
> > > + /* TGLX: validate msix enabled */
> > > + /* TGLX: Validate msix domain supports dynamics msi-x */
> > > +
> > > + /* Enable PCI device parent domain */
> > > + msix_dom = dev_msi_get_msix_device_domain(&pdev->dev);
> > > + msix_dom->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
> > > + msix_dom->msi_parent_ops = &pci_msix_parent_ops;
> >
> > That want's to be a call into the MSI core code, Something like:
> >
> > msi_enable_sub_parent_domain(&pdev->dev, DOMAIN_BUS_PCI_DEVICE_MSIX)
> >
> > or something like that. I really don't want to expose the internals of
> > MSI. I spent a lot of effort to encapsulate that...
> >
> > > +void pci_msi_init_subdevice(struct pci_dev *pdev, struct device *dev)
> > > +{
> > > + dev_set_msi_domain(dev, dev_msi_get_msix_device_domain(&pdev->dev));
> >
> > msi_set_parent_domain(.....);
> >
> > > +static void pci_subdev_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
> > > + struct msi_desc *desc)
> > > +{
> > > + struct device *parent = desc->dev->parent;
> > > +
> > > + if (!desc->pci.mask_base) {
> > > + /* Not elegant - but needed for irq affinity to work? */
> >
> > Which makes me ask the question why desc->pci.mask_base is NULL in the
> > first place. That needs some thought at the place which initializes the
> > descriptor.
>
> Maybe this is the other way around? The descriptor is 'never' initialized
> for this case. The equivalent code in msix_prepare_msi_desc() has
> comments
> " This is separate from msix_setup_msi_descs() below to handle dynamic
> allocations for MSI-X after initial enablement."
> So I think this !desc->pci.mask_base should always succeed
> (and so I should get rid of it and run the descriptor initialize unconditionally)
>
> It might be appropriate to call the prepare_desc from the parent msi_domain though
> via irq_domain->parent_domain->host_data
> However this has the same need to not be based on the msi_desc->dev subject
> to the question above.
>
> Jonathan
>
>
>
> >
> > > --- a/kernel/irq/msi.c
> > > +++ b/kernel/irq/msi.c
> > > @@ -1720,3 +1720,8 @@ bool msi_device_has_isolated_msi(struct device *dev)
> > > return arch_is_isolated_msi();
> > > }
> > > EXPORT_SYMBOL_GPL(msi_device_has_isolated_msi);
> > > +struct irq_domain *dev_msi_get_msix_device_domain(struct device *dev)
> >
> > You clearly run out of new lines here :)
> >
> > > +{
> > > + return dev->msi.data->__domains[0].domain;
> > > +}
> >
> > But aside of that. No. See above.
> >
> > Thanks,
> >
> > tglx
> >
>
next prev parent reply other threads:[~2024-09-12 17:34 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-29 16:40 [RFC PATCH 0/9] pci: portdrv: Add auxiliary bus and register CXL PMUs (and aer) Jonathan Cameron
2024-05-29 16:40 ` [RFC PATCH 1/9] pci: pcie: Drop priv_data from struct pcie_device and use dev_get/set_drvdata() instead Jonathan Cameron
2024-05-29 16:40 ` [RFC PATCH 2/9] pci: portdrv: Drop driver field for port type Jonathan Cameron
2024-05-29 16:40 ` [RFC PATCH 3/9] pci: pcie: portdrv: Use managed device handling to simplify error and remove flows Jonathan Cameron
2024-05-29 16:40 ` [RFC PATCH 4/9] auxiliary_bus: expose auxiliary_bus_type Jonathan Cameron
2024-05-29 16:40 ` [RFC PATCH 5/9] pci: pcie: portdrv: Add a auxiliary_bus Jonathan Cameron
2024-05-29 16:41 ` [RFC PATCH 6/9] cxl: Move CPMU register definitions to header Jonathan Cameron
2024-05-29 16:41 ` [RFC PATCH 7/9] pci: pcie/cxl: Register an auxiliary device for each CPMU instance Jonathan Cameron
2024-05-29 16:41 ` [RFC PATCH 8/9] perf: cxl: Make the cpmu driver also work with auxiliary_devices Jonathan Cameron
2024-05-29 16:41 ` [RFC PATCH 9/9] pci: pcie: portdrv: aer: Switch to auxiliary_bus Jonathan Cameron
2024-06-05 18:04 ` [RFC PATCH 0/9] pci: portdrv: Add auxiliary bus and register CXL PMUs (and aer) Bjorn Helgaas
2024-06-05 19:44 ` Jonathan Cameron
2024-06-06 12:57 ` Jonathan Cameron
2024-06-06 13:21 ` Lukas Wunner
2024-08-23 11:05 ` Jonathan Cameron
2024-08-28 21:11 ` Thomas Gleixner
2024-08-29 12:17 ` Jonathan Cameron
2024-09-05 11:23 ` Jonathan Cameron
2024-09-06 10:11 ` Thomas Gleixner
2024-09-06 17:18 ` Jonathan Cameron
2024-09-10 16:47 ` Jonathan Cameron
2024-09-10 17:37 ` Jonathan Cameron
2024-09-10 20:04 ` Thomas Gleixner
2024-09-12 16:37 ` Jonathan Cameron
2024-09-12 17:34 ` Jonathan Cameron [this message]
2024-09-13 16:24 ` Thomas Gleixner
2024-06-17 7:03 ` Ilpo Järvinen
2024-07-04 16:14 ` Jonathan Cameron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240912183429.00006fa7@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=helgaas@kernel.org \
--cc=ilpo.jarvinen@linux.intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lpieralisi@kernel.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=mark.rutland@arm.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=terry.bowman@amd.com \
--cc=tglx@linutronix.de \
--cc=vishal.l.verma@intel.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox