From: Jason Gunthorpe <jgg@nvidia.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Logan Gunthorpe <logang@deltatee.com>,
LKML <linux-kernel@vger.kernel.org>,
Bjorn Helgaas <helgaas@kernel.org>, Marc Zygnier <maz@kernel.org>,
Alex Williamson <alex.williamson@redhat.com>,
Kevin Tian <kevin.tian@intel.com>,
Megha Dey <megha.dey@intel.com>, Ashok Raj <ashok.raj@intel.com>,
linux-pci@vger.kernel.org,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jon Mason <jdmason@kudzu.us>, Dave Jiang <dave.jiang@intel.com>,
Allen Hubbe <allenbh@gmail.com>,
linux-ntb@googlegroups.com, linux-s390@vger.kernel.org,
Heiko Carstens <hca@linux.ibm.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
x86@kernel.org, Joerg Roedel <jroedel@suse.de>,
iommu@lists.linux-foundation.org
Subject: Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Date: Thu, 2 Dec 2021 20:37:49 -0400 [thread overview]
Message-ID: <20211203003749.GT4670@nvidia.com> (raw)
In-Reply-To: <87o85y63m8.ffs@tglx>
On Thu, Dec 02, 2021 at 11:31:11PM +0100, Thomas Gleixner wrote:
> The software representation aka struct msi_desc is a different
> story. That's what we are debating.
Okay, I did mean msi_desc storage, so we are talking about the same thigns
> >> Of course we can store them in pci_dev.dev.msi.data.store. Either with a
> >> dedicated xarray or by partitioning the xarray space. Both have their
> >> pro and cons.
> >
> > This decision seems to drive the question of how many 'struct devices'
> > do we need, and where do we get them..
>
> Not really. There is nothing what enforces to make the MSI irqdomain
> storage strictly hang off struct device. There has to be a connection to
> a struct device in some way obviously to make IOMMU happy.
Yes, I thought this too, OK we agree
> Just badly named because the table itself is where the resulting message
> is stored, which is composed with the help of the relevant MSI
> descriptor. See above.
I picked the name because it looks like it will primarily contain an
xarray and the API you suggested to query it is idx based. To me that
is a table. A table of msi_desc storage to be specific.
It seems we have some agreement here as your lfunc also included the
same xarray and uses an idx as part of the API, right?
> We really should not try to make up an artifical table representation
> for something which does not necessarily have a table at all, i.e. the
> devices you talk about which store the message in queue specific system
> memory. Pretending that this is a table is just silly.
Now I'm a bit confused, what is the idx in the lfunc?
I think this is language again, I would call idx an artificial table
representation?
> Also I disagree that this has to be tied to a PCI specific interface,
> except for creating a PCI specific wrapper for it to not make a driver
> developer have to write '&pdev->dev', which is the very least of our
> problems.
Agreed, just trying to be illustrative at a higher level
> Aside of that 'my_irq_chip' does not cut it at all because of the way
> how the resulting messages are stored. IDXD has IOMEM storage and a
> storage space limitation while your device uses system memory storage
> and has other limitations, i.e. system memory and the number of queues
> the device can provide.
Okay, so the device must setup the domain because it must provide
information during setup. Makes sense
> Not really. I was really reasoning about an abstract representation for
> a functional queue, which is more than just a queue allocated from the
> PF or VF device.
>
> I really meant a container like this:
>
> struct logical_function {
> /* Pointer to the physical device */
> struct device *phys_device;
> /* MSI descriptor storage */
> struct msi_data msi;
Up to here is basically what I thought the "message table" would
contain. Possibly a pointer to the iommu_domain too?
> /* The queue number */
> unsigned int queue_nr;
> /* Add more information which is common to these things */
Not sure why do we need this?
Lets imagine a simple probe function for a simple timer device. It has
no cdev, vfio, queues, etc. However, the device only supports IMS. No
MSI, MSI-X, nothing else.
What does the probe function look like to get to request_irq()?
Why does this simple driver create something called a 'logical
function' to access its only interrupt?
I think you have the right idea here, just forget about VFIO, the IDXD
use case, all of it. Provide a way to use IMS cleanly and concurrently
with MSI.
Do that and everything else should have sane solutions too.
> The idea is to have a common representation for these type of things
> which allows:
>
> 1) Have common code for exposing queues to VFIO, cdev, sysfs...
>
> You still need myqueue specific code, but the common stuff which is
> in struct logical_function can be generic and device independent.
I will quote you: "Seriously, you cannot make something uniform which
is by definition non-uniform." :)
We will find there is no common stuff here, we did this exercise
already when we designed struct auxiliary_device, and came up
empty.
> 2) Having the MSI storage per logical function (queue) allows to have
> a queue relative 0 based MSI index space.
Can you explain a bit what you see 0 meaning in this idx numbering?
I also don't understand what 'queue relative' means?
> The actual index in the physical table (think IMS) would be held in
> the msi descriptor itself.
This means that a device like IDXD would store the phsical IMS table
entry # in the msi descriptor? What is idx then?
For a device like IDXD with a simple linear table, how does the driver
request a specific entry in the IMS table? eg maybe IMS table entry #0
is special like we often see in MSI?
> msi_get_virq(&myqueue->lfunc.msi, idx = 0)
>
> v.s.
>
> idx = myqueue->msidx[0];
> msi_get_virq(pcidev->dev, idx);
> where the queue management code has to set up myqueue->msidx[]
> and stick the index of the underlying device storage into it.
For the devices I know about there are two approaches for allocating
interrupts.
Many devices have a few special interrupts at fixed table
indexes. These can only be used for their single purpose. Eg MSI 0 and
1. An example is IRQ 1 signals the device had an error, read the error
reporting MMIO registers.
Then the device has a broad pool of fungible vectors. These are all
interchangeable, any vector can be used with any queue. eg MSI 2->128
A common mode operation is to setup the special vectors first, and
then assign on demand the pool vectors as CPUs are allocated for
use. Basically each CPU gets a vector and then many device objects are
attached to that vector. This scheme gives CPU locality and doesn't
exhaust CPU vectors. As example nvme uses 1 queue per CPU and 1 vector
per queue and userspace tells it how many queues (and thus CPUs) to
consume.
A device like mlx5 may have 1000's of queues and hundreds of sub
devices sharing a single CPU interrupt. This is desirable because we
can quickly run out of interrupts when we are talking about 100's of
concurrent subdevices.
When a "VFIO mdev" gets involved (mlx5 has this already, it is just
called VDPA, don't ask) the same allocator is used by the VFIO part
except VFIO has to request a dedicated MSI. Dedicated is because the
Intel VTx is used to deliver that MSI addr/data directly to the
guest's vAPIC.
When I imagine mlx5 using IMS, I see IMS as a simple extension of the
existing MSI-X vector pool. Every IMS vector is interchangable and the
main PCI driver will apply exactly the same allocation algorithm we
already have today for MSI, just with even more vectors. When VFIO
wants a vector it may get a MSI or it may get an IMS, I don't want to
care.
All of this about logical functions just confuses
responsibilities. The IRQ layer should be worrying about configuring
IRQs and not dictating how the device will design its IRQ assignment
policy or subdevice scheme.
> 3) Setup and teardown would be simply per logical function for
> all of the related resources which are required.
>
> Interrrupt teardown would look like this:
>
> msi_domain_free_all_irqs(irqdomain, &lfunc->msi);
Looks nice
> Now change struct logical_function to:
>
> struct logical_function {
> - /* Pointer to the physical device */
> - struct device *phys_device;
>
> + /* Pseudo device to allow using devres */
> + struct pseudo_device pseudo_device;
>
> /* MSI descriptor storage */
> struct msi_data msi;
> /* The queue number */
> unsigned int queue_nr;
> /* Add more information which is common to these things */
> };
>
> where struct pseudo_device holds the phys_device pointer and then you
> can utilize the devres infrastructure like you do for any other device
> and do:
>
> pseudo_device_add(&myqueue->lfunc.pseudo_device);
>
> at setup time and
>
> pseudo_device_remove(&myqueue->lfunc.pseudo_device);
>
> on teardown and let all the resources including MSI interrupts be
> released automatically.
We are already doing this with struct auxiliary_device and other
similar things. I don't see the value to couple the msi_data into the
lifetime model?
> I might be completely off track. Feel free to tell me so :)
I think you have the right core idea, just shrink your thinking of
logical_function to only cover msi stuff and leave the device model to
the other places that already have good mechansims. eg auxiliary device
IMHO this has become hyper focused on the special IDXD VFIO use case -
step back and think about my timer example above - a simple pci_driver
that just wants to use IMS for itself. No queues, no VFIO, no
mess. Just how does it configure and use the interrupt source.
Is it helpful feedback?
Regards,
Jason
next prev parent reply other threads:[~2021-12-03 0:38 UTC|newest]
Thread overview: 184+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-27 1:22 [patch 00/32] genirq/msi, PCI/MSI: Spring cleaning - Part 2 Thomas Gleixner
2021-11-27 1:22 ` [patch 01/32] genirq/msi: Move descriptor list to struct msi_device_data Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 12:19 ` Greg Kroah-Hartman
2021-11-27 1:22 ` [patch 02/32] genirq/msi: Add mutex for MSI list protection Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 03/32] genirq/msi: Provide msi_domain_alloc/free_irqs_descs_locked() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 04/32] genirq/msi: Provide a set of advanced MSI accessors and iterators Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-28 1:00 ` Jason Gunthorpe
2021-11-28 19:22 ` Thomas Gleixner
2021-11-29 9:26 ` Thomas Gleixner
2021-11-29 14:01 ` Jason Gunthorpe
2021-11-29 14:46 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 05/32] genirq/msi: Provide msi_alloc_msi_desc() and a simple allocator Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 06/32] genirq/msi: Provide domain flags to allocate/free MSI descriptors automatically Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 07/32] genirq/msi: Count the allocated MSI descriptors Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 12:19 ` Greg Kroah-Hartman
2021-11-27 19:22 ` Thomas Gleixner
2021-11-27 19:45 ` Thomas Gleixner
2021-11-28 11:07 ` Greg Kroah-Hartman
2021-11-28 19:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 08/32] PCI/MSI: Protect MSI operations Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 09/32] PCI/MSI: Use msi_add_msi_desc() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 10/32] PCI/MSI: Let core code free MSI descriptors Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 11/32] PCI/MSI: Use msi_on_each_desc() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 12/32] x86/pci/xen: Use msi_for_each_desc() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 13/32] xen/pcifront: Rework MSI handling Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 14/32] s390/pci: Rework MSI descriptor walk Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-29 10:31 ` Niklas Schnelle
2021-11-29 13:04 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 15/32] powerpc/4xx/hsta: Rework MSI handling Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 16/32] powerpc/cell/axon_msi: Convert to msi_on_each_desc() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 17/32] powerpc/pasemi/msi: Convert to msi_on_each_dec() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 18/32] powerpc/fsl_msi: Use msi_for_each_desc() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 19/32] powerpc/mpic_u3msi: Use msi_for_each-desc() Thomas Gleixner
2021-11-27 1:23 ` Thomas Gleixner
2021-11-27 1:22 ` [patch 20/32] PCI: hv: Rework MSI handling Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 21/32] NTB/msi: Convert to msi_on_each_desc() Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-29 18:21 ` Logan Gunthorpe
2021-11-29 20:51 ` Thomas Gleixner
2021-11-29 22:27 ` Logan Gunthorpe
2021-11-29 22:50 ` Dave Jiang
2021-11-29 23:31 ` Jason Gunthorpe
2021-11-29 23:52 ` Logan Gunthorpe
2021-11-30 0:01 ` Jason Gunthorpe
2021-11-30 0:29 ` Thomas Gleixner
2021-11-30 19:21 ` Logan Gunthorpe
2021-11-30 19:48 ` Thomas Gleixner
2021-11-30 20:14 ` Logan Gunthorpe
2021-11-30 20:28 ` Jason Gunthorpe
2021-11-30 21:23 ` Thomas Gleixner
2021-12-01 0:17 ` Jason Gunthorpe
2021-12-01 10:16 ` Thomas Gleixner
2021-12-01 13:00 ` Jason Gunthorpe
2021-12-01 17:35 ` Thomas Gleixner
2021-12-01 18:14 ` Jason Gunthorpe
2021-12-01 18:46 ` Logan Gunthorpe
2021-12-01 20:21 ` Thomas Gleixner
2021-12-02 0:01 ` Thomas Gleixner
2021-12-02 13:55 ` Jason Gunthorpe
2021-12-02 14:23 ` Greg Kroah-Hartman
2021-12-02 14:45 ` Jason Gunthorpe
2021-12-02 19:25 ` Thomas Gleixner
2021-12-02 20:00 ` Jason Gunthorpe
2021-12-02 22:31 ` Thomas Gleixner
2021-12-03 0:37 ` Jason Gunthorpe [this message]
2021-12-03 15:07 ` Thomas Gleixner
2021-12-03 16:41 ` Jason Gunthorpe
2021-12-04 14:20 ` Thomas Gleixner
2021-12-05 14:16 ` Thomas Gleixner
2021-12-06 14:43 ` Jason Gunthorpe
2021-12-06 15:47 ` Thomas Gleixner
2021-12-06 17:00 ` Jason Gunthorpe
2021-12-06 20:28 ` Thomas Gleixner
2021-12-06 21:06 ` Jason Gunthorpe
2021-12-06 22:21 ` Thomas Gleixner
2021-12-06 14:19 ` Jason Gunthorpe
2021-12-06 15:06 ` Thomas Gleixner
2021-12-09 6:26 ` Tian, Kevin
2021-12-09 9:03 ` Thomas Gleixner
2021-12-09 12:17 ` Tian, Kevin
2021-12-09 15:57 ` Thomas Gleixner
2021-12-10 7:37 ` Tian, Kevin
2021-12-09 5:41 ` Tian, Kevin
2021-12-09 5:47 ` Jason Wang
2021-12-01 16:28 ` Dave Jiang
2021-12-01 18:41 ` Thomas Gleixner
2021-12-01 18:47 ` Dave Jiang
2021-12-01 20:25 ` Thomas Gleixner
2021-12-01 21:21 ` Dave Jiang
2021-12-01 21:44 ` Thomas Gleixner
2021-12-01 21:49 ` Dave Jiang
2021-12-01 22:03 ` Thomas Gleixner
2021-12-01 22:53 ` Dave Jiang
2021-12-01 23:57 ` Thomas Gleixner
2021-12-09 5:23 ` Tian, Kevin
2021-12-09 8:37 ` Thomas Gleixner
2021-12-09 12:31 ` Tian, Kevin
2021-12-09 16:21 ` Jason Gunthorpe
2021-12-09 20:32 ` Thomas Gleixner
2021-12-09 20:58 ` Jason Gunthorpe
2021-12-09 22:09 ` Thomas Gleixner
2021-12-10 0:26 ` Thomas Gleixner
2021-12-10 7:29 ` Tian, Kevin
2021-12-10 12:13 ` Thomas Gleixner
2021-12-11 8:06 ` Tian, Kevin
2021-12-10 12:39 ` Jason Gunthorpe
2021-12-10 19:00 ` Thomas Gleixner
2021-12-11 7:44 ` Tian, Kevin
2021-12-11 13:04 ` Thomas Gleixner
2021-12-12 1:56 ` Tian, Kevin
2021-12-12 20:55 ` Thomas Gleixner
2021-12-12 23:37 ` Jason Gunthorpe
2021-12-13 7:50 ` Tian, Kevin
2022-09-15 9:24 ` Tian, Kevin
2022-09-20 14:09 ` Jason Gunthorpe
2022-09-21 7:57 ` Tian, Kevin
2022-09-21 12:48 ` Jason Gunthorpe
2022-09-22 5:11 ` Tian, Kevin
2022-09-22 12:13 ` Jason Gunthorpe
2022-09-22 22:42 ` Tian, Kevin
2022-09-23 13:26 ` Jason Gunthorpe
2021-12-11 7:52 ` Tian, Kevin
2021-12-12 0:12 ` Thomas Gleixner
2021-12-12 2:14 ` Tian, Kevin
2021-12-12 20:50 ` Thomas Gleixner
2021-12-12 23:42 ` Jason Gunthorpe
2021-12-10 7:36 ` Tian, Kevin
2021-12-10 12:30 ` Jason Gunthorpe
2021-12-12 6:44 ` Mika Penttilä
2021-12-12 23:27 ` Jason Gunthorpe
2021-12-01 14:52 ` Thomas Gleixner
2021-12-01 15:11 ` Jason Gunthorpe
2021-12-01 18:37 ` Thomas Gleixner
2021-12-01 18:47 ` Jason Gunthorpe
2021-12-01 20:26 ` Thomas Gleixner
2022-12-05 18:25 ` [tip: irq/core] PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X tip-bot2 for Thomas Gleixner
2022-12-05 21:41 ` tip-bot2 for Thomas Gleixner
2021-11-27 1:23 ` [patch 22/32] soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 23/32] soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs() Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 24/32] bus: fsl-mc-msi: Simplify MSI descriptor handling Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 25/32] platform-msi: Let core code handle MSI descriptors Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 26/32] platform-msi: Simplify platform device MSI code Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 27/32] genirq/msi: Make interrupt allocation less convoluted Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 28/32] genirq/msi: Convert to new functions Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 29/32] genirq/msi: Mop up old interfaces Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 30/32] genirq/msi: Add abuse prevention comment to msi header Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 31/32] genirq/msi: Simplify sysfs handling Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 12:32 ` Greg Kroah-Hartman
2021-11-27 19:31 ` Thomas Gleixner
2021-11-28 11:07 ` Greg Kroah-Hartman
2021-11-28 19:33 ` Thomas Gleixner
2021-11-27 1:23 ` [patch 32/32] genirq/msi: Convert storage to xarray Thomas Gleixner
2021-11-27 1:24 ` Thomas Gleixner
2021-11-27 12:33 ` Greg Kroah-Hartman
2021-11-27 1:23 ` [patch 00/32] genirq/msi, PCI/MSI: Spring cleaning - Part 2 Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211203003749.GT4670@nvidia.com \
--to=jgg@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=allenbh@gmail.com \
--cc=ashok.raj@intel.com \
--cc=borntraeger@de.ibm.com \
--cc=dave.jiang@intel.com \
--cc=gregkh@linuxfoundation.org \
--cc=hca@linux.ibm.com \
--cc=helgaas@kernel.org \
--cc=iommu@lists.linux-foundation.org \
--cc=jdmason@kudzu.us \
--cc=jroedel@suse.de \
--cc=kevin.tian@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-ntb@googlegroups.com \
--cc=linux-pci@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=maz@kernel.org \
--cc=megha.dey@intel.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).