From: Alex Williamson <alex.williamson@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "Chatre, Reinette" <reinette.chatre@intel.com>,
"jgg@nvidia.com" <jgg@nvidia.com>,
"yishaih@nvidia.com" <yishaih@nvidia.com>,
"shameerali.kolothum.thodi@huawei.com"
<shameerali.kolothum.thodi@huawei.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Jiang, Dave" <dave.jiang@intel.com>,
"Liu, Jing2" <jing2.liu@intel.com>,
"Raj, Ashok" <ashok.raj@intel.com>,
"Yu, Fenghua" <fenghua.yu@intel.com>,
"tom.zanussi@linux.intel.com" <tom.zanussi@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"patches@lists.linux.dev" <patches@lists.linux.dev>
Subject: Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
Date: Fri, 3 Nov 2023 09:51:19 -0600
Message-ID: <20231103095119.63aa796f.alex.williamson@redhat.com>
In-Reply-To: <BN9PR11MB5276BCEA3275EC7203E06FDA8CA5A@BN9PR11MB5276.namprd11.prod.outlook.com>

On Fri, 3 Nov 2023 07:23:13 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, November 3, 2023 5:14 AM
> >
> > On Thu, 2 Nov 2023 03:14:09 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > > From: Tian, Kevin
> > > > Sent: Thursday, November 2, 2023 10:52 AM
> > > >
> > > > >
> > > > > Without an in-tree user of this code, we're just chopping up code for
> > > > > no real purpose. There's no reason that a variant driver requiring IMS
> > > > > couldn't initially implement their own SET_IRQS ioctl. Doing that
> > > >
> > > > this is an interesting idea. We haven't seen a real usage which wants
> > > > such MSI emulation on IMS for variant drivers. but if the code is
> > > > simple enough to demonstrate the 1st user of IMS it might not be
> > > > a bad choice. There are additional trap-emulation required in the
> > > > device MMIO bar (mostly copying MSI permission entry which contains
> > > > PASID info to the corresponding IMS entry). At a glance that area
> > > > is 4k-aligned so should be doable.
> > > >
> > >
> > > misread the spec. the MSI-X permission table which provides
> > > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > > 4k page together with many other registers. emulation of them
> > > could be simple with a native read/write handler but not sure
> > > whether any of them may sit in a hot path to affect perf due to
> > > trap...
> >
> > I'm not sure if you're referring to a specific device spec or the PCI
> > spec, but the PCI spec has long included an implementation note
> > suggesting alignment of the MSI-X vector table and pba and separation
> > from CSRs, and I see this is now even more strongly worded in the 6.0
> > spec.
> >
> > Note though that for QEMU, these are emulated in the VMM and not
> > written through to the device. The result of writes to the vector
> > table in the VMM are translated to vector use/unuse operations, which
> > we see at the kernel level through SET_IRQS ioctl calls. Are you
> > expecting to get PASID information written by the guest through the
> > emulated vector table? That would entail something more than a simple
> > IMS backend to MSI-X frontend. Thanks,
> >
>
> I was referring to IDXD device spec. Basically it allows a process to
> submit a descriptor which contains a completion interrupt handle.
> The handle is the index of a MSI-X entry or IMS entry allocated by
> the idxd driver. To mark the association between application and
> related handles the driver records the PASID of the application
> in an auxiliary structure for MSI-X (called MSI-X permission table)
> or directly in the IMS entry. This additional info includes whether
> an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
> value to be checked against the descriptor.
>
> As you said virtualizing MSI-X table itself is via SET_IRQS and it's
> 4k aligned. Then we also need to capture guest updates to the MSI-X
> permission table and copy the PASID information into the
> corresponding IMS entry when using the IMS backend. It's MSI-X
> permission table not 4k aligned then trapping it will affect adjacent
> registers.
>
> My quick check in idxd spec doesn't reveal an real impact in perf
> critical path. Most registers are configuration/control registers
> accessed at driver init time and a few interrupt registers related
> to errors or administrative purpose.

Right, it looks like you'll need to trap writes to the MSI-X
Permissions Table via a sparse mmap capability to avoid assumptions
about whether it lives on the same page as the MSI-X vector table or
PBA. Ideally the hardware folks have considered this to avoid any
conflict with latency-sensitive registers.
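
As a rough illustration of the sparse mmap idea (a sketch only, not code
from this series), the helper below computes which page-aligned ranges
of a BAR could still be mapped directly once every page touched by a
trapped range is excluded. The name sparse_areas(), the struct
mmap_area type, and all offsets used with it are hypothetical; the
offset/size pair is only loosely modeled on the areas that a VFIO
sparse mmap capability reports, and a single BAR with a single trapped
range is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a sparse mmap area list. */
struct mmap_area {
	uint64_t offset;	/* byte offset into the BAR */
	uint64_t size;		/* length in bytes */
};

/*
 * Split a BAR of bar_size bytes into the page-aligned ranges that may be
 * mapped directly, leaving out every page touched by the trapped range
 * [trap_off, trap_off + trap_len).  Returns the number of areas written
 * to out[] (0, 1 or 2).
 */
static size_t sparse_areas(uint64_t bar_size, uint64_t trap_off,
			   uint64_t trap_len, uint64_t page_size,
			   struct mmap_area out[2])
{
	uint64_t trap_start, trap_end;
	size_t n = 0;

	/* Preconditions assumed by this sketch. */
	assert(page_size && (page_size & (page_size - 1)) == 0);
	assert(bar_size % page_size == 0);
	assert(trap_len && trap_off + trap_len <= bar_size);

	/* Round the trapped range out to page boundaries. */
	trap_start = trap_off & ~(page_size - 1);
	trap_end = (trap_off + trap_len + page_size - 1) & ~(page_size - 1);

	/* Everything below the trapped pages stays mappable. */
	if (trap_start > 0) {
		out[n].offset = 0;
		out[n].size = trap_start;
		n++;
	}

	/* Everything above the trapped pages stays mappable. */
	if (trap_end < bar_size) {
		out[n].offset = trap_end;
		out[n].size = bar_size - trap_end;
		n++;
	}

	return n;
}
```

With a made-up 64KiB BAR and a trapped table at offset 0x600 in the
first 4KiB page, the helper reports one mappable area covering the rest
of the BAR; every register sharing that first page is trapped as well,
which is the concern raised above.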

The variant driver would use this to collect the metadata related to
the IMS interrupt, but this is all tangential to whether we
preemptively slice up vfio-pci-core's SET_IRQS ioctl or the IDXD driver
implements its own.

And just to be clear, I don't expect the IDXD variant driver to go to
extraordinary lengths to duplicate the core ioctl. We can certainly
refactor and export things where it makes sense, but I think it likely
makes more sense for the variant driver to implement the shell of the
ioctl rather than trying to multiplex the entire core ioctl with an ops
structure that's so intimately tied to the core implementation and
focused only on the MSI-X code paths. Thanks,

Alex