public inbox for kvm@vger.kernel.org
* [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
@ 2023-10-27 17:00 Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 01/26] PCI/MSI: Provide stubs for IMS functions Reinette Chatre
                   ` (26 more replies)
  0 siblings, 27 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

Changes since RFC V2:
- RFC V2: https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com/
- Still submitting this as RFC series. I believe that this now matches the
  expectations raised during earlier reviews. If you agree this is
  the right direction then I can drop the RFC prefix on the next submission.
  If you do not agree then please do let me know where I missed
  expectations.
- First patch (PCI/MSI: Provide stubs for IMS functions)
  has been submitted upstream separately and is queued for inclusion during
  the next merge window. I do still include it in this series to avoid
  the noise about issues that bots will find when checking this series
  without it included.
  https://lore.kernel.org/lkml/169757242009.3135.5502383859327174030.tip-bot2@tip-bot2/
- Eliminated duplicate code between the PCI passthrough device backend and
  the IMS backend through more abstraction within the interrupt management
  frontend. (Kevin)
- Emulated interrupts are now managed by the interrupt management
  frontend and no longer unique to IMS. (Jason and Kevin)
- Since being an emulated interrupt is a persistent property, there is
  a functional change to PCI interrupt management: per-interrupt
  contexts (managed by the frontend) are now persistent (they remain
  allocated until device release).
- Most of the patches from RFC V2 look the same with more patches
  added to support the additional abstraction needed to eliminate the
  duplicate code. The IMS support was refactored to benefit from the
  new abstraction. Please refer to individual patches for specific changes.

Changes since RFC V1:
- RFC V1: https://lore.kernel.org/lkml/cover.1692892275.git.reinette.chatre@intel.com/
- This is a complete rewrite based on feedback from Jason and Kevin.
  The primary change is to make IMS a new backend of MSI-X emulation:
  VFIO PCI becomes an interrupt management frontend, with existing
  interrupt management for PCI passthrough devices as one backend and
  IMS interrupt management introduced as a new backend.
  The first part of the series splits VFIO PCI interrupt
  management into a "frontend" and "backend" with the existing PCI
  interrupt management as its first backend. The second part of the
  series adds IMS interrupt management as a new interrupt management
  backend.
  This is a significant change from RFC V1, both in the series itself
  and in its impact on existing VFIO PCI. It was done in response to
  feedback that I hope I understood as intended. If I did not get it
  right, please do point out where I went astray and I'd be
  happy to rewrite. Of course, suggestions for improvement will
  be much appreciated.

Hi Everybody,

With Interrupt Message Store (IMS) support introduced in
commit 0194425af0c8 ("PCI/MSI: Provide IMS (Interrupt Message Store)
support") a device can create a secondary interrupt domain that works
side by side with MSI-X on the same device. IMS allows for
implementation-specific interrupt storage that is managed by the
implementation-specific interrupt chip associated with the IMS domain
when that domain is created for the device via
pci_create_ims_domain().

An example usage of IMS is for devices that can have their resources
assigned to guests with varying granularity. For example, an
accelerator device may support many workqueues and a single workqueue
can be composed into a virtual device for use by a guest. Using
IMS interrupts for the guest preserves MSI-X for host usage, allows a
significantly larger number of interrupt vectors than MSI-X, and
enables use of the same device driver within the host and guest.

This series introduces IMS support to VFIO PCI for use by virtual
devices whose MSI-X interrupts are backed by IMS interrupts on the
host. Specifically, when the virtual device's VFIO_DEVICE_SET_IRQS
ioctl() receives a "trigger interrupt" (VFIO_IRQ_SET_ACTION_TRIGGER)
for an MSI-X index, VFIO PCI IMS allocates/frees an IMS interrupt on
the host.

VFIO PCI assumes that it is managing interrupts of a passthrough PCI
device. Split VFIO PCI into a "frontend" and "backend" to support
interrupt management for virtual devices that are not passthrough PCI
devices. The VFIO PCI frontend directs guest requests to the
appropriate backend. Existing interrupt management for passthrough PCI
devices is the first backend; guest MSI-X interrupts backed by
IMS interrupts on the host are the new backend (VFIO PCI IMS).

An IMS interrupt is allocated via pci_ims_alloc_irq(), which requires
an implementation-specific cookie that is opaque to VFIO PCI IMS. This
can be a PASID, queue ID, pointer, etc. During initialization
VFIO PCI IMS learns which PCI device to operate on and what the
default cookie should be for any new interrupt allocation. VFIO PCI
IMS can also associate a unique cookie with each vector.

Guests may access a virtual device via both 'direct-path', where the
guest interacts directly with the underlying hardware, and 'intercepted
path', where the virtual device emulates operations. VFIO PCI
supports emulated interrupts (better naming suggestions are welcome) to
handle 'intercepted path' operations where completion interrupts are
signaled from the virtual device, not the underlying hardware.

This has been tested with a yet-to-be-published VFIO driver for the
Intel Data Accelerators (IDXD) present in Intel Xeon CPUs.

While this series contains a working implementation it is presented
as an RFC with the goal of obtaining feedback on whether VFIO PCI IMS
is appropriate for inclusion into VFIO and whether it is
(or could be adapted to be) appropriate for supporting other
planned IMS usages you may be aware of.

Any feedback will be greatly appreciated.

Reinette

Reinette Chatre (26):
  PCI/MSI: Provide stubs for IMS functions
  vfio/pci: Move PCI specific check from wrapper to PCI function
  vfio/pci: Use unsigned int instead of unsigned
  vfio/pci: Make core interrupt callbacks accessible to all virtual
    devices
  vfio/pci: Split PCI interrupt management into front and backend
  vfio/pci: Separate MSI and MSI-X handling
  vfio/pci: Move interrupt eventfd to interrupt context
  vfio/pci: Move mutex acquisition into function
  vfio/pci: Move per-interrupt contexts to generic interrupt struct
  vfio/pci: Move IRQ type to generic interrupt context
  vfio/pci: Provide interrupt context to irq_is() and is_irq_none()
  vfio/pci: Provide interrupt context to generic ops
  vfio/pci: Provide interrupt context to vfio_msi_enable() and
    vfio_msi_disable()
  vfio/pci: Let interrupt management backend interpret interrupt index
  vfio/pci: Move generic code to frontend
  vfio/pci: Split interrupt context initialization
  vfio/pci: Make vfio_pci_set_irqs_ioctl() available
  vfio/pci: Preserve per-interrupt contexts
  vfio/pci: Store Linux IRQ number in per-interrupt context
  vfio/pci: Separate frontend and backend code during interrupt
    enable/disable
  vfio/pci: Replace backend specific calls with callbacks
  vfio/pci: Introduce backend specific context initializer
  vfio/pci: Support emulated interrupts
  vfio/pci: Add core IMS support
  vfio/pci: Add accessor for IMS index
  vfio/pci: Support IMS cookie modification

 drivers/vfio/pci/vfio_pci_config.c |   2 +-
 drivers/vfio/pci/vfio_pci_core.c   |  50 +-
 drivers/vfio/pci/vfio_pci_intrs.c  | 758 ++++++++++++++++++++++++-----
 drivers/vfio/pci/vfio_pci_priv.h   |   2 +-
 include/linux/pci.h                |  34 +-
 include/linux/vfio_pci_core.h      |  87 +++-
 6 files changed, 756 insertions(+), 177 deletions(-)


base-commit: 611da07b89fdd53f140d7b33013f255bf0ed8f34
-- 
2.34.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 01/26] PCI/MSI: Provide stubs for IMS functions
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 02/26] vfio/pci: Move PCI specific check from wrapper to PCI function Reinette Chatre
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The IMS related functions (pci_create_ims_domain(),
pci_ims_alloc_irq(), and pci_ims_free_irq()) are not declared
when CONFIG_PCI_MSI is disabled.

Provide definitions of these functions for use when callers
are compiled with CONFIG_PCI_MSI disabled.

Fixes: 0194425af0c8 ("PCI/MSI: Provide IMS (Interrupt Message Store) support")
Fixes: c9e5bea27383 ("PCI/MSI: Provide pci_ims_alloc/free_irq()")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org      # v6.2+
---

Patch has been submitted separately and is queued for inclusion.
https://lore.kernel.org/lkml/169757242009.3135.5502383859327174030.tip-bot2@tip-bot2/
It is included in this series in support of automated testing
by bots picking series from this submission.

 include/linux/pci.h | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8c7c2c3c6c65..b56417276042 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1624,6 +1624,8 @@ struct msix_entry {
 	u16	entry;	/* Driver uses to specify entry, OS writes */
 };
 
+struct msi_domain_template;
+
 #ifdef CONFIG_PCI_MSI
 int pci_msi_vec_count(struct pci_dev *dev);
 void pci_disable_msi(struct pci_dev *dev);
@@ -1656,6 +1658,11 @@ void pci_msix_free_irq(struct pci_dev *pdev, struct msi_map map);
 void pci_free_irq_vectors(struct pci_dev *dev);
 int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
 const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev, int vec);
+bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
+			   unsigned int hwsize, void *data);
+struct msi_map pci_ims_alloc_irq(struct pci_dev *pdev, union msi_instance_cookie *icookie,
+				 const struct irq_affinity_desc *affdesc);
+void pci_ims_free_irq(struct pci_dev *pdev, struct msi_map map);
 
 #else
 static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; }
@@ -1719,6 +1726,25 @@ static inline const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev,
 {
 	return cpu_possible_mask;
 }
+
+static inline bool pci_create_ims_domain(struct pci_dev *pdev,
+					 const struct msi_domain_template *template,
+					 unsigned int hwsize, void *data)
+{ return false; }
+
+static inline struct msi_map pci_ims_alloc_irq(struct pci_dev *pdev,
+					       union msi_instance_cookie *icookie,
+					       const struct irq_affinity_desc *affdesc)
+{
+	struct msi_map map = { .index = -ENOSYS, };
+
+	return map;
+}
+
+static inline void pci_ims_free_irq(struct pci_dev *pdev, struct msi_map map)
+{
+}
+
 #endif
 
 /**
@@ -2616,14 +2642,6 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
 void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
 #endif
 
-struct msi_domain_template;
-
-bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
-			   unsigned int hwsize, void *data);
-struct msi_map pci_ims_alloc_irq(struct pci_dev *pdev, union msi_instance_cookie *icookie,
-				 const struct irq_affinity_desc *affdesc);
-void pci_ims_free_irq(struct pci_dev *pdev, struct msi_map map);
-
 #include <linux/dma-mapping.h>
 
 #define pci_printk(level, pdev, fmt, arg...) \
-- 
2.34.1



* [RFC PATCH V3 02/26] vfio/pci: Move PCI specific check from wrapper to PCI function
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 01/26] PCI/MSI: Provide stubs for IMS functions Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 03/26] vfio/pci: Use unsigned int instead of unsigned Reinette Chatre
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

vfio_pci_set_irqs_ioctl() uses a PCI device specific check to
determine if PCI specific vfio_pci_set_err_trigger() should be
called.

Move the PCI device specific check into PCI specific
vfio_pci_set_err_trigger() to make it easier for
vfio_pci_set_irqs_ioctl() to become a frontend for interrupt
backends for PCI devices as well as virtual devices.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since RFC V2.

 drivers/vfio/pci/vfio_pci_intrs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index cbb4bcbfbf83..b5b1c09bef25 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -758,6 +758,9 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
 {
+	if (!pci_is_pcie(vdev->pdev))
+		return -ENOTTY;
+
 	if (index != VFIO_PCI_ERR_IRQ_INDEX || start != 0 || count > 1)
 		return -EINVAL;
 
@@ -813,8 +816,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 	case VFIO_PCI_ERR_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
-			if (pci_is_pcie(vdev->pdev))
-				func = vfio_pci_set_err_trigger;
+			func = vfio_pci_set_err_trigger;
 			break;
 		}
 		break;
-- 
2.34.1



* [RFC PATCH V3 03/26] vfio/pci: Use unsigned int instead of unsigned
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 01/26] PCI/MSI: Provide stubs for IMS functions Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 02/26] vfio/pci: Move PCI specific check from wrapper to PCI function Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 04/26] vfio/pci: Make core interrupt callbacks accessible to all virtual devices Reinette Chatre
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

checkpatch.pl warns about usage of bare unsigned.

Change unsigned to unsigned int as a preparatory change
to avoid checkpatch.pl producing several warnings as
the work adding support for backends to VFIO interrupt
management progresses.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Include vfio_msi_set_block() in changes.

Note to maintainers:
After this change checkpatch.pl still has a few complaints
about existing code using int32_t instead of s32. This was
not changed and these warnings remain.

 drivers/vfio/pci/vfio_pci_intrs.c | 42 ++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index b5b1c09bef25..9f4f3ab48f87 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -503,8 +503,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
-static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
-			      unsigned count, int32_t *fds, bool msix)
+static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
+			      unsigned int start, unsigned int count,
+			      int32_t *fds, bool msix)
 {
 	unsigned int i, j;
 	int ret = 0;
@@ -553,8 +554,9 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
  * IOCTL support
  */
 static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
-				    unsigned index, unsigned start,
-				    unsigned count, uint32_t flags, void *data)
+				    unsigned int index, unsigned int start,
+				    unsigned int count, uint32_t flags,
+				    void *data)
 {
 	if (!is_intx(vdev) || start != 0 || count != 1)
 		return -EINVAL;
@@ -584,8 +586,8 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
 }
 
 static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
-				  unsigned index, unsigned start,
-				  unsigned count, uint32_t flags, void *data)
+				  unsigned int index, unsigned int start,
+				  unsigned int count, uint32_t flags, void *data)
 {
 	if (!is_intx(vdev) || start != 0 || count != 1)
 		return -EINVAL;
@@ -604,8 +606,9 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
 }
 
 static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
-				     unsigned index, unsigned start,
-				     unsigned count, uint32_t flags, void *data)
+				     unsigned int index, unsigned int start,
+				     unsigned int count, uint32_t flags,
+				     void *data)
 {
 	if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
 		vfio_intx_disable(vdev);
@@ -647,8 +650,9 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
 }
 
 static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
-				    unsigned index, unsigned start,
-				    unsigned count, uint32_t flags, void *data)
+				    unsigned int index, unsigned int start,
+				    unsigned int count, uint32_t flags,
+				    void *data)
 {
 	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
@@ -755,8 +759,9 @@ static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx,
 }
 
 static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
-				    unsigned index, unsigned start,
-				    unsigned count, uint32_t flags, void *data)
+				    unsigned int index, unsigned int start,
+				    unsigned int count, uint32_t flags,
+				    void *data)
 {
 	if (!pci_is_pcie(vdev->pdev))
 		return -ENOTTY;
@@ -769,8 +774,9 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
 }
 
 static int vfio_pci_set_req_trigger(struct vfio_pci_core_device *vdev,
-				    unsigned index, unsigned start,
-				    unsigned count, uint32_t flags, void *data)
+				    unsigned int index, unsigned int start,
+				    unsigned int count, uint32_t flags,
+				    void *data)
 {
 	if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1)
 		return -EINVAL;
@@ -780,11 +786,11 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_core_device *vdev,
 }
 
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
-			    unsigned index, unsigned start, unsigned count,
-			    void *data)
+			    unsigned int index, unsigned int start,
+			    unsigned int count, void *data)
 {
-	int (*func)(struct vfio_pci_core_device *vdev, unsigned index,
-		    unsigned start, unsigned count, uint32_t flags,
+	int (*func)(struct vfio_pci_core_device *vdev, unsigned int index,
+		    unsigned int start, unsigned int count, uint32_t flags,
 		    void *data) = NULL;
 
 	switch (index) {
-- 
2.34.1



* [RFC PATCH V3 04/26] vfio/pci: Make core interrupt callbacks accessible to all virtual devices
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (2 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 03/26] vfio/pci: Use unsigned int instead of unsigned Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 05/26] vfio/pci: Split PCI interrupt management into front and backend Reinette Chatre
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The functions handling actions on interrupts for a virtual PCI device
triggered by VFIO_DEVICE_SET_IRQS ioctl() expect to act on a passthrough
PCI device represented by a struct vfio_pci_core_device.

A virtual device can support MSI-X while not being a passthrough PCI
device and thus not be represented by a struct vfio_pci_core_device.

To support MSI-X in virtual devices their drivers must be able
to interact with the MSI-X interrupt management, so interrupt
management should not require struct vfio_pci_core_device.

Introduce struct vfio_pci_intr_ctx that will contain a virtual device's
interrupt context to be managed by an interrupt management backend.
The first supported backend is the existing PCI device interrupt
management. Modify the core VFIO PCI interrupt management functions
to expect this structure. As a backend managing interrupts of
passthrough PCI devices the existing VFIO PCI functions continue to
operate on an actual PCI device represented by
struct vfio_pci_core_device that is provided via a private pointer.

More members will be added to struct vfio_pci_intr_ctx as members
unique to interrupt context are transitioned from struct
vfio_pci_core_device.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Improve changelog and comments.

 drivers/vfio/pci/vfio_pci_core.c  |  7 ++++---
 drivers/vfio/pci/vfio_pci_intrs.c | 29 ++++++++++++++++++++---------
 drivers/vfio/pci/vfio_pci_priv.h  |  2 +-
 include/linux/vfio_pci_core.h     |  9 +++++++++
 4 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..bb8181444c41 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -594,7 +594,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	/* Stop the device from further DMA */
 	pci_clear_master(pdev);
 
-	vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+	vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, VFIO_IRQ_SET_DATA_NONE |
 				VFIO_IRQ_SET_ACTION_TRIGGER,
 				vdev->irq_type, 0, 0, NULL);
 
@@ -1216,8 +1216,8 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
 
 	mutex_lock(&vdev->igate);
 
-	ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, hdr.start,
-				      hdr.count, data);
+	ret = vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, hdr.flags, hdr.index,
+				      hdr.start, hdr.count, data);
 
 	mutex_unlock(&vdev->igate);
 	kfree(data);
@@ -2166,6 +2166,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
+	vdev->intr_ctx.priv = vdev;
 
 	return 0;
 }
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 9f4f3ab48f87..af1053873eaa 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -553,11 +553,13 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 /*
  * IOCTL support
  */
-static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_intx_unmask(struct vfio_pci_intr_ctx *intr_ctx,
 				    unsigned int index, unsigned int start,
 				    unsigned int count, uint32_t flags,
 				    void *data)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
 	if (!is_intx(vdev) || start != 0 || count != 1)
 		return -EINVAL;
 
@@ -585,10 +587,12 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
-static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_intx_mask(struct vfio_pci_intr_ctx *intr_ctx,
 				  unsigned int index, unsigned int start,
 				  unsigned int count, uint32_t flags, void *data)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
 	if (!is_intx(vdev) || start != 0 || count != 1)
 		return -EINVAL;
 
@@ -605,11 +609,13 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
-static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_intx_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 				     unsigned int index, unsigned int start,
 				     unsigned int count, uint32_t flags,
 				     void *data)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
 	if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
 		vfio_intx_disable(vdev);
 		return 0;
@@ -649,11 +655,12 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
-static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 				    unsigned int index, unsigned int start,
 				    unsigned int count, uint32_t flags,
 				    void *data)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
@@ -758,11 +765,13 @@ static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx,
 	return -EINVAL;
 }
 
-static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_err_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 				    unsigned int index, unsigned int start,
 				    unsigned int count, uint32_t flags,
 				    void *data)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
 	if (!pci_is_pcie(vdev->pdev))
 		return -ENOTTY;
 
@@ -773,11 +782,13 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
 					       count, flags, data);
 }
 
-static int vfio_pci_set_req_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_req_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 				    unsigned int index, unsigned int start,
 				    unsigned int count, uint32_t flags,
 				    void *data)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
 	if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1)
 		return -EINVAL;
 
@@ -785,11 +796,11 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_core_device *vdev,
 					       count, flags, data);
 }
 
-int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data)
 {
-	int (*func)(struct vfio_pci_core_device *vdev, unsigned int index,
+	int (*func)(struct vfio_pci_intr_ctx *intr_ctx, unsigned int index,
 		    unsigned int start, unsigned int count, uint32_t flags,
 		    void *data) = NULL;
 
@@ -838,5 +849,5 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 	if (!func)
 		return -ENOTTY;
 
-	return func(vdev, index, start, count, flags, data);
+	return func(intr_ctx, index, start, count, flags, data);
 }
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5e4fa69aee16..6dddcfe7ab19 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -26,7 +26,7 @@ struct vfio_pci_ioeventfd {
 bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
 void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
 
-int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned index, unsigned start, unsigned count,
 			    void *data);
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 562e8754869d..38355a4817fd 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -49,6 +49,14 @@ struct vfio_pci_region {
 	u32				flags;
 };
 
+/*
+ * Interrupt context of virtual PCI device
+ * @priv:		Private data of interrupt management backend
+ */
+struct vfio_pci_intr_ctx {
+	void				*priv;
+};
+
 struct vfio_pci_core_device {
 	struct vfio_device	vdev;
 	struct pci_dev		*pdev;
@@ -96,6 +104,7 @@ struct vfio_pci_core_device {
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
 	struct rw_semaphore	memory_lock;
+	struct vfio_pci_intr_ctx	intr_ctx;
 };
 
 /* Will be exported for vfio pci drivers usage */
-- 
2.34.1



* [RFC PATCH V3 05/26] vfio/pci: Split PCI interrupt management into front and backend
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (3 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 04/26] vfio/pci: Make core interrupt callbacks accessible to all virtual devices Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 06/26] vfio/pci: Separate MSI and MSI-X handling Reinette Chatre
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

VFIO PCI interrupt management supports passthrough PCI devices with an
interrupt in the guest backed by the same type of interrupt on the PCI
device.

Interrupt management can be more flexible. An interrupt in the guest
may be backed by a different type of interrupt on the host. For
example, MSI-X in the guest can be backed by IMS on the host, or not
device interrupt at all when the interrupt is emulated by the virtual
device driver.

The main entry to guest interrupt management is via the
VFIO_DEVICE_SET_IRQS ioctl(). By default the work is passed to interrupt
management for PCI devices with the PCI specific functions called
directly.

Make the ioctl() configurable to support different interrupt management
backends. This is accomplished by introducing interrupt context specific
callbacks that are initialized by the virtual device driver and then
triggered via the ioctl().

The introduction of virtual device driver specific callbacks requires
their initialization. Create a dedicated interrupt context
initialization function to avoid mixing more interrupt context
initialization with general virtual device driver initialization.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Improve changelog and comments.
- Make vfio_pci_intr_ops static.

 drivers/vfio/pci/vfio_pci_core.c  |  2 +-
 drivers/vfio/pci/vfio_pci_intrs.c | 35 +++++++++++++++++++++++++------
 include/linux/vfio_pci_core.h     | 25 ++++++++++++++++++++++
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index bb8181444c41..310259bbacae 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2166,7 +2166,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
-	vdev->intr_ctx.priv = vdev;
+	vfio_pci_init_intr_ctx(vdev, &vdev->intr_ctx);
 
 	return 0;
 }
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index af1053873eaa..96587acfebf0 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -796,6 +796,23 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 					       count, flags, data);
 }
 
+static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
+	.set_intx_mask = vfio_pci_set_intx_mask,
+	.set_intx_unmask = vfio_pci_set_intx_unmask,
+	.set_intx_trigger = vfio_pci_set_intx_trigger,
+	.set_msi_trigger = vfio_pci_set_msi_trigger,
+	.set_err_trigger = vfio_pci_set_err_trigger,
+	.set_req_trigger = vfio_pci_set_req_trigger,
+};
+
+void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
+			    struct vfio_pci_intr_ctx *intr_ctx)
+{
+	intr_ctx->ops = &vfio_pci_intr_ops;
+	intr_ctx->priv = vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);
+
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data)
@@ -808,13 +825,16 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 	case VFIO_PCI_INTX_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_MASK:
-			func = vfio_pci_set_intx_mask;
+			if (intr_ctx->ops->set_intx_mask)
+				func = intr_ctx->ops->set_intx_mask;
 			break;
 		case VFIO_IRQ_SET_ACTION_UNMASK:
-			func = vfio_pci_set_intx_unmask;
+			if (intr_ctx->ops->set_intx_unmask)
+				func = intr_ctx->ops->set_intx_unmask;
 			break;
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
-			func = vfio_pci_set_intx_trigger;
+			if (intr_ctx->ops->set_intx_trigger)
+				func = intr_ctx->ops->set_intx_trigger;
 			break;
 		}
 		break;
@@ -826,21 +846,24 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			/* XXX Need masking support exported */
 			break;
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
-			func = vfio_pci_set_msi_trigger;
+			if (intr_ctx->ops->set_msi_trigger)
+				func = intr_ctx->ops->set_msi_trigger;
 			break;
 		}
 		break;
 	case VFIO_PCI_ERR_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
-			func = vfio_pci_set_err_trigger;
+			if (intr_ctx->ops->set_err_trigger)
+				func = intr_ctx->ops->set_err_trigger;
 			break;
 		}
 		break;
 	case VFIO_PCI_REQ_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
-			func = vfio_pci_set_req_trigger;
+			if (intr_ctx->ops->set_req_trigger)
+				func = intr_ctx->ops->set_req_trigger;
 			break;
 		}
 		break;
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 38355a4817fd..d3fa0e49a1a8 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -51,12 +51,35 @@ struct vfio_pci_region {
 
 /*
  * Interrupt context of virtual PCI device
+ * @ops:		Interrupt management backend functions
  * @priv:		Private data of interrupt management backend
  */
 struct vfio_pci_intr_ctx {
+	const struct vfio_pci_intr_ops	*ops;
 	void				*priv;
 };
 
+struct vfio_pci_intr_ops {
+	int (*set_intx_mask)(struct vfio_pci_intr_ctx *intr_ctx,
+			     unsigned int index, unsigned int start,
+			     unsigned int count, uint32_t flags, void *data);
+	int (*set_intx_unmask)(struct vfio_pci_intr_ctx *intr_ctx,
+			       unsigned int index, unsigned int start,
+			       unsigned int count, uint32_t flags, void *data);
+	int (*set_intx_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
+				unsigned int index, unsigned int start,
+				unsigned int count, uint32_t flags, void *data);
+	int (*set_msi_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
+			       unsigned int index, unsigned int start,
+			       unsigned int count, uint32_t flags, void *data);
+	int (*set_err_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
+			       unsigned int index, unsigned int start,
+			       unsigned int count, uint32_t flags, void *data);
+	int (*set_req_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
+			       unsigned int index, unsigned int start,
+			       unsigned int count, uint32_t flags, void *data);
+};
+
 struct vfio_pci_core_device {
 	struct vfio_device	vdev;
 	struct pci_dev		*pdev;
@@ -124,6 +147,8 @@ int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev,
 				  int nr_virtfn);
 long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 		unsigned long arg);
+void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
+			    struct vfio_pci_intr_ctx *intr_ctx);
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz);
 ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 06/26] vfio/pci: Separate MSI and MSI-X handling
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (4 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 05/26] vfio/pci: Split PCI interrupt management into front and backend Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 07/26] vfio/pci: Move interrupt eventfd to interrupt context Reinette Chatre
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

VFIO PCI interrupt management uses a single entry point for both
MSI and MSI-X management, with the called functions using a boolean
where necessary to distinguish between MSI and MSI-X. This behavior
remains unchanged.

Virtual device interrupt management should not be required to
use the same callback for both MSI and MSI-X. A virtual device
may not support MSI at all and provide only MSI-X interrupt
management.

Separate the MSI and MSI-X interrupt management by allowing
different callbacks for each interrupt type. For PCI devices
the callback remains the same.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since RFC V2.

 drivers/vfio/pci/vfio_pci_intrs.c | 14 +++++++++++++-
 include/linux/vfio_pci_core.h     |  3 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 96587acfebf0..7de906363402 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -801,6 +801,7 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
 	.set_intx_unmask = vfio_pci_set_intx_unmask,
 	.set_intx_trigger = vfio_pci_set_intx_trigger,
 	.set_msi_trigger = vfio_pci_set_msi_trigger,
+	.set_msix_trigger = vfio_pci_set_msi_trigger,
 	.set_err_trigger = vfio_pci_set_err_trigger,
 	.set_req_trigger = vfio_pci_set_req_trigger,
 };
@@ -839,7 +840,6 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 		}
 		break;
 	case VFIO_PCI_MSI_IRQ_INDEX:
-	case VFIO_PCI_MSIX_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_MASK:
 		case VFIO_IRQ_SET_ACTION_UNMASK:
@@ -851,6 +851,18 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			break;
 		}
 		break;
+	case VFIO_PCI_MSIX_IRQ_INDEX:
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_MASK:
+		case VFIO_IRQ_SET_ACTION_UNMASK:
+			/* XXX Need masking support exported */
+			break;
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			if (intr_ctx->ops->set_msix_trigger)
+				func = intr_ctx->ops->set_msix_trigger;
+			break;
+		}
+		break;
 	case VFIO_PCI_ERR_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index d3fa0e49a1a8..db7ee9517d94 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -72,6 +72,9 @@ struct vfio_pci_intr_ops {
 	int (*set_msi_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
 			       unsigned int index, unsigned int start,
 			       unsigned int count, uint32_t flags, void *data);
+	int (*set_msix_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
+				unsigned int index, unsigned int start,
+				unsigned int count, uint32_t flags, void *data);
 	int (*set_err_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
 			       unsigned int index, unsigned int start,
 			       unsigned int count, uint32_t flags, void *data);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 07/26] vfio/pci: Move interrupt eventfd to interrupt context
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (5 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 06/26] vfio/pci: Separate MSI and MSI-X handling Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 08/26] vfio/pci: Move mutex acquisition into function Reinette Chatre
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The eventfds associated with device request notification and
error IRQ are managed by VFIO PCI interrupt management as
triggered by the VFIO_DEVICE_SET_IRQS ioctl().

Move these eventfds, as well as their mutex, to the generic and
dedicated interrupt management context struct vfio_pci_intr_ctx
to enable another interrupt management backend to manage these
eventfds.

The igate mutex protects eventfd modification. With the eventfds
now within the larger scoped interrupt context, the mutex scope is
also expanded: all members of struct vfio_pci_intr_ctx are
protected by it.

With this move vfio_pci_set_req_trigger() no longer requires a
struct vfio_pci_core_device; it operates only on the generic
struct vfio_pci_intr_ctx and is thus available for direct use
by other interrupt management backends.

This introduces the first interrupt context related
cleanup call. Create vfio_pci_release_intr_ctx()
to match the existing vfio_pci_init_intr_ctx().

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Improve changelog.

 drivers/vfio/pci/vfio_pci_core.c  | 39 +++++++++++++++----------------
 drivers/vfio/pci/vfio_pci_intrs.c | 13 +++++++----
 include/linux/vfio_pci_core.h     | 10 +++++---
 3 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 310259bbacae..5c9bd5d2db53 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -700,16 +700,16 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 #endif
 	vfio_pci_core_disable(vdev);
 
-	mutex_lock(&vdev->igate);
-	if (vdev->err_trigger) {
-		eventfd_ctx_put(vdev->err_trigger);
-		vdev->err_trigger = NULL;
+	mutex_lock(&vdev->intr_ctx.igate);
+	if (vdev->intr_ctx.err_trigger) {
+		eventfd_ctx_put(vdev->intr_ctx.err_trigger);
+		vdev->intr_ctx.err_trigger = NULL;
 	}
-	if (vdev->req_trigger) {
-		eventfd_ctx_put(vdev->req_trigger);
-		vdev->req_trigger = NULL;
+	if (vdev->intr_ctx.req_trigger) {
+		eventfd_ctx_put(vdev->intr_ctx.req_trigger);
+		vdev->intr_ctx.req_trigger = NULL;
 	}
-	mutex_unlock(&vdev->igate);
+	mutex_unlock(&vdev->intr_ctx.igate);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_close_device);
 
@@ -1214,12 +1214,12 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
 			return PTR_ERR(data);
 	}
 
-	mutex_lock(&vdev->igate);
+	mutex_lock(&vdev->intr_ctx.igate);
 
 	ret = vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, hdr.flags, hdr.index,
 				      hdr.start, hdr.count, data);
 
-	mutex_unlock(&vdev->igate);
+	mutex_unlock(&vdev->intr_ctx.igate);
 	kfree(data);
 
 	return ret;
@@ -1876,20 +1876,20 @@ void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count)
 		container_of(core_vdev, struct vfio_pci_core_device, vdev);
 	struct pci_dev *pdev = vdev->pdev;
 
-	mutex_lock(&vdev->igate);
+	mutex_lock(&vdev->intr_ctx.igate);
 
-	if (vdev->req_trigger) {
+	if (vdev->intr_ctx.req_trigger) {
 		if (!(count % 10))
 			pci_notice_ratelimited(pdev,
 				"Relaying device request to user (#%u)\n",
 				count);
-		eventfd_signal(vdev->req_trigger, 1);
+		eventfd_signal(vdev->intr_ctx.req_trigger, 1);
 	} else if (count == 0) {
 		pci_warn(pdev,
 			"No device request channel registered, blocked until released by user\n");
 	}
 
-	mutex_unlock(&vdev->igate);
+	mutex_unlock(&vdev->intr_ctx.igate);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_request);
 
@@ -2156,7 +2156,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 
 	vdev->pdev = to_pci_dev(core_vdev->dev);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	mutex_init(&vdev->igate);
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
@@ -2177,7 +2176,7 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
 	struct vfio_pci_core_device *vdev =
 		container_of(core_vdev, struct vfio_pci_core_device, vdev);
 
-	mutex_destroy(&vdev->igate);
+	vfio_pci_release_intr_ctx(&vdev->intr_ctx);
 	mutex_destroy(&vdev->ioeventfds_lock);
 	mutex_destroy(&vdev->vma_lock);
 	kfree(vdev->region);
@@ -2300,12 +2299,12 @@ pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 {
 	struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
 
-	mutex_lock(&vdev->igate);
+	mutex_lock(&vdev->intr_ctx.igate);
 
-	if (vdev->err_trigger)
-		eventfd_signal(vdev->err_trigger, 1);
+	if (vdev->intr_ctx.err_trigger)
+		eventfd_signal(vdev->intr_ctx.err_trigger, 1);
 
-	mutex_unlock(&vdev->igate);
+	mutex_unlock(&vdev->intr_ctx.igate);
 
 	return PCI_ERS_RESULT_CAN_RECOVER;
 }
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 7de906363402..a4c8b589c87b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -778,7 +778,7 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 	if (index != VFIO_PCI_ERR_IRQ_INDEX || start != 0 || count > 1)
 		return -EINVAL;
 
-	return vfio_pci_set_ctx_trigger_single(&vdev->err_trigger,
+	return vfio_pci_set_ctx_trigger_single(&intr_ctx->err_trigger,
 					       count, flags, data);
 }
 
@@ -787,12 +787,10 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 				    unsigned int count, uint32_t flags,
 				    void *data)
 {
-	struct vfio_pci_core_device *vdev = intr_ctx->priv;
-
 	if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1)
 		return -EINVAL;
 
-	return vfio_pci_set_ctx_trigger_single(&vdev->req_trigger,
+	return vfio_pci_set_ctx_trigger_single(&intr_ctx->req_trigger,
 					       count, flags, data);
 }
 
@@ -811,9 +809,16 @@ void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
 {
 	intr_ctx->ops = &vfio_pci_intr_ops;
 	intr_ctx->priv = vdev;
+	mutex_init(&intr_ctx->igate);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);
 
+void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
+{
+	mutex_destroy(&intr_ctx->igate);
+}
+EXPORT_SYMBOL_GPL(vfio_pci_release_intr_ctx);
+
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index db7ee9517d94..1eb5842cff11 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -53,10 +53,16 @@ struct vfio_pci_region {
  * Interrupt context of virtual PCI device
  * @ops:		Interrupt management backend functions
  * @priv:		Private data of interrupt management backend
+ * @igate:		Protects members of struct vfio_pci_intr_ctx
+ * @err_trigger:	Eventfd associated with error reporting IRQ
+ * @req_trigger:	Eventfd associated with device request notification
  */
 struct vfio_pci_intr_ctx {
 	const struct vfio_pci_intr_ops	*ops;
 	void				*priv;
+	struct mutex			igate;
+	struct eventfd_ctx		*err_trigger;
+	struct eventfd_ctx		*req_trigger;
 };
 
 struct vfio_pci_intr_ops {
@@ -92,7 +98,6 @@ struct vfio_pci_core_device {
 	u8			*vconfig;
 	struct perm_bits	*msi_perm;
 	spinlock_t		irqlock;
-	struct mutex		igate;
 	struct xarray		ctx;
 	int			irq_type;
 	int			num_regions;
@@ -117,8 +122,6 @@ struct vfio_pci_core_device {
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	int			ioeventfds_nr;
-	struct eventfd_ctx	*err_trigger;
-	struct eventfd_ctx	*req_trigger;
 	struct eventfd_ctx	*pm_wake_eventfd_ctx;
 	struct list_head	dummy_resources_list;
 	struct mutex		ioeventfds_lock;
@@ -152,6 +155,7 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 		unsigned long arg);
 void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
 			    struct vfio_pci_intr_ctx *intr_ctx);
+void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz);
 ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 08/26] vfio/pci: Move mutex acquisition into function
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (6 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 07/26] vfio/pci: Move interrupt eventfd to interrupt context Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 09/26] vfio/pci: Move per-interrupt contexts to generic interrupt struct Reinette Chatre
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

vfio_pci_set_irqs_ioctl() is the entrypoint for interrupt management
via the VFIO_DEVICE_SET_IRQS ioctl(). vfio_pci_set_irqs_ioctl() can
be called from a virtual device driver after its callbacks have been
configured to support the needed interrupt management.

The igate mutex is obtained before calling vfio_pci_set_irqs_ioctl()
to protect against concurrent changes to the interrupt context. It
should not be necessary for every user of vfio_pci_set_irqs_ioctl()
to remember to take the mutex.

Acquire and release the mutex within vfio_pci_set_irqs_ioctl().

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Improve changelog.

 drivers/vfio/pci/vfio_pci_core.c  |  2 --
 drivers/vfio/pci/vfio_pci_intrs.c | 10 ++++++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 5c9bd5d2db53..bf4de137ad2f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1214,12 +1214,10 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
 			return PTR_ERR(data);
 	}
 
-	mutex_lock(&vdev->intr_ctx.igate);
 
 	ret = vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, hdr.flags, hdr.index,
 				      hdr.start, hdr.count, data);
 
-	mutex_unlock(&vdev->intr_ctx.igate);
 	kfree(data);
 
 	return ret;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index a4c8b589c87b..5d600548b5d7 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -826,7 +826,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 	int (*func)(struct vfio_pci_intr_ctx *intr_ctx, unsigned int index,
 		    unsigned int start, unsigned int count, uint32_t flags,
 		    void *data) = NULL;
+	int ret = -ENOTTY;
 
+	mutex_lock(&intr_ctx->igate);
 	switch (index) {
 	case VFIO_PCI_INTX_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
@@ -887,7 +889,11 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 	}
 
 	if (!func)
-		return -ENOTTY;
+		goto out_unlock;
+
+	ret = func(intr_ctx, index, start, count, flags, data);
 
-	return func(intr_ctx, index, start, count, flags, data);
+out_unlock:
+	mutex_unlock(&intr_ctx->igate);
+	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 09/26] vfio/pci: Move per-interrupt contexts to generic interrupt struct
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (7 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 08/26] vfio/pci: Move mutex acquisition into function Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 10/26] vfio/pci: Move IRQ type to generic interrupt context Reinette Chatre
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

VFIO PCI interrupt management maintains per-interrupt context within an
xarray using the interrupt vector as index.

Move the per-interrupt context to the generic interrupt context in
struct vfio_pci_intr_ctx to enable the per-interrupt context to be
managed by different backends.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Improve changelog.

 drivers/vfio/pci/vfio_pci_core.c  | 1 -
 drivers/vfio/pci/vfio_pci_intrs.c | 9 +++++----
 include/linux/vfio_pci_core.h     | 3 ++-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index bf4de137ad2f..cf303a9555f0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2162,7 +2162,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->vma_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
-	xa_init(&vdev->ctx);
 	vfio_pci_init_intr_ctx(vdev, &vdev->intr_ctx);
 
 	return 0;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 5d600548b5d7..3cfd7fdec87b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -52,13 +52,13 @@ static
 struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
 					  unsigned long index)
 {
-	return xa_load(&vdev->ctx, index);
+	return xa_load(&vdev->intr_ctx.ctx, index);
 }
 
 static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
 			      struct vfio_pci_irq_ctx *ctx, unsigned long index)
 {
-	xa_erase(&vdev->ctx, index);
+	xa_erase(&vdev->intr_ctx.ctx, index);
 	kfree(ctx);
 }
 
@@ -72,7 +72,7 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
 	if (!ctx)
 		return NULL;
 
-	ret = xa_insert(&vdev->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
+	ret = xa_insert(&vdev->intr_ctx.ctx, index, ctx, GFP_KERNEL_ACCOUNT);
 	if (ret) {
 		kfree(ctx);
 		return NULL;
@@ -530,7 +530,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 	unsigned long i;
 	u16 cmd;
 
-	xa_for_each(&vdev->ctx, i, ctx) {
+	xa_for_each(&vdev->intr_ctx.ctx, i, ctx) {
 		vfio_virqfd_disable(&ctx->unmask);
 		vfio_virqfd_disable(&ctx->mask);
 		vfio_msi_set_vector_signal(vdev, i, -1, msix);
@@ -810,6 +810,7 @@ void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
 	intr_ctx->ops = &vfio_pci_intr_ops;
 	intr_ctx->priv = vdev;
 	mutex_init(&intr_ctx->igate);
+	xa_init(&intr_ctx->ctx);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 1eb5842cff11..0f9df87aedd9 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -56,6 +56,7 @@ struct vfio_pci_region {
  * @igate:		Protects members of struct vfio_pci_intr_ctx
  * @err_trigger:	Eventfd associated with error reporting IRQ
  * @req_trigger:	Eventfd associated with device request notification
+ * @ctx:		Per-interrupt context indexed by vector
  */
 struct vfio_pci_intr_ctx {
 	const struct vfio_pci_intr_ops	*ops;
@@ -63,6 +64,7 @@ struct vfio_pci_intr_ctx {
 	struct mutex			igate;
 	struct eventfd_ctx		*err_trigger;
 	struct eventfd_ctx		*req_trigger;
+	struct xarray			ctx;
 };
 
 struct vfio_pci_intr_ops {
@@ -98,7 +100,6 @@ struct vfio_pci_core_device {
 	u8			*vconfig;
 	struct perm_bits	*msi_perm;
 	spinlock_t		irqlock;
-	struct xarray		ctx;
 	int			irq_type;
 	int			num_regions;
 	struct vfio_pci_region	*region;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 10/26] vfio/pci: Move IRQ type to generic interrupt context
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (8 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 09/26] vfio/pci: Move per-interrupt contexts to generic interrupt struct Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 11/26] vfio/pci: Provide interrupt context to irq_is() and is_irq_none() Reinette Chatre
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The type of interrupt presented to the guest is not unique to
PCI devices and is needed for other virtual devices that
support interrupts.

Move interrupt type to the generic interrupt context struct
vfio_pci_intr_ctx.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Question for maintainers:
irq_type is accessed in ioctl() flow as well as other
flows. It is not clear to me how it is protected against
concurrent access. Should accesses outside of ioctl() flow
take the mutex?

No changes since RFC V2.

 drivers/vfio/pci/vfio_pci_config.c |  2 +-
 drivers/vfio/pci/vfio_pci_core.c   |  5 ++---
 drivers/vfio/pci/vfio_pci_intrs.c  | 21 +++++++++++----------
 include/linux/vfio_pci_core.h      |  3 ++-
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 7e2e62ab0869..2535bdbc016d 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1168,7 +1168,7 @@ static int vfio_msi_config_write(struct vfio_pci_core_device *vdev, int pos,
 		flags = le16_to_cpu(*pflags);
 
 		/* MSI is enabled via ioctl */
-		if  (vdev->irq_type != VFIO_PCI_MSI_IRQ_INDEX)
+		if  (vdev->intr_ctx.irq_type != VFIO_PCI_MSI_IRQ_INDEX)
 			flags &= ~PCI_MSI_FLAGS_ENABLE;
 
 		/* Check queue size */
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index cf303a9555f0..34109ed38454 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -427,7 +427,7 @@ static int vfio_pci_core_runtime_suspend(struct device *dev)
 	 * vfio_pci_intx_mask() will return false and in that case, INTx
 	 * should not be unmasked in the runtime resume.
 	 */
-	vdev->pm_intx_masked = ((vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX) &&
+	vdev->pm_intx_masked = ((vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX) &&
 				vfio_pci_intx_mask(vdev));
 
 	return 0;
@@ -596,7 +596,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 
 	vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, VFIO_IRQ_SET_DATA_NONE |
 				VFIO_IRQ_SET_ACTION_TRIGGER,
-				vdev->irq_type, 0, 0, NULL);
+				vdev->intr_ctx.irq_type, 0, 0, NULL);
 
 	/* Device closed, don't need mutex here */
 	list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
@@ -2153,7 +2153,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 		container_of(core_vdev, struct vfio_pci_core_device, vdev);
 
 	vdev->pdev = to_pci_dev(core_vdev->dev);
-	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3cfd7fdec87b..858795ba50fe 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -33,19 +33,19 @@ struct vfio_pci_irq_ctx {
 
 static bool irq_is(struct vfio_pci_core_device *vdev, int type)
 {
-	return vdev->irq_type == type;
+	return vdev->intr_ctx.irq_type == type;
 }
 
 static bool is_intx(struct vfio_pci_core_device *vdev)
 {
-	return vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX;
+	return vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX;
 }
 
 static bool is_irq_none(struct vfio_pci_core_device *vdev)
 {
-	return !(vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX ||
-		 vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
-		 vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
+	return !(vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX ||
+		 vdev->intr_ctx.irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
+		 vdev->intr_ctx.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
 }
 
 static
@@ -255,7 +255,7 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	if (vdev->pci_2_3)
 		pci_intx(vdev->pdev, !ctx->masked);
 
-	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
+	vdev->intr_ctx.irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
 	return 0;
 }
@@ -331,7 +331,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 		vfio_virqfd_disable(&ctx->mask);
 	}
 	vfio_intx_set_signal(vdev, -1);
-	vdev->irq_type = VFIO_PCI_NUM_IRQS;
+	vdev->intr_ctx.irq_type = VFIO_PCI_NUM_IRQS;
 	vfio_irq_ctx_free(vdev, ctx, 0);
 }
 
@@ -367,7 +367,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 	}
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
-	vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
+	vdev->intr_ctx.irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
 				VFIO_PCI_MSI_IRQ_INDEX;
 
 	if (!msix) {
@@ -547,7 +547,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 	if (vdev->nointx)
 		pci_intx(pdev, 0);
 
-	vdev->irq_type = VFIO_PCI_NUM_IRQS;
+	vdev->intr_ctx.irq_type = VFIO_PCI_NUM_IRQS;
 }
 
 /*
@@ -677,7 +677,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 		int32_t *fds = data;
 		int ret;
 
-		if (vdev->irq_type == index)
+		if (vdev->intr_ctx.irq_type == index)
 			return vfio_msi_set_block(vdev, start, count,
 						  fds, msix);
 
@@ -807,6 +807,7 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
 void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
 			    struct vfio_pci_intr_ctx *intr_ctx)
 {
+	intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
 	intr_ctx->ops = &vfio_pci_intr_ops;
 	intr_ctx->priv = vdev;
 	mutex_init(&intr_ctx->igate);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 0f9df87aedd9..e666c19da223 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -57,6 +57,7 @@ struct vfio_pci_region {
  * @err_trigger:	Eventfd associated with error reporting IRQ
  * @req_trigger:	Eventfd associated with device request notification
  * @ctx:		Per-interrupt context indexed by vector
+ * @irq_type:		Type of interrupt from guest perspective
  */
 struct vfio_pci_intr_ctx {
 	const struct vfio_pci_intr_ops	*ops;
@@ -65,6 +66,7 @@ struct vfio_pci_intr_ctx {
 	struct eventfd_ctx		*err_trigger;
 	struct eventfd_ctx		*req_trigger;
 	struct xarray			ctx;
+	int				irq_type;
 };
 
 struct vfio_pci_intr_ops {
@@ -100,7 +102,6 @@ struct vfio_pci_core_device {
 	u8			*vconfig;
 	struct perm_bits	*msi_perm;
 	spinlock_t		irqlock;
-	int			irq_type;
 	int			num_regions;
 	struct vfio_pci_region	*region;
 	u8			msi_qmax;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 11/26] vfio/pci: Provide interrupt context to irq_is() and is_irq_none()
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (9 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 10/26] vfio/pci: Move IRQ type to generic interrupt context Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 12/26] vfio/pci: Provide interrupt context to generic ops Reinette Chatre
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The IRQ type has moved to the interrupt context, struct vfio_pci_intr_ctx.

Let the tests of the IRQ type use the interrupt context directly, without
any assumption about the containing structure. Doing so makes these
generic utilities available to all interrupt management backends.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 858795ba50fe..9aff5c38f198 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -31,9 +31,9 @@ struct vfio_pci_irq_ctx {
 	struct irq_bypass_producer	producer;
 };
 
-static bool irq_is(struct vfio_pci_core_device *vdev, int type)
+static bool irq_is(struct vfio_pci_intr_ctx *intr_ctx, int type)
 {
-	return vdev->intr_ctx.irq_type == type;
+	return intr_ctx->irq_type == type;
 }
 
 static bool is_intx(struct vfio_pci_core_device *vdev)
@@ -41,11 +41,11 @@ static bool is_intx(struct vfio_pci_core_device *vdev)
 	return vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX;
 }
 
-static bool is_irq_none(struct vfio_pci_core_device *vdev)
+static bool is_irq_none(struct vfio_pci_intr_ctx *intr_ctx)
 {
-	return !(vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX ||
-		 vdev->intr_ctx.irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
-		 vdev->intr_ctx.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
+	return !(intr_ctx->irq_type == VFIO_PCI_INTX_IRQ_INDEX ||
+		 intr_ctx->irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
+		 intr_ctx->irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
 }
 
 static
@@ -235,7 +235,7 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 {
 	struct vfio_pci_irq_ctx *ctx;
 
-	if (!is_irq_none(vdev))
+	if (!is_irq_none(&vdev->intr_ctx))
 		return -EINVAL;
 
 	if (!vdev->pdev->irq)
@@ -353,7 +353,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 	int ret;
 	u16 cmd;
 
-	if (!is_irq_none(vdev))
+	if (!is_irq_none(&vdev->intr_ctx))
 		return -EINVAL;
 
 	/* return the number of supported vectors if we can't get all: */
@@ -621,7 +621,7 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 		return 0;
 	}
 
-	if (!(is_intx(vdev) || is_irq_none(vdev)) || start != 0 || count != 1)
+	if (!(is_intx(vdev) || is_irq_none(intr_ctx)) || start != 0 || count != 1)
 		return -EINVAL;
 
 	if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
@@ -665,12 +665,12 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 	unsigned int i;
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 
-	if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
+	if (irq_is(intr_ctx, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
 		vfio_msi_disable(vdev, msix);
 		return 0;
 	}
 
-	if (!(irq_is(vdev, index) || is_irq_none(vdev)))
+	if (!(irq_is(intr_ctx, index) || is_irq_none(intr_ctx)))
 		return -EINVAL;
 
 	if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
@@ -692,7 +692,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 		return ret;
 	}
 
-	if (!irq_is(vdev, index))
+	if (!irq_is(intr_ctx, index))
 		return -EINVAL;
 
 	for (i = start; i < start + count; i++) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 12/26] vfio/pci: Provide interrupt context to generic ops
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (10 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 11/26] vfio/pci: Provide interrupt context to irq_is() and is_irq_none() Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 13/26] vfio/pci: Provide interrupt context to vfio_msi_enable() and vfio_msi_disable() Reinette Chatre
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The functions operating on the per-interrupt context were originally
created to support the management of PCI device interrupts, where the
interrupt context was maintained within the virtual PCI device's
struct vfio_pci_core_device. Now that the per-interrupt context has
been moved to the more generic struct vfio_pci_intr_ctx, these utilities
can be changed to expect the generic structure instead. This enables
these utilities to be used by other interrupt management backends.
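
The resulting calling convention can be pictured with a small standalone
sketch (all names and types below are invented for illustration; this is
not the vfio code, and a plain array stands in for the xarray): the
lookup/alloc/free helpers take only the generic context, so any backend
that carries such a context can reuse them.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_VECTORS 8

/* Per-interrupt state, a stand-in for struct vfio_pci_irq_ctx. */
struct irq_slot {
	int fd;			/* eventfd-like handle in the real code */
};

/* Generic interrupt context: no knowledge of the owning device. */
struct intr_ctx {
	struct irq_slot *slots[MAX_VECTORS];	/* stand-in for the xarray */
};

struct irq_slot *irq_ctx_get(struct intr_ctx *ctx, unsigned long index)
{
	return index < MAX_VECTORS ? ctx->slots[index] : NULL;
}

struct irq_slot *irq_ctx_alloc(struct intr_ctx *ctx, unsigned long index)
{
	struct irq_slot *slot;

	if (index >= MAX_VECTORS || ctx->slots[index])
		return NULL;
	slot = calloc(1, sizeof(*slot));
	if (slot)
		ctx->slots[index] = slot;
	return slot;
}

void irq_ctx_free(struct intr_ctx *ctx, struct irq_slot *slot,
		  unsigned long index)
{
	if (index < MAX_VECTORS && ctx->slots[index] == slot) {
		ctx->slots[index] = NULL;
		free(slot);
	}
}
```

A caller needs nothing beyond the context itself, which is exactly what
lets a non-PCI backend share these helpers.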

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since RFC V2.

 drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++++++++++---------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 9aff5c38f198..cdb6f875271f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -49,21 +49,21 @@ static bool is_irq_none(struct vfio_pci_intr_ctx *intr_ctx)
 }
 
 static
-struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
+struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_intr_ctx *intr_ctx,
 					  unsigned long index)
 {
-	return xa_load(&vdev->intr_ctx.ctx, index);
+	return xa_load(&intr_ctx->ctx, index);
 }
 
-static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
+static void vfio_irq_ctx_free(struct vfio_pci_intr_ctx *intr_ctx,
 			      struct vfio_pci_irq_ctx *ctx, unsigned long index)
 {
-	xa_erase(&vdev->intr_ctx.ctx, index);
+	xa_erase(&intr_ctx->ctx, index);
 	kfree(ctx);
 }
 
 static struct vfio_pci_irq_ctx *
-vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
+vfio_irq_ctx_alloc(struct vfio_pci_intr_ctx *intr_ctx, unsigned long index)
 {
 	struct vfio_pci_irq_ctx *ctx;
 	int ret;
@@ -72,7 +72,7 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
 	if (!ctx)
 		return NULL;
 
-	ret = xa_insert(&vdev->intr_ctx.ctx, index, ctx, GFP_KERNEL_ACCOUNT);
+	ret = xa_insert(&intr_ctx->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
 	if (ret) {
 		kfree(ctx);
 		return NULL;
@@ -91,7 +91,7 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 	if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
 		struct vfio_pci_irq_ctx *ctx;
 
-		ctx = vfio_irq_ctx_get(vdev, 0);
+		ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
 		if (WARN_ON_ONCE(!ctx))
 			return;
 		eventfd_signal(ctx->trigger, 1);
@@ -120,7 +120,7 @@ bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 		goto out_unlock;
 	}
 
-	ctx = vfio_irq_ctx_get(vdev, 0);
+	ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
 	if (WARN_ON_ONCE(!ctx))
 		goto out_unlock;
 
@@ -169,7 +169,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 		goto out_unlock;
 	}
 
-	ctx = vfio_irq_ctx_get(vdev, 0);
+	ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
 	if (WARN_ON_ONCE(!ctx))
 		goto out_unlock;
 
@@ -207,7 +207,7 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 	unsigned long flags;
 	int ret = IRQ_NONE;
 
-	ctx = vfio_irq_ctx_get(vdev, 0);
+	ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
 	if (WARN_ON_ONCE(!ctx))
 		return ret;
 
@@ -241,7 +241,7 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	if (!vdev->pdev->irq)
 		return -ENODEV;
 
-	ctx = vfio_irq_ctx_alloc(vdev, 0);
+	ctx = vfio_irq_ctx_alloc(&vdev->intr_ctx, 0);
 	if (!ctx)
 		return -ENOMEM;
 
@@ -269,7 +269,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 	unsigned long flags;
 	int ret;
 
-	ctx = vfio_irq_ctx_get(vdev, 0);
+	ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
 	if (WARN_ON_ONCE(!ctx))
 		return -EINVAL;
 
@@ -324,7 +324,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 {
 	struct vfio_pci_irq_ctx *ctx;
 
-	ctx = vfio_irq_ctx_get(vdev, 0);
+	ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
 	WARN_ON_ONCE(!ctx);
 	if (ctx) {
 		vfio_virqfd_disable(&ctx->unmask);
@@ -332,7 +332,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 	}
 	vfio_intx_set_signal(vdev, -1);
 	vdev->intr_ctx.irq_type = VFIO_PCI_NUM_IRQS;
-	vfio_irq_ctx_free(vdev, ctx, 0);
+	vfio_irq_ctx_free(&vdev->intr_ctx, ctx, 0);
 }
 
 /*
@@ -421,7 +421,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 	int irq = -EINVAL, ret;
 	u16 cmd;
 
-	ctx = vfio_irq_ctx_get(vdev, vector);
+	ctx = vfio_irq_ctx_get(&vdev->intr_ctx, vector);
 
 	if (ctx) {
 		irq_bypass_unregister_producer(&ctx->producer);
@@ -432,7 +432,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 		/* Interrupt stays allocated, will be freed at MSI-X disable. */
 		kfree(ctx->name);
 		eventfd_ctx_put(ctx->trigger);
-		vfio_irq_ctx_free(vdev, ctx, vector);
+		vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
 	}
 
 	if (fd < 0)
@@ -445,7 +445,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 			return irq;
 	}
 
-	ctx = vfio_irq_ctx_alloc(vdev, vector);
+	ctx = vfio_irq_ctx_alloc(&vdev->intr_ctx, vector);
 	if (!ctx)
 		return -ENOMEM;
 
@@ -499,7 +499,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 out_free_name:
 	kfree(ctx->name);
 out_free_ctx:
-	vfio_irq_ctx_free(vdev, ctx, vector);
+	vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
 	return ret;
 }
 
@@ -570,7 +570,8 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_intr_ctx *intr_ctx,
 		if (unmask)
 			vfio_pci_intx_unmask(vdev);
 	} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
-		struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
+		struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(&vdev->intr_ctx,
+								0);
 		int32_t fd = *(int32_t *)data;
 
 		if (WARN_ON_ONCE(!ctx))
@@ -696,7 +697,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 		return -EINVAL;
 
 	for (i = start; i < start + count; i++) {
-		ctx = vfio_irq_ctx_get(vdev, i);
+		ctx = vfio_irq_ctx_get(&vdev->intr_ctx, i);
 		if (!ctx)
 			continue;
 		if (flags & VFIO_IRQ_SET_DATA_NONE) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 13/26] vfio/pci: Provide interrupt context to vfio_msi_enable() and vfio_msi_disable()
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (11 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 12/26] vfio/pci: Provide interrupt context to generic ops Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 14/26] vfio/pci: Let interrupt management backend interpret interrupt index Reinette Chatre
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

vfio_msi_enable() and vfio_msi_disable() perform the PCI-specific
operations to allocate and free the device interrupts that back
the guest interrupts. This makes these functions backend-specific
calls that should be invoked by the interrupt management frontend.

Pass the interrupt context as a parameter to vfio_msi_enable() and
vfio_msi_disable() so that they can be called by a generic frontend,
and make it possible for other backends to provide their own
vfio_msi_enable() and vfio_msi_disable().
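
The frontend/backend split this works toward can be sketched outside the
kernel (all names are invented for illustration; this is not the vfio
code): the frontend only holds the generic context, and the backend
callback recovers its own device state from the context's private
pointer.

```c
#include <assert.h>
#include <stddef.h>

struct intr_ctx;

/* Backend-provided operations, a stand-in for struct vfio_pci_intr_ops. */
struct intr_ops {
	int (*msi_enable)(struct intr_ctx *ctx, int nvec);
};

struct intr_ctx {
	const struct intr_ops *ops;
	void *priv;		/* backend-private device state */
	int nvec_enabled;
};

/* Mock "PCI device" state owned by one particular backend. */
struct pci_device {
	int max_vectors;
};

/* Backend implementation: recovers its device from ctx->priv. */
int pci_msi_enable(struct intr_ctx *ctx, int nvec)
{
	struct pci_device *pdev = ctx->priv;

	if (nvec < 1 || nvec > pdev->max_vectors)
		return -1;
	ctx->nvec_enabled = nvec;
	return 0;
}

/* Generic frontend: dispatches without knowing which backend is wired up. */
int frontend_msi_enable(struct intr_ctx *ctx, int nvec)
{
	return ctx->ops->msi_enable(ctx, nvec);
}
```

Swapping in a different backend only means wiring a different ops table
and priv pointer into the context; the frontend code is untouched.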

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index cdb6f875271f..ad3f9c1baccc 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -346,14 +346,15 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
 	return IRQ_HANDLED;
 }
 
-static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msix)
+static int vfio_msi_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec, bool msix)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
 	int ret;
 	u16 cmd;
 
-	if (!is_irq_none(&vdev->intr_ctx))
+	if (!is_irq_none(intr_ctx))
 		return -EINVAL;
 
 	/* return the number of supported vectors if we can't get all: */
@@ -367,7 +368,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 	}
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
-	vdev->intr_ctx.irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
+	intr_ctx->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
 				VFIO_PCI_MSI_IRQ_INDEX;
 
 	if (!msix) {
@@ -523,14 +524,15 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
-static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
+static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx, bool msix)
 {
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_irq_ctx *ctx;
 	unsigned long i;
 	u16 cmd;
 
-	xa_for_each(&vdev->intr_ctx.ctx, i, ctx) {
+	xa_for_each(&intr_ctx->ctx, i, ctx) {
 		vfio_virqfd_disable(&ctx->unmask);
 		vfio_virqfd_disable(&ctx->mask);
 		vfio_msi_set_vector_signal(vdev, i, -1, msix);
@@ -547,7 +549,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 	if (vdev->nointx)
 		pci_intx(pdev, 0);
 
-	vdev->intr_ctx.irq_type = VFIO_PCI_NUM_IRQS;
+	intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
 }
 
 /*
@@ -667,7 +669,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 
 	if (irq_is(intr_ctx, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
-		vfio_msi_disable(vdev, msix);
+		vfio_msi_disable(intr_ctx, msix);
 		return 0;
 	}
 
@@ -682,13 +684,13 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 			return vfio_msi_set_block(vdev, start, count,
 						  fds, msix);
 
-		ret = vfio_msi_enable(vdev, start + count, msix);
+		ret = vfio_msi_enable(intr_ctx, start + count, msix);
 		if (ret)
 			return ret;
 
 		ret = vfio_msi_set_block(vdev, start, count, fds, msix);
 		if (ret)
-			vfio_msi_disable(vdev, msix);
+			vfio_msi_disable(intr_ctx, msix);
 
 		return ret;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 14/26] vfio/pci: Let interrupt management backend interpret interrupt index
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (12 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 13/26] vfio/pci: Provide interrupt context to vfio_msi_enable() and vfio_msi_disable() Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 15/26] vfio/pci: Move generic code to frontend Reinette Chatre
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

vfio_pci_set_msi_trigger() and vfio_msi_set_block() are generic
and can be shared by different interrupt backends. This implies
that these functions should not interpret user-provided parameters
but should instead pass them to the backend-specific code for
interpretation.

Instead of assuming that only MSI or MSI-X can be provided via the
index, and passing a boolean derived from it, pass the actual index
to the backend for interpretation.
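
A hypothetical sketch of the difference (constants and helpers are
invented for illustration, loosely mirroring the VFIO_PCI_*_IRQ_INDEX
and PCI_IRQ_* values): collapsing the index into an msix boolean in the
frontend bakes in the MSI/MSI-X assumption, while forwarding the raw
index leaves interpretation to the backend.

```c
#include <assert.h>

enum {
	IRQ_INDEX_INTX,
	IRQ_INDEX_MSI,
	IRQ_INDEX_MSIX,
	IRQ_NUM_INDEXES,
};

/* Backend-side interpretation of the user-provided index. */
int backend_irq_flag(unsigned int index)
{
	/* 0x1 and 0x2 stand in for PCI_IRQ_MSI and PCI_IRQ_MSIX. */
	return index == IRQ_INDEX_MSIX ? 0x2 : 0x1;
}

/*
 * Generic frontend: validates the range it handles and forwards the
 * index untouched instead of deriving a boolean from it, so a backend
 * with additional index types keeps working.
 */
int frontend_set_trigger(unsigned int index)
{
	if (index != IRQ_INDEX_MSI && index != IRQ_INDEX_MSIX)
		return -1;
	return backend_irq_flag(index);
}
```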

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 38 +++++++++++++++++--------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index ad3f9c1baccc..d2b80e176651 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -346,17 +346,20 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
 	return IRQ_HANDLED;
 }
 
-static int vfio_msi_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec, bool msix)
+static int vfio_msi_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec,
+			   unsigned int index)
 {
 	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct pci_dev *pdev = vdev->pdev;
-	unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
+	unsigned int flag;
 	int ret;
 	u16 cmd;
 
 	if (!is_irq_none(intr_ctx))
 		return -EINVAL;
 
+	flag = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
+
 	/* return the number of supported vectors if we can't get all: */
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
@@ -368,10 +371,9 @@ static int vfio_msi_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec, bool ms
 	}
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
-	intr_ctx->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
-				VFIO_PCI_MSI_IRQ_INDEX;
+	intr_ctx->irq_type = index;
 
-	if (!msix) {
+	if (index == VFIO_PCI_MSI_IRQ_INDEX) {
 		/*
 		 * Compute the virtual hardware field for max msi vectors -
 		 * it is the log base 2 of the number of vectors.
@@ -414,8 +416,10 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
 }
 
 static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
-				      unsigned int vector, int fd, bool msix)
+				      unsigned int vector, int fd,
+				      unsigned int index)
 {
+	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_irq_ctx *ctx;
 	struct eventfd_ctx *trigger;
@@ -506,25 +510,26 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 
 static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
 			      unsigned int start, unsigned int count,
-			      int32_t *fds, bool msix)
+			      int32_t *fds, unsigned int index)
 {
 	unsigned int i, j;
 	int ret = 0;
 
 	for (i = 0, j = start; i < count && !ret; i++, j++) {
 		int fd = fds ? fds[i] : -1;
-		ret = vfio_msi_set_vector_signal(vdev, j, fd, msix);
+		ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
 	}
 
 	if (ret) {
 		for (i = start; i < j; i++)
-			vfio_msi_set_vector_signal(vdev, i, -1, msix);
+			vfio_msi_set_vector_signal(vdev, i, -1, index);
 	}
 
 	return ret;
 }
 
-static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx, bool msix)
+static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx,
+			     unsigned int index)
 {
 	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct pci_dev *pdev = vdev->pdev;
@@ -535,7 +540,7 @@ static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx, bool msix)
 	xa_for_each(&intr_ctx->ctx, i, ctx) {
 		vfio_virqfd_disable(&ctx->unmask);
 		vfio_virqfd_disable(&ctx->mask);
-		vfio_msi_set_vector_signal(vdev, i, -1, msix);
+		vfio_msi_set_vector_signal(vdev, i, -1, index);
 	}
 
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -666,10 +671,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
-	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 
 	if (irq_is(intr_ctx, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
-		vfio_msi_disable(intr_ctx, msix);
+		vfio_msi_disable(intr_ctx, index);
 		return 0;
 	}
 
@@ -682,15 +686,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 
 		if (vdev->intr_ctx.irq_type == index)
 			return vfio_msi_set_block(vdev, start, count,
-						  fds, msix);
+						  fds, index);
 
-		ret = vfio_msi_enable(intr_ctx, start + count, msix);
+		ret = vfio_msi_enable(intr_ctx, start + count, index);
 		if (ret)
 			return ret;
 
-		ret = vfio_msi_set_block(vdev, start, count, fds, msix);
+		ret = vfio_msi_set_block(vdev, start, count, fds, index);
 		if (ret)
-			vfio_msi_disable(intr_ctx, msix);
+			vfio_msi_disable(intr_ctx, index);
 
 		return ret;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 15/26] vfio/pci: Move generic code to frontend
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (13 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 14/26] vfio/pci: Let interrupt management backend interpret interrupt index Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 16/26] vfio/pci: Split interrupt context initialization Reinette Chatre
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

vfio_pci_set_msi_trigger() and vfio_msi_set_block() are generic
and thus belong in the frontend. This means that they should
operate on the interrupt context, not on backend-specific data.

Provide the interrupt context as a parameter to vfio_pci_set_msi_trigger()
and vfio_msi_set_block(), and remove all references to the PCI interrupt
management data from these functions. This enables these functions to
form part of the interrupt management frontend shared by different
interrupt management backends.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index d2b80e176651..adad93c31ac6 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -415,18 +415,19 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
 	return map.index < 0 ? map.index : map.virq;
 }
 
-static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
+static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 				      unsigned int vector, int fd,
 				      unsigned int index)
 {
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_irq_ctx *ctx;
 	struct eventfd_ctx *trigger;
 	int irq = -EINVAL, ret;
 	u16 cmd;
 
-	ctx = vfio_irq_ctx_get(&vdev->intr_ctx, vector);
+	ctx = vfio_irq_ctx_get(intr_ctx, vector);
 
 	if (ctx) {
 		irq_bypass_unregister_producer(&ctx->producer);
@@ -437,7 +438,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 		/* Interrupt stays allocated, will be freed at MSI-X disable. */
 		kfree(ctx->name);
 		eventfd_ctx_put(ctx->trigger);
-		vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
+		vfio_irq_ctx_free(intr_ctx, ctx, vector);
 	}
 
 	if (fd < 0)
@@ -450,7 +451,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 			return irq;
 	}
 
-	ctx = vfio_irq_ctx_alloc(&vdev->intr_ctx, vector);
+	ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
 	if (!ctx)
 		return -ENOMEM;
 
@@ -504,11 +505,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 out_free_name:
 	kfree(ctx->name);
 out_free_ctx:
-	vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
+	vfio_irq_ctx_free(intr_ctx, ctx, vector);
 	return ret;
 }
 
-static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
+static int vfio_msi_set_block(struct vfio_pci_intr_ctx *intr_ctx,
 			      unsigned int start, unsigned int count,
 			      int32_t *fds, unsigned int index)
 {
@@ -517,12 +518,12 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
 
 	for (i = 0, j = start; i < count && !ret; i++, j++) {
 		int fd = fds ? fds[i] : -1;
-		ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
+		ret = vfio_msi_set_vector_signal(intr_ctx, j, fd, index);
 	}
 
 	if (ret) {
 		for (i = start; i < j; i++)
-			vfio_msi_set_vector_signal(vdev, i, -1, index);
+			vfio_msi_set_vector_signal(intr_ctx, i, -1, index);
 	}
 
 	return ret;
@@ -540,7 +541,7 @@ static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx,
 	xa_for_each(&intr_ctx->ctx, i, ctx) {
 		vfio_virqfd_disable(&ctx->unmask);
 		vfio_virqfd_disable(&ctx->mask);
-		vfio_msi_set_vector_signal(vdev, i, -1, index);
+		vfio_msi_set_vector_signal(intr_ctx, i, -1, index);
 	}
 
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -668,7 +669,6 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 				    unsigned int count, uint32_t flags,
 				    void *data)
 {
-	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
 
@@ -684,15 +684,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 		int32_t *fds = data;
 		int ret;
 
-		if (vdev->intr_ctx.irq_type == index)
-			return vfio_msi_set_block(vdev, start, count,
+		if (intr_ctx->irq_type == index)
+			return vfio_msi_set_block(intr_ctx, start, count,
 						  fds, index);
 
 		ret = vfio_msi_enable(intr_ctx, start + count, index);
 		if (ret)
 			return ret;
 
-		ret = vfio_msi_set_block(vdev, start, count, fds, index);
+		ret = vfio_msi_set_block(intr_ctx, start, count, fds, index);
 		if (ret)
 			vfio_msi_disable(intr_ctx, index);
 
@@ -703,7 +703,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 		return -EINVAL;
 
 	for (i = start; i < start + count; i++) {
-		ctx = vfio_irq_ctx_get(&vdev->intr_ctx, i);
+		ctx = vfio_irq_ctx_get(intr_ctx, i);
 		if (!ctx)
 			continue;
 		if (flags & VFIO_IRQ_SET_DATA_NONE) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 16/26] vfio/pci: Split interrupt context initialization
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (14 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 15/26] vfio/pci: Move generic code to frontend Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 17/26] vfio/pci: Make vfio_pci_set_irqs_ioctl() available Reinette Chatre
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

struct vfio_pci_intr_ctx is the context associated with the interrupts
of a virtual device. The interrupt context is initialized with
backend-specific data required by the particular interrupt management
backend as well as the common initialization required by all interrupt
management backends.

Split interrupt context initialization into common and interrupt
management backend-specific calls. The entrypoint will be the
initialization of a particular interrupt management backend, which
in turn calls the common initialization.
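
The split can be pictured with a minimal standalone sketch (names are
invented for illustration; the real common helper also initializes the
mutex and the xarray): each backend exposes its own init entrypoint,
which delegates the shared setup to a common helper before doing its
backend-specific wiring.

```c
#include <assert.h>
#include <stddef.h>

struct intr_ctx;

struct intr_ops {
	int (*msi_enable)(struct intr_ctx *ctx, int nvec);
};

struct intr_ctx {
	const struct intr_ops *ops;
	void *priv;
	int irq_type;
};

#define IRQ_TYPE_NONE	(-1)	/* stands in for VFIO_PCI_NUM_IRQS */

/* Common initialization shared by every backend. */
void intr_ctx_init_common(struct intr_ctx *ctx)
{
	ctx->irq_type = IRQ_TYPE_NONE;
}

/* Backend entrypoint: common init first, then backend-specific wiring. */
void pci_init_intr_ctx(struct intr_ctx *ctx, void *device,
		       const struct intr_ops *ops)
{
	intr_ctx_init_common(ctx);
	ctx->ops = ops;
	ctx->priv = device;
}
```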

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since RFC V2.

 drivers/vfio/pci/vfio_pci_intrs.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index adad93c31ac6..14131d5288e3 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -801,6 +801,18 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 					       count, flags, data);
 }
 
+static void _vfio_pci_init_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
+{
+	intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
+	mutex_init(&intr_ctx->igate);
+	xa_init(&intr_ctx->ctx);
+}
+
+static void _vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
+{
+	mutex_destroy(&intr_ctx->igate);
+}
+
 static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
 	.set_intx_mask = vfio_pci_set_intx_mask,
 	.set_intx_unmask = vfio_pci_set_intx_unmask,
@@ -814,17 +826,15 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
 void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
 			    struct vfio_pci_intr_ctx *intr_ctx)
 {
-	intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
+	_vfio_pci_init_intr_ctx(intr_ctx);
 	intr_ctx->ops = &vfio_pci_intr_ops;
 	intr_ctx->priv = vdev;
-	mutex_init(&intr_ctx->igate);
-	xa_init(&intr_ctx->ctx);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);
 
 void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
 {
-	mutex_destroy(&intr_ctx->igate);
+	_vfio_pci_release_intr_ctx(intr_ctx);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_release_intr_ctx);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 17/26] vfio/pci: Make vfio_pci_set_irqs_ioctl() available
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (15 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 16/26] vfio/pci: Split interrupt context initialization Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 18/26] vfio/pci: Preserve per-interrupt contexts Reinette Chatre
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

vfio_pci_set_irqs_ioctl() is now a generic entrypoint that can
be configured to support different interrupt management backends.

Export vfio_pci_set_irqs_ioctl() for use by other virtual device
drivers.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Improve changelog.

 drivers/vfio/pci/vfio_pci_intrs.c | 1 +
 include/linux/vfio_pci_core.h     | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 14131d5288e3..80040fde6f6b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -916,3 +916,4 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 	mutex_unlock(&intr_ctx->igate);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_set_irqs_ioctl);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index e666c19da223..8d2fb51a2dcc 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -158,6 +158,9 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
 			    struct vfio_pci_intr_ctx *intr_ctx);
 void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
+			    unsigned int index, unsigned int start,
+			    unsigned int count, void *data);
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz);
 ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 18/26] vfio/pci: Preserve per-interrupt contexts
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (16 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 17/26] vfio/pci: Make vfio_pci_set_irqs_ioctl() available Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 19/26] vfio/pci: Store Linux IRQ number in per-interrupt context Reinette Chatre
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

Interrupt management for PCI passthrough devices creates a new
per-interrupt context every time an interrupt is allocated, freeing
it when the interrupt is freed.

The per-interrupt context contains the properties of a particular
interrupt. Without properties that guide interrupt allocation and
freeing it is acceptable to always create a new per-interrupt context.

Maintain the per-interrupt context across interrupt allocation and
free events in preparation for per-interrupt properties that guide
interrupt allocation and freeing. Examples of such properties are:
(a) whether the interrupt is emulated, which determines whether
the backend should actually allocate and/or free an interrupt, and (b)
an instance cookie associated with the interrupt that must be
provided at allocation time when the interrupt is backed by IMS.

This means that the existence of a per-interrupt context no longer
implies a valid trigger, pointers to freed memory must be cleared, and
a new per-interrupt context cannot be assumed to need allocation when
an interrupt is allocated.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 80040fde6f6b..8d84e7d62594 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -429,7 +429,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 
 	ctx = vfio_irq_ctx_get(intr_ctx, vector);
 
-	if (ctx) {
+	if (ctx && ctx->trigger) {
 		irq_bypass_unregister_producer(&ctx->producer);
 		irq = pci_irq_vector(pdev, vector);
 		cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -437,8 +437,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 		vfio_pci_memory_unlock_and_restore(vdev, cmd);
 		/* Interrupt stays allocated, will be freed at MSI-X disable. */
 		kfree(ctx->name);
+		ctx->name = NULL;
 		eventfd_ctx_put(ctx->trigger);
-		vfio_irq_ctx_free(intr_ctx, ctx, vector);
+		ctx->trigger = NULL;
 	}
 
 	if (fd < 0)
@@ -451,16 +452,17 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 			return irq;
 	}
 
-	ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
-	if (!ctx)
-		return -ENOMEM;
+	/* Per-interrupt context remain allocated. */
+	if (!ctx) {
+		ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
+		if (!ctx)
+			return -ENOMEM;
+	}
 
 	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
 			      msix ? "x" : "", vector, pci_name(pdev));
-	if (!ctx->name) {
-		ret = -ENOMEM;
-		goto out_free_ctx;
-	}
+	if (!ctx->name)
+		return -ENOMEM;
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
@@ -504,8 +506,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 	eventfd_ctx_put(trigger);
 out_free_name:
 	kfree(ctx->name);
-out_free_ctx:
-	vfio_irq_ctx_free(intr_ctx, ctx, vector);
+	ctx->name = NULL;
 	return ret;
 }
 
@@ -704,7 +705,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 
 	for (i = start; i < start + count; i++) {
 		ctx = vfio_irq_ctx_get(intr_ctx, i);
-		if (!ctx)
+		if (!ctx || !ctx->trigger)
 			continue;
 		if (flags & VFIO_IRQ_SET_DATA_NONE) {
 			eventfd_signal(ctx->trigger, 1);
@@ -810,6 +811,22 @@ static void _vfio_pci_init_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
 
 static void _vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
 {
+	struct vfio_pci_irq_ctx *ctx;
+	unsigned long i;
+
+	/*
+	 * Per-interrupt context remains allocated after interrupt is
+	 * freed. Per-interrupt context need to be freed separately.
+	 */
+	mutex_lock(&intr_ctx->igate);
+	xa_for_each(&intr_ctx->ctx, i, ctx) {
+		WARN_ON_ONCE(ctx->trigger);
+		WARN_ON_ONCE(ctx->name);
+		xa_erase(&intr_ctx->ctx, i);
+		kfree(ctx);
+	}
+	mutex_unlock(&intr_ctx->igate);
+
 	mutex_destroy(&intr_ctx->igate);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 19/26] vfio/pci: Store Linux IRQ number in per-interrupt context
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (17 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 18/26] vfio/pci: Preserve per-interrupt contexts Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 20/26] vfio/pci: Separate frontend and backend code during interrupt enable/disable Reinette Chatre
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The Linux IRQ number is a property shared among all interrupt
backends, but not all interrupt management backends have a simple
query for it. pci_irq_vector() can be used to obtain the Linux
IRQ number of an MSI-X interrupt, but there is no such
query for IMS interrupts.

The Linux IRQ number is needed when an interrupt is freed as well
as when the IRQ bypass producer is registered. It is unnecessary to
query the Linux IRQ number at each stage; the number can be
stored at the time the interrupt is allocated and obtained
from its per-interrupt context when needed.
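The cache-at-allocation idea can be illustrated with a small userspace sketch. The names here (expensive_irq_lookup standing in for pci_irq_vector(), ctx_alloc_irq, ctx_irq) are invented for the example:

```c
#include <assert.h>

/*
 * Sketch: the lookup is performed once when the interrupt is
 * allocated and the result is cached in the per-interrupt context,
 * so later consumers (free, bypass registration) reuse the cached
 * value instead of re-querying.
 */
static int lookups;	/* counts how often the lookup actually runs */

/* Stand-in for a backend-specific query such as pci_irq_vector(). */
static int expensive_irq_lookup(unsigned int vector)
{
	lookups++;
	return 100 + (int)vector;
}

struct irq_ctx {
	int virq;	/* Linux IRQ number, cached at allocation */
};

static void ctx_alloc_irq(struct irq_ctx *ctx, unsigned int vector)
{
	ctx->virq = expensive_irq_lookup(vector);
}

/* Any number of later uses read the cached value. */
static int ctx_irq(const struct irq_ctx *ctx)
{
	return ctx->virq;
}
```

This also covers backends such as IMS where no equivalent of pci_irq_vector() exists: the number is simply recorded when the allocation API returns it.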

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8d84e7d62594..fd0713dc9f81 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -29,6 +29,7 @@ struct vfio_pci_irq_ctx {
 	char			*name;
 	bool			masked;
 	struct irq_bypass_producer	producer;
+	int			virq;
 };
 
 static bool irq_is(struct vfio_pci_intr_ctx *intr_ctx, int type)
@@ -431,10 +432,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 
 	if (ctx && ctx->trigger) {
 		irq_bypass_unregister_producer(&ctx->producer);
-		irq = pci_irq_vector(pdev, vector);
+		irq = ctx->virq;
 		cmd = vfio_pci_memory_lock_and_enable(vdev);
-		free_irq(irq, ctx->trigger);
+		free_irq(ctx->virq, ctx->trigger);
 		vfio_pci_memory_unlock_and_restore(vdev, cmd);
+		ctx->virq = 0;
 		/* Interrupt stays allocated, will be freed at MSI-X disable. */
 		kfree(ctx->name);
 		ctx->name = NULL;
@@ -488,8 +490,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 	if (ret)
 		goto out_put_eventfd_ctx;
 
+	ctx->virq = irq;
+
 	ctx->producer.token = trigger;
-	ctx->producer.irq = irq;
+	ctx->producer.irq = ctx->virq;
 	ret = irq_bypass_register_producer(&ctx->producer);
 	if (unlikely(ret)) {
 		dev_info(&pdev->dev,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 20/26] vfio/pci: Separate frontend and backend code during interrupt enable/disable
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (18 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 19/26] vfio/pci: Store Linux IRQ number in per-interrupt context Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 21/26] vfio/pci: Replace backend specific calls with callbacks Reinette Chatre
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

vfio_msi_set_vector_signal() contains a mix of generic and
backend-specific code.

Separate the backend-specific code into functions that can be
replaced by backend-specific callbacks.

The dev_info() used in the error message is replaced by a pr_info()
that prints the device name generated by the backend-specific code
intended for use during request_irq().

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 110 +++++++++++++++++++-----------
 1 file changed, 70 insertions(+), 40 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index fd0713dc9f81..c1f65b8adfe2 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -416,28 +416,81 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
 	return map.index < 0 ? map.index : map.virq;
 }
 
-static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
-				      unsigned int vector, int fd,
+static void vfio_msi_free_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+				    struct vfio_pci_irq_ctx *ctx,
+				    unsigned int vector)
+{
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
+	u16 cmd;
+
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
+	free_irq(ctx->virq, ctx->trigger);
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
+	ctx->virq = 0;
+	/* Interrupt stays allocated, will be freed at MSI-X disable. */
+}
+
+static int vfio_msi_request_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+				      struct vfio_pci_irq_ctx *ctx,
+				      unsigned int vector,
 				      unsigned int index)
+{
+	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
+	struct vfio_pci_core_device *vdev = intr_ctx->priv;
+	int irq, ret;
+	u16 cmd;
+
+	/* Interrupt stays allocated, will be freed at MSI-X disable. */
+	irq = vfio_msi_alloc_irq(vdev, vector, msix);
+	if (irq < 0)
+		return irq;
+
+	/*
+	 * If the vector was previously allocated, refresh the on-device
+	 * message data before enabling in case it had been cleared or
+	 * corrupted (e.g. due to backdoor resets) since writing.
+	 */
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
+	if (msix) {
+		struct msi_msg msg;
+
+		get_cached_msi_msg(irq, &msg);
+		pci_write_msi_msg(irq, &msg);
+	}
+
+	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, ctx->trigger);
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
+
+	ctx->virq = irq;
+
+	return ret;
+}
+
+static char *vfio_msi_device_name(struct vfio_pci_intr_ctx *intr_ctx,
+				  unsigned int vector,
+				  unsigned int index)
 {
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 	struct vfio_pci_core_device *vdev = intr_ctx->priv;
 	struct pci_dev *pdev = vdev->pdev;
+
+	return kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
+			 msix ? "x" : "", vector, pci_name(pdev));
+}
+
+static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
+				      unsigned int vector, int fd,
+				      unsigned int index)
+{
 	struct vfio_pci_irq_ctx *ctx;
 	struct eventfd_ctx *trigger;
-	int irq = -EINVAL, ret;
-	u16 cmd;
+	int ret;
 
 	ctx = vfio_irq_ctx_get(intr_ctx, vector);
 
 	if (ctx && ctx->trigger) {
 		irq_bypass_unregister_producer(&ctx->producer);
-		irq = ctx->virq;
-		cmd = vfio_pci_memory_lock_and_enable(vdev);
-		free_irq(ctx->virq, ctx->trigger);
-		vfio_pci_memory_unlock_and_restore(vdev, cmd);
-		ctx->virq = 0;
-		/* Interrupt stays allocated, will be freed at MSI-X disable. */
+		vfio_msi_free_interrupt(intr_ctx, ctx, vector);
 		kfree(ctx->name);
 		ctx->name = NULL;
 		eventfd_ctx_put(ctx->trigger);
@@ -447,13 +500,6 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 	if (fd < 0)
 		return 0;
 
-	if (irq == -EINVAL) {
-		/* Interrupt stays allocated, will be freed at MSI-X disable. */
-		irq = vfio_msi_alloc_irq(vdev, vector, msix);
-		if (irq < 0)
-			return irq;
-	}
-
 	/* Per-interrupt context remain allocated. */
 	if (!ctx) {
 		ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
@@ -461,8 +507,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 			return -ENOMEM;
 	}
 
-	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
-			      msix ? "x" : "", vector, pci_name(pdev));
+	ctx->name = vfio_msi_device_name(intr_ctx, vector, index);
 	if (!ctx->name)
 		return -ENOMEM;
 
@@ -472,42 +517,27 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 		goto out_free_name;
 	}
 
-	/*
-	 * If the vector was previously allocated, refresh the on-device
-	 * message data before enabling in case it had been cleared or
-	 * corrupted (e.g. due to backdoor resets) since writing.
-	 */
-	cmd = vfio_pci_memory_lock_and_enable(vdev);
-	if (msix) {
-		struct msi_msg msg;
-
-		get_cached_msi_msg(irq, &msg);
-		pci_write_msi_msg(irq, &msg);
-	}
+	ctx->trigger = trigger;
 
-	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
-	vfio_pci_memory_unlock_and_restore(vdev, cmd);
+	ret = vfio_msi_request_interrupt(intr_ctx, ctx, vector, index);
 	if (ret)
 		goto out_put_eventfd_ctx;
 
-	ctx->virq = irq;
-
 	ctx->producer.token = trigger;
 	ctx->producer.irq = ctx->virq;
 	ret = irq_bypass_register_producer(&ctx->producer);
 	if (unlikely(ret)) {
-		dev_info(&pdev->dev,
-		"irq bypass producer (token %p) registration fails: %d\n",
-		ctx->producer.token, ret);
+		pr_info("%s irq bypass producer (token %p) registration fails: %d\n",
+			ctx->name, ctx->producer.token, ret);
 
 		ctx->producer.token = NULL;
 	}
-	ctx->trigger = trigger;
 
 	return 0;
 
 out_put_eventfd_ctx:
-	eventfd_ctx_put(trigger);
+	eventfd_ctx_put(ctx->trigger);
+	ctx->trigger = NULL;
 out_free_name:
 	kfree(ctx->name);
 	ctx->name = NULL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 21/26] vfio/pci: Replace backend specific calls with callbacks
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (19 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 20/26] vfio/pci: Separate frontend and backend code during interrupt enable/disable Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 22/26] vfio/pci: Introduce backend specific context initializer Reinette Chatre
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The backend-specific code needed to manage the interrupts has been
isolated into separate functions. With the backend-specific
code isolated into functions, these functions
can be turned into callbacks for other backends to use.
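The frontend/backend split introduced here follows the common kernel ops-table pattern. A minimal userspace sketch, with invented names (intr_ops, pci_intr_ops, frontend_enable are not the patch's symbols):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: the frontend is backend-agnostic and dispatches through a
 * table of function pointers; a backend supplies the concrete
 * implementations when the context is initialized.
 */
struct intr_ctx;

struct intr_ops {
	int (*msi_enable)(struct intr_ctx *ctx, int nvec);
	void (*msi_disable)(struct intr_ctx *ctx);
};

struct intr_ctx {
	const struct intr_ops *ops;
	int enabled_vectors;
};

/* "PCI passthrough" backend implementations. */
static int pci_msi_enable(struct intr_ctx *ctx, int nvec)
{
	ctx->enabled_vectors = nvec;
	return 0;
}

static void pci_msi_disable(struct intr_ctx *ctx)
{
	ctx->enabled_vectors = 0;
}

static const struct intr_ops pci_intr_ops = {
	.msi_enable	= pci_msi_enable,
	.msi_disable	= pci_msi_disable,
};

/* Frontend path: no backend knowledge, only the ops table. */
static int frontend_enable(struct intr_ctx *ctx, int nvec)
{
	return ctx->ops->msi_enable(ctx, nvec);
}
```

An IMS backend would provide its own ops table with the same signatures, which is what the later patches in the series do.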

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 17 +++++++++++------
 include/linux/vfio_pci_core.h     | 15 +++++++++++++++
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index c1f65b8adfe2..1e6376b048de 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -490,7 +490,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 
 	if (ctx && ctx->trigger) {
 		irq_bypass_unregister_producer(&ctx->producer);
-		vfio_msi_free_interrupt(intr_ctx, ctx, vector);
+		intr_ctx->ops->msi_free_interrupt(intr_ctx, ctx, vector);
 		kfree(ctx->name);
 		ctx->name = NULL;
 		eventfd_ctx_put(ctx->trigger);
@@ -507,7 +507,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 			return -ENOMEM;
 	}
 
-	ctx->name = vfio_msi_device_name(intr_ctx, vector, index);
+	ctx->name = intr_ctx->ops->msi_device_name(intr_ctx, vector, index);
 	if (!ctx->name)
 		return -ENOMEM;
 
@@ -519,7 +519,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 
 	ctx->trigger = trigger;
 
-	ret = vfio_msi_request_interrupt(intr_ctx, ctx, vector, index);
+	ret = intr_ctx->ops->msi_request_interrupt(intr_ctx, ctx, vector, index);
 	if (ret)
 		goto out_put_eventfd_ctx;
 
@@ -708,7 +708,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 	unsigned int i;
 
 	if (irq_is(intr_ctx, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
-		vfio_msi_disable(intr_ctx, index);
+		intr_ctx->ops->msi_disable(intr_ctx, index);
 		return 0;
 	}
 
@@ -723,13 +723,13 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
 			return vfio_msi_set_block(intr_ctx, start, count,
 						  fds, index);
 
-		ret = vfio_msi_enable(intr_ctx, start + count, index);
+		ret = intr_ctx->ops->msi_enable(intr_ctx, start + count, index);
 		if (ret)
 			return ret;
 
 		ret = vfio_msi_set_block(intr_ctx, start, count, fds, index);
 		if (ret)
-			vfio_msi_disable(intr_ctx, index);
+			intr_ctx->ops->msi_disable(intr_ctx, index);
 
 		return ret;
 	}
@@ -872,6 +872,11 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
 	.set_msix_trigger = vfio_pci_set_msi_trigger,
 	.set_err_trigger = vfio_pci_set_err_trigger,
 	.set_req_trigger = vfio_pci_set_req_trigger,
+	.msi_enable = vfio_msi_enable,
+	.msi_disable = vfio_msi_disable,
+	.msi_request_interrupt = vfio_msi_request_interrupt,
+	.msi_free_interrupt = vfio_msi_free_interrupt,
+	.msi_device_name = vfio_msi_device_name,
 };
 
 void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 8d2fb51a2dcc..f0951084a26f 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -69,6 +69,8 @@ struct vfio_pci_intr_ctx {
 	int				irq_type;
 };
 
+struct vfio_pci_irq_ctx;
+
 struct vfio_pci_intr_ops {
 	int (*set_intx_mask)(struct vfio_pci_intr_ctx *intr_ctx,
 			     unsigned int index, unsigned int start,
@@ -91,6 +93,19 @@ struct vfio_pci_intr_ops {
 	int (*set_req_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
 			       unsigned int index, unsigned int start,
 			       unsigned int count, uint32_t flags, void *data);
+	int (*msi_enable)(struct vfio_pci_intr_ctx *intr_ctx, int nvec,
+			  unsigned int index);
+	void (*msi_disable)(struct vfio_pci_intr_ctx *intr_ctx,
+			    unsigned int index);
+	int (*msi_request_interrupt)(struct vfio_pci_intr_ctx *intr_ctx,
+				     struct vfio_pci_irq_ctx *ctx,
+				     unsigned int vector,
+				     unsigned int index);
+	void (*msi_free_interrupt)(struct vfio_pci_intr_ctx *intr_ctx,
+				   struct vfio_pci_irq_ctx *ctx,
+				   unsigned int vector);
+	char *(*msi_device_name)(struct vfio_pci_intr_ctx *intr_ctx,
+				 unsigned int vector, unsigned int index);
 };
 
 struct vfio_pci_core_device {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 22/26] vfio/pci: Introduce backend specific context initializer
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (20 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 21/26] vfio/pci: Replace backend specific calls with callbacks Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 23/26] vfio/pci: Support emulated interrupts Reinette Chatre
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

The per-interrupt context may contain backend-specific data.

Call a backend-provided initializer on per-interrupt context
creation.
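Since not every backend has private per-interrupt data, the initializer is optional and only invoked when supplied, as the `if (intr_ctx->ops->init_irq_ctx)` guard in the diff below shows. A userspace sketch of this optional-callback pattern, with invented names:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: the init callback runs only when the backend supplies one,
 * so backends without private per-interrupt data need no stub.
 */
struct irq_ctx {
	int backend_data;
};

struct intr_ops {
	int (*init_irq_ctx)(struct irq_ctx *ctx);
};

static int ctx_create(const struct intr_ops *ops, struct irq_ctx *ctx)
{
	ctx->backend_data = 0;
	if (ops->init_irq_ctx)
		return ops->init_irq_ctx(ctx);
	return 0;	/* no initializer: nothing to do */
}

/* Example backend initializer filling in private data. */
static int ims_init(struct irq_ctx *ctx)
{
	ctx->backend_data = 7;
	return 0;
}
```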

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- New patch

 drivers/vfio/pci/vfio_pci_intrs.c | 8 ++++++++
 include/linux/vfio_pci_core.h     | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1e6376b048de..8c86f2d6229f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -73,6 +73,14 @@ vfio_irq_ctx_alloc(struct vfio_pci_intr_ctx *intr_ctx, unsigned long index)
 	if (!ctx)
 		return NULL;
 
+	if (intr_ctx->ops->init_irq_ctx) {
+		ret = intr_ctx->ops->init_irq_ctx(intr_ctx, ctx);
+		if (ret < 0) {
+			kfree(ctx);
+			return NULL;
+		}
+	}
+
 	ret = xa_insert(&intr_ctx->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
 	if (ret) {
 		kfree(ctx);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f0951084a26f..d5140a732741 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -106,6 +106,8 @@ struct vfio_pci_intr_ops {
 				   unsigned int vector);
 	char *(*msi_device_name)(struct vfio_pci_intr_ctx *intr_ctx,
 				 unsigned int vector, unsigned int index);
+	int (*init_irq_ctx)(struct vfio_pci_intr_ctx *intr_ctx,
+			    struct vfio_pci_irq_ctx *ctx);
 };
 
 struct vfio_pci_core_device {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 23/26] vfio/pci: Support emulated interrupts
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (21 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 22/26] vfio/pci: Introduce backend specific context initializer Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 24/26] vfio/pci: Add core IMS support Reinette Chatre
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

Access from a guest to a virtual device may be either 'direct-path',
where the guest interacts directly with the underlying hardware,
or 'intercepted path' where the virtual device emulates operations.

Support emulated interrupts that can be used to handle 'intercepted
path' operations. For example, a virtual device may use 'intercepted
path' for configuration. Configuration requests intercepted by the
virtual device driver are then handled within the virtual device
driver, with completion signaled to the guest without interacting with
the underlying hardware.

Add vfio_pci_set_emulated() and vfio_pci_send_signal() to the
VFIO PCI API. vfio_pci_set_emulated() configures a range of interrupts
to be emulated.

Any range of interrupts can be configured as emulated as long as no
interrupt has previously been allocated at those vectors. The virtual
device driver uses vfio_pci_send_signal() to trigger interrupts in
the guest.
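Under the hood an emulated interrupt is just an eventfd signaled from the driver rather than from an IRQ handler. The userspace analogue of the kernel's eventfd_signal() is a write of a 64-bit counter increment, as this sketch shows (emulated_signal is an invented name; eventfd(2), read(2), and write(2) are the real Linux APIs):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Sketch: "triggering" an emulated interrupt amounts to bumping the
 * eventfd counter by one; whoever owns the other end of the eventfd
 * (e.g. KVM injecting into the guest) observes the signal.
 */
static int emulated_signal(int efd)
{
	uint64_t one = 1;

	if (write(efd, &one, sizeof(one)) != sizeof(one))
		return -1;
	return 0;
}
```

In the real flow the eventfd is the per-vector trigger registered through VFIO_DEVICE_SET_IRQS, and KVM consumes it via an irqfd.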

Originally-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Remove the backend "supports_emulated" flag. All backends now support
  emulated interrupts.
- Move emulated interrupt enabling from IMS backend to frontend.

 drivers/vfio/pci/vfio_pci_intrs.c | 87 ++++++++++++++++++++++++++++++-
 include/linux/vfio_pci_core.h     |  3 ++
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8c86f2d6229f..6e34b8d8c216 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -23,6 +23,7 @@
 #include "vfio_pci_priv.h"
 
 struct vfio_pci_irq_ctx {
+	bool			emulated:1;
 	struct eventfd_ctx	*trigger;
 	struct virqfd		*unmask;
 	struct virqfd		*mask;
@@ -497,8 +498,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 	ctx = vfio_irq_ctx_get(intr_ctx, vector);
 
 	if (ctx && ctx->trigger) {
-		irq_bypass_unregister_producer(&ctx->producer);
-		intr_ctx->ops->msi_free_interrupt(intr_ctx, ctx, vector);
+		if (!ctx->emulated) {
+			irq_bypass_unregister_producer(&ctx->producer);
+			intr_ctx->ops->msi_free_interrupt(intr_ctx, ctx, vector);
+		}
 		kfree(ctx->name);
 		ctx->name = NULL;
 		eventfd_ctx_put(ctx->trigger);
@@ -527,6 +530,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
 
 	ctx->trigger = trigger;
 
+	if (ctx->emulated)
+		return 0;
+
 	ret = intr_ctx->ops->msi_request_interrupt(intr_ctx, ctx, vector, index);
 	if (ret)
 		goto out_put_eventfd_ctx;
@@ -902,6 +908,83 @@ void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_release_intr_ctx);
 
+/*
+ * vfio_pci_send_signal() - Send signal to the eventfd.
+ * @intr_ctx:	Interrupt context.
+ * @vector:	Vector for which interrupt will be signaled.
+ *
+ * Trigger signal to guest for emulated interrupts.
+ */
+void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector)
+{
+	struct vfio_pci_irq_ctx *ctx;
+
+	mutex_lock(&intr_ctx->igate);
+
+	ctx = vfio_irq_ctx_get(intr_ctx, vector);
+
+	if (WARN_ON_ONCE(!ctx || !ctx->emulated || !ctx->trigger))
+		goto out_unlock;
+
+	eventfd_signal(ctx->trigger, 1);
+
+out_unlock:
+	mutex_unlock(&intr_ctx->igate);
+}
+EXPORT_SYMBOL_GPL(vfio_pci_send_signal);
+
+/*
+ * vfio_pci_set_emulated() - Set range of interrupts that will be emulated.
+ * @intr_ctx:	Interrupt context.
+ * @start:	First emulated interrupt vector.
+ * @count:	Number of emulated interrupts starting from @start.
+ *
+ * Emulated interrupts will not be backed by hardware interrupts but
+ * instead triggered by virtual device driver.
+ *
+ * Return: error code on failure (-EBUSY if the vector is not available,
+ * -ENOMEM on allocation failure), 0 on success. No partial success, on
+ * success entire range was set as emulated, on failure no interrupt in
+ * range was set as emulated.
+ */
+int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
+			  unsigned int start, unsigned int count)
+{
+	struct vfio_pci_irq_ctx *ctx;
+	unsigned long i, j;
+	int ret = -EINVAL;
+
+	mutex_lock(&intr_ctx->igate);
+
+	for (i = start; i < start + count; i++) {
+		ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+		if (!ctx) {
+			ret = -ENOMEM;
+			goto out_err;
+		}
+		ctx->emulated = true;
+		ret = xa_insert(&intr_ctx->ctx, i, ctx, GFP_KERNEL_ACCOUNT);
+		if (ret) {
+			kfree(ctx);
+			goto out_err;
+		}
+	}
+
+	mutex_unlock(&intr_ctx->igate);
+	return 0;
+
+out_err:
+	for (j = start; j < i; j++) {
+		ctx = vfio_irq_ctx_get(intr_ctx, j);
+		vfio_irq_ctx_free(intr_ctx, ctx, j);
+	}
+
+	mutex_unlock(&intr_ctx->igate);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_set_emulated);
+
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index d5140a732741..4fe0df25162f 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -178,6 +178,9 @@ void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data);
+void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
+int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
+			  unsigned int start, unsigned int count);
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz);
 ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 24/26] vfio/pci: Add core IMS support
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (22 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 23/26] vfio/pci: Support emulated interrupts Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 25/26] vfio/pci: Add accessor for IMS index Reinette Chatre
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

Add a new interrupt management backend enabling a guest MSI-X
interrupt to be backed by an IMS interrupt on the host.

An IMS interrupt is allocated via pci_ims_alloc_irq() and requires
an implementation specific cookie that is opaque to the IMS backend.
This can be a PASID, queue ID, pointer etc. During initialization
the IMS backend learns which PCI device to operate on (and thus which
interrupt domain to allocate from) and what the default cookie should
be for any new interrupt allocation.

A virtual device driver initializes the backend with the new
vfio_pci_ims_init_intr_ctx() and cleans up with the new
vfio_pci_ims_release_intr_ctx(). Once initialized, the virtual
device driver can call vfio_pci_set_irqs_ioctl() to handle the
VFIO_DEVICE_SET_IRQS ioctl() after it has validated that the
parameters are appropriate for the particular device.

To support the IMS backend the core utilities need to be aware of
which interrupt context they interact with. The new ims_backed_irq
flag enables this: it is false for the PCI passthrough backend and
true for the IMS backend.
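The flow above can be sketched as follows (kernel-context sketch, not
compilable standalone: my_vdev, my_ims_domain_create() and the queue-ID
default cookie are hypothetical; the vfio_pci_* calls are the interfaces
introduced by this series):

```c
/* Hypothetical variant driver open path: back guest MSI-X with IMS. */
static int my_vdev_open(struct my_vdev *mv)
{
	union msi_instance_cookie icookie = {};
	int ret;

	/* The driver must create the IMS domain before any ioctl() arrives. */
	ret = my_ims_domain_create(mv->pdev);	/* hypothetical helper */
	if (ret)
		return ret;

	icookie.value = mv->default_queue_id;	/* hypothetical default cookie */
	return vfio_pci_ims_init_intr_ctx(&mv->vdev, &mv->intr_ctx,
					  mv->pdev, &icookie);
}

/* ioctl path: validate parameters for this device, then hand off. */
static int my_vdev_set_irqs(struct my_vdev *mv, struct vfio_irq_set *hdr,
			    void *data)
{
	if (hdr->index != VFIO_PCI_MSIX_IRQ_INDEX)	/* device-specific check */
		return -EINVAL;

	return vfio_pci_set_irqs_ioctl(&mv->intr_ctx, hdr->flags, hdr->index,
				       hdr->start, hdr->count, data);
}

/* Release path mirrors init. */
static void my_vdev_close(struct my_vdev *mv)
{
	vfio_pci_ims_release_intr_ctx(&mv->intr_ctx);
}
```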

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V2:
- Improve changelog.
- Refactored implementation to use new callbacks for interrupt
  enable/disable and allocate/free to eliminate code duplication. (Kevin)
- Make vfio_pci_ims_intr_ops static.

 drivers/vfio/pci/vfio_pci_intrs.c | 178 ++++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h     |   7 ++
 2 files changed, 185 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6e34b8d8c216..b318a3f671e8 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -22,6 +22,21 @@
 
 #include "vfio_pci_priv.h"
 
+/*
+ * Interrupt Message Store (IMS) private interrupt context data
+ * @vdev:		Virtual device. Used for name of device in
+ *			request_irq().
+ * @pdev:		PCI device owning the IMS domain from where
+ *			interrupts are allocated.
+ * @default_cookie:	Default cookie used for IMS interrupts without unique
+ *			cookie.
+ */
+struct vfio_pci_ims {
+	struct vfio_device		*vdev;
+	struct pci_dev			*pdev;
+	union msi_instance_cookie	default_cookie;
+};
+
 struct vfio_pci_irq_ctx {
 	bool			emulated:1;
 	struct eventfd_ctx	*trigger;
@@ -31,6 +46,8 @@ struct vfio_pci_irq_ctx {
 	bool			masked;
 	struct irq_bypass_producer	producer;
 	int			virq;
+	int			ims_id;
+	union msi_instance_cookie	icookie;
 };
 
 static bool irq_is(struct vfio_pci_intr_ctx *intr_ctx, int type)
@@ -899,6 +916,7 @@ void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
 	_vfio_pci_init_intr_ctx(intr_ctx);
 	intr_ctx->ops = &vfio_pci_intr_ops;
 	intr_ctx->priv = vdev;
+	intr_ctx->ims_backed_irq = false;
 }
 EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);
 
@@ -985,6 +1003,166 @@ int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_set_emulated);
 
+/* Guest MSI-X interrupts backed by IMS host interrupts */
+
+/*
+ * Free the IMS interrupt associated with @ctx.
+ *
+ * For an IMS interrupt the interrupt is freed from the underlying
+ * PCI device's IMS domain.
+ */
+static void vfio_pci_ims_irq_free(struct vfio_pci_intr_ctx *intr_ctx,
+				  struct vfio_pci_irq_ctx *ctx)
+{
+	struct vfio_pci_ims *ims = intr_ctx->priv;
+	struct msi_map irq_map = {};
+
+	irq_map.index = ctx->ims_id;
+	irq_map.virq = ctx->virq;
+	pci_ims_free_irq(ims->pdev, irq_map);
+	ctx->ims_id = -EINVAL;
+	ctx->virq = 0;
+}
+
+/*
+ * Allocate a host IMS interrupt for @ctx.
+ *
+ * For an IMS interrupt the interrupt is allocated from the underlying
+ * PCI device's IMS domain.
+ */
+static int vfio_pci_ims_irq_alloc(struct vfio_pci_intr_ctx *intr_ctx,
+				  struct vfio_pci_irq_ctx *ctx)
+{
+	struct vfio_pci_ims *ims = intr_ctx->priv;
+	struct msi_map irq_map = {};
+
+	irq_map = pci_ims_alloc_irq(ims->pdev, &ctx->icookie, NULL);
+	if (irq_map.index < 0)
+		return irq_map.index;
+
+	ctx->ims_id = irq_map.index;
+	ctx->virq = irq_map.virq;
+
+	return 0;
+}
+
+static void vfio_ims_free_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+				    struct vfio_pci_irq_ctx *ctx,
+				    unsigned int vector)
+{
+	free_irq(ctx->virq, ctx->trigger);
+	vfio_pci_ims_irq_free(intr_ctx, ctx);
+}
+
+static int vfio_ims_request_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+				      struct vfio_pci_irq_ctx *ctx,
+				      unsigned int vector,
+				      unsigned int index)
+{
+	int ret;
+
+	ret = vfio_pci_ims_irq_alloc(intr_ctx, ctx);
+	if (ret < 0)
+		return ret;
+
+	ret = request_irq(ctx->virq, vfio_msihandler, 0, ctx->name,
+			  ctx->trigger);
+	if (ret < 0) {
+		vfio_pci_ims_irq_free(intr_ctx, ctx);
+		return ret;
+	}
+
+	return 0;
+}
+
+static char *vfio_ims_device_name(struct vfio_pci_intr_ctx *intr_ctx,
+				  unsigned int vector,
+				  unsigned int index)
+{
+	struct vfio_pci_ims *ims = intr_ctx->priv;
+	struct device *dev = &ims->vdev->device;
+
+	return kasprintf(GFP_KERNEL, "vfio-ims[%d](%s)", vector, dev_name(dev));
+}
+
+static void vfio_ims_disable(struct vfio_pci_intr_ctx *intr_ctx,
+			     unsigned int index)
+{
+	struct vfio_pci_irq_ctx *ctx;
+	unsigned long i;
+
+	xa_for_each(&intr_ctx->ctx, i, ctx)
+		vfio_msi_set_vector_signal(intr_ctx, i, -1, index);
+}
+
+/*
+ * The virtual device driver is responsible for enabling IMS by creating
+ * the IMS domain from which interrupts will be allocated dynamically.
+ * IMS thus has to be enabled by the time an ioctl() arrives.
+ */
+static int vfio_ims_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec,
+			   unsigned int index)
+{
+	return -EINVAL;
+}
+
+static int vfio_ims_init_irq_ctx(struct vfio_pci_intr_ctx *intr_ctx,
+				 struct vfio_pci_irq_ctx *ctx)
+{
+	struct vfio_pci_ims *ims = intr_ctx->priv;
+
+	ctx->icookie = ims->default_cookie;
+
+	return 0;
+}
+
+static struct vfio_pci_intr_ops vfio_pci_ims_intr_ops = {
+	.set_msix_trigger = vfio_pci_set_msi_trigger,
+	.set_req_trigger = vfio_pci_set_req_trigger,
+	.msi_enable = vfio_ims_enable,
+	.msi_disable = vfio_ims_disable,
+	.msi_request_interrupt = vfio_ims_request_interrupt,
+	.msi_free_interrupt = vfio_ims_free_interrupt,
+	.msi_device_name = vfio_ims_device_name,
+	.init_irq_ctx = vfio_ims_init_irq_ctx,
+};
+
+int vfio_pci_ims_init_intr_ctx(struct vfio_device *vdev,
+			       struct vfio_pci_intr_ctx *intr_ctx,
+			       struct pci_dev *pdev,
+			       union msi_instance_cookie *default_cookie)
+{
+	struct vfio_pci_ims *ims;
+
+	ims = kzalloc(sizeof(*ims), GFP_KERNEL_ACCOUNT);
+	if (!ims)
+		return -ENOMEM;
+
+	ims->pdev = pdev;
+	ims->default_cookie = *default_cookie;
+	ims->vdev = vdev;
+
+	_vfio_pci_init_intr_ctx(intr_ctx);
+
+	intr_ctx->ops = &vfio_pci_ims_intr_ops;
+	intr_ctx->priv = ims;
+	intr_ctx->ims_backed_irq = true;
+	intr_ctx->irq_type = VFIO_PCI_MSIX_IRQ_INDEX;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_init_intr_ctx);
+
+void vfio_pci_ims_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
+{
+	struct vfio_pci_ims *ims = intr_ctx->priv;
+
+	_vfio_pci_release_intr_ctx(intr_ctx);
+	kfree(ims);
+	intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_release_intr_ctx);
+
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 4fe0df25162f..a3161af791f8 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -58,6 +58,7 @@ struct vfio_pci_region {
  * @req_trigger:	Eventfd associated with device request notification
  * @ctx:		Per-interrupt context indexed by vector
  * @irq_type:		Type of interrupt from guest perspective
+ * @ims_backed_irq:	Interrupts managed by IMS backend
  */
 struct vfio_pci_intr_ctx {
 	const struct vfio_pci_intr_ops	*ops;
@@ -67,6 +68,7 @@ struct vfio_pci_intr_ctx {
 	struct eventfd_ctx		*req_trigger;
 	struct xarray			ctx;
 	int				irq_type;
+	bool				ims_backed_irq:1;
 };
 
 struct vfio_pci_irq_ctx;
@@ -181,6 +183,11 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
 int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
 			  unsigned int start, unsigned int count);
+int vfio_pci_ims_init_intr_ctx(struct vfio_device *vdev,
+			       struct vfio_pci_intr_ctx *intr_ctx,
+			       struct pci_dev *pdev,
+			       union msi_instance_cookie *default_cookie);
+void vfio_pci_ims_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz);
 ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 25/26] vfio/pci: Add accessor for IMS index
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (23 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 24/26] vfio/pci: Add core IMS support Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-27 17:00 ` [RFC PATCH V3 26/26] vfio/pci: Support IMS cookie modification Reinette Chatre
  2023-10-31  7:31 ` [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Tian, Kevin
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

A virtual device driver needs to facilitate translation between
the guest's MSI-X interrupt and the backing IMS interrupt with
which the physical device is programmed. For example, the
guest may need to obtain from the virtual device driver the IMS index
that it must program into descriptors submitted to the device
to ensure that completion interrupts are generated correctly.

Introduce vfio_pci_ims_hwirq() to the IMS backend as a helper
that returns the IMS interrupt index backing a provided MSI-X
interrupt index belonging to a guest.
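For instance (kernel-context sketch, not compilable standalone: the
"get handle" request and my_vdev are hypothetical; only
vfio_pci_ims_hwirq() comes from this patch):

```c
/*
 * Hypothetical emulation of a "get interrupt handle" request: the guest
 * asks for the handle of one of its MSI-X vectors and the host returns
 * the IMS index actually programmed into the hardware.
 */
static int my_vdev_get_handle(struct my_vdev *mv, unsigned int guest_vector,
			      u32 *handle)
{
	int hwirq = vfio_pci_ims_hwirq(&mv->intr_ctx, guest_vector);

	if (hwirq < 0)		/* not IMS backed, or an emulated vector */
		return hwirq;

	*handle = hwirq;	/* guest programs this into its descriptors */
	return 0;
}
```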

Originally-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since RFC V2.

 drivers/vfio/pci/vfio_pci_intrs.c | 25 +++++++++++++++++++++++++
 include/linux/vfio_pci_core.h     |  1 +
 2 files changed, 26 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index b318a3f671e8..32ebc8fec4c4 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -1163,6 +1163,31 @@ void vfio_pci_ims_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_ims_release_intr_ctx);
 
+/*
+ * Return IMS index of IMS interrupt backing MSI-X interrupt @vector
+ */
+int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector)
+{
+	struct vfio_pci_irq_ctx *ctx;
+	int id = -EINVAL;
+
+	mutex_lock(&intr_ctx->igate);
+
+	if (!intr_ctx->ims_backed_irq)
+		goto out_unlock;
+
+	ctx = vfio_irq_ctx_get(intr_ctx, vector);
+	if (!ctx || ctx->emulated)
+		goto out_unlock;
+
+	id = ctx->ims_id;
+
+out_unlock:
+	mutex_unlock(&intr_ctx->igate);
+	return id;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_hwirq);
+
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index a3161af791f8..dbc77839ef26 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -180,6 +180,7 @@ void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data);
+int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
 void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
 int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
 			  unsigned int start, unsigned int count);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V3 26/26] vfio/pci: Support IMS cookie modification
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (24 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 25/26] vfio/pci: Add accessor for IMS index Reinette Chatre
@ 2023-10-27 17:00 ` Reinette Chatre
  2023-10-31  7:31 ` [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Tian, Kevin
  26 siblings, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-10-27 17:00 UTC (permalink / raw)
  To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
	alex.williamson
  Cc: kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
	reinette.chatre, linux-kernel, patches

IMS supports an implementation specific cookie that is associated
with each interrupt. By default the IMS interrupt allocation backend
will assign a default cookie to a new interrupt instance.

Add support for a virtual device driver to set an interrupt instance
specific cookie. For example, the virtual device driver may intercept
the guest's MMIO write that configures a new PASID for a particular
interrupt. Calling vfio_pci_ims_set_cookie() with the new PASID value
as the IMS cookie enables subsequent interrupts to be allocated with
accurate data.
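As a kernel-context sketch (not compilable standalone: the trapped
register layout and my_vdev are hypothetical; vfio_pci_ims_set_cookie()
is the interface added here):

```c
/*
 * Hypothetical MMIO write handler: the guest updates the PASID
 * associated with one of its MSI-X vectors; propagate it as the IMS
 * cookie so that the next interrupt allocation for that vector is
 * programmed with the correct PASID.
 */
static int my_vdev_pasid_write(struct my_vdev *mv, unsigned int vector,
			       u32 pasid)
{
	union msi_instance_cookie icookie = {};

	icookie.value = pasid;
	return vfio_pci_ims_set_cookie(&mv->intr_ctx, vector, &icookie);
}
```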

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since RFC V2.

 drivers/vfio/pci/vfio_pci_intrs.c | 53 +++++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h     |  3 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 32ebc8fec4c4..5dc22dd9390e 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -1188,6 +1188,59 @@ int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_ims_hwirq);
 
+/*
+ * vfio_pci_ims_set_cookie() - Set unique cookie for vector.
+ * @intr_ctx:	Interrupt context.
+ * @vector:	Vector.
+ * @icookie:	New cookie for @vector.
+ *
+ * When a new IMS interrupt is allocated for @vector it will be
+ * assigned @icookie.
+ */
+int vfio_pci_ims_set_cookie(struct vfio_pci_intr_ctx *intr_ctx,
+			    unsigned int vector,
+			    union msi_instance_cookie *icookie)
+{
+	struct vfio_pci_irq_ctx *ctx;
+	int ret = -EINVAL;
+
+	mutex_lock(&intr_ctx->igate);
+
+	if (!intr_ctx->ims_backed_irq)
+		goto out_unlock;
+
+	ctx = vfio_irq_ctx_get(intr_ctx, vector);
+	if (ctx) {
+		if (WARN_ON_ONCE(ctx->emulated)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		ctx->icookie = *icookie;
+		ret = 0;
+		goto out_unlock;
+	}
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+	if (!ctx) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	ctx->icookie = *icookie;
+	ret = xa_insert(&intr_ctx->ctx, vector, ctx, GFP_KERNEL_ACCOUNT);
+	if (ret) {
+		kfree(ctx);
+		goto out_unlock;
+	}
+
+	ret = 0;
+
+out_unlock:
+	mutex_unlock(&intr_ctx->igate);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_set_cookie);
+
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index dbc77839ef26..b989b533e852 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -181,6 +181,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
 			    unsigned int index, unsigned int start,
 			    unsigned int count, void *data);
 int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
+int vfio_pci_ims_set_cookie(struct vfio_pci_intr_ctx *intr_ctx,
+			    unsigned int vector,
+			    union msi_instance_cookie *icookie);
 void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
 int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
 			  unsigned int start, unsigned int count);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
                   ` (25 preceding siblings ...)
  2023-10-27 17:00 ` [RFC PATCH V3 26/26] vfio/pci: Support IMS cookie modification Reinette Chatre
@ 2023-10-31  7:31 ` Tian, Kevin
  2023-11-01 18:07   ` Alex Williamson
  26 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-10-31  7:31 UTC (permalink / raw)
  To: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, alex.williamson@redhat.com
  Cc: kvm@vger.kernel.org, Jiang, Dave, Liu, Jing2, Raj, Ashok,
	Yu, Fenghua, tom.zanussi@linux.intel.com,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev

> From: Chatre, Reinette <reinette.chatre@intel.com>
> Sent: Saturday, October 28, 2023 1:01 AM
> 
> Changes since RFC V2:
> - RFC V2:
> https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com
> /
> - Still submiting this as RFC series. I believe that this now matches the
>   expectatations raised during earlier reviews. If you agree this is
>   the right direction then I can drop the RFC prefix on next submission.
>   If you do not agree then please do let me know where I missed
>   expectations.

Overall this matches my expectation. Let's wait for Alex/Jason's thoughts
before moving to next-level refinement.

btw as commented to last version, if this is the agreed direction probably
next version can be split into two parts: part1 contains the new framework
and converts intel vgpu driver to use it, then part2 for ims specific logic.

this way part1 can be verified and merged as an integral part. 😊

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-10-31  7:31 ` [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Tian, Kevin
@ 2023-11-01 18:07   ` Alex Williamson
  2023-11-02  2:51     ` Tian, Kevin
  2023-11-02  3:14     ` Tian, Kevin
  0 siblings, 2 replies; 40+ messages in thread
From: Alex Williamson @ 2023-11-01 18:07 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Tue, 31 Oct 2023 07:31:24 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Chatre, Reinette <reinette.chatre@intel.com>
> > Sent: Saturday, October 28, 2023 1:01 AM
> > 
> > Changes since RFC V2:
> > - RFC V2:
> > https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com
> > /
> > - Still submiting this as RFC series. I believe that this now matches the
> >   expectatations raised during earlier reviews. If you agree this is
> >   the right direction then I can drop the RFC prefix on next submission.
> >   If you do not agree then please do let me know where I missed
> >   expectations.  
> 
> Overall this matches my expectation. Let's wait for Alex/Jason's thoughts
> before moving to next-level refinement.

It feels like there's a lot of gratuitous change without any clear
purpose.  We create an ops structure so that a variant/mdev driver can
make use of the vfio-pci-core set_irqs ioctl piecemeal, but then the
two entry points that are actually implemented by the ims version are
the same as the core version, so the ops appear to be at the wrong
level.  The use of the priv pointer for the core callbacks looks like
it's just trying to justify the existence of the opaque pointer, it
should really just be using container_of().  We drill down into various
support functions for MSI (ie. enable, disable, request_interrupt,
free_interrupt, device name), but INTx is largely ignored, where we
haven't even kept is_intx() consistent with the other helpers.

Without an in-tree user of this code, we're just chopping up code for
no real purpose.  There's no reason that a variant driver requiring IMS
couldn't initially implement their own SET_IRQS ioctl.  Doing that
might lead to a more organic solution where we create interfaces where
they're actually needed.  The existing mdev sample drivers should also
be considered in any schemes to refactor the core code into a generic
SET_IRQS helper for devices exposing a vfio-pci API.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-01 18:07   ` Alex Williamson
@ 2023-11-02  2:51     ` Tian, Kevin
  2023-11-02  3:14     ` Tian, Kevin
  1 sibling, 0 replies; 40+ messages in thread
From: Tian, Kevin @ 2023-11-02  2:51 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, November 2, 2023 2:07 AM
> 
> On Tue, 31 Oct 2023 07:31:24 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Chatre, Reinette <reinette.chatre@intel.com>
> > > Sent: Saturday, October 28, 2023 1:01 AM
> > >
> > > Changes since RFC V2:
> > > - RFC V2:
> > >
> https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com
> > > /
> > > - Still submiting this as RFC series. I believe that this now matches the
> > >   expectatations raised during earlier reviews. If you agree this is
> > >   the right direction then I can drop the RFC prefix on next submission.
> > >   If you do not agree then please do let me know where I missed
> > >   expectations.
> >
> > Overall this matches my expectation. Let's wait for Alex/Jason's thoughts
> > before moving to next-level refinement.
> 
> It feels like there's a lot of gratuitous change without any clear
> purpose.  We create an ops structure so that a variant/mdev driver can
> make use of the vfio-pci-core set_irqs ioctl piecemeal, but then the
> two entry points that are actually implemented by the ims version are
> the same as the core version, so the ops appear to be at the wrong
> level.  The use of the priv pointer for the core callbacks looks like
> it's just trying to justify the existence of the opaque pointer, it
> should really just be using container_of().  We drill down into various
> support functions for MSI (ie. enable, disable, request_interrupt,
> free_interrupt, device name), but INTx is largely ignored, where we
> haven't even kept is_intx() consistent with the other helpers.

All of the above are good points. The main interest of this series is to
share the MSI frontend interface with various backends (PCI MSI/X, IMS,
and purely emulated). From this angle the current ops abstraction does
seem to sit at the wrong level. But if we factor in your suggestion to
also refactor the mdev sample drivers (e.g. mtty emulates INTx) then
there might be a different outcome.

> 
> Without an in-tree user of this code, we're just chopping up code for
> no real purpose.  There's no reason that a variant driver requiring IMS
> couldn't initially implement their own SET_IRQS ioctl.  Doing that

this is an interesting idea. We haven't seen a real usage which wants
such MSI emulation on IMS for variant drivers. But if the code is
simple enough to demonstrate the 1st user of IMS it might not be
a bad choice. There is additional trap-emulation required in the
device MMIO bar (mostly copying the MSI permission entry, which
contains PASID info, to the corresponding IMS entry). At a glance that
area

let's explore more into this option.

> might lead to a more organic solution where we create interfaces where
> they're actually needed.  The existing mdev sample drivers should also
> be considered in any schemes to refactor the core code into a generic
> SET_IRQS helper for devices exposing a vfio-pci API.  Thanks,
> 

In this case we'll have mtty to demonstrate an emulated INTx backend
and intel vgpu to contain an emulated MSI backend.

and moving forward all drivers with a vfio-pci API should share the same
frontend interface.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-01 18:07   ` Alex Williamson
  2023-11-02  2:51     ` Tian, Kevin
@ 2023-11-02  3:14     ` Tian, Kevin
  2023-11-02 21:13       ` Alex Williamson
  1 sibling, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-11-02  3:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> From: Tian, Kevin
> Sent: Thursday, November 2, 2023 10:52 AM
> 
> >
> > Without an in-tree user of this code, we're just chopping up code for
> > no real purpose.  There's no reason that a variant driver requiring IMS
> > couldn't initially implement their own SET_IRQS ioctl.  Doing that
> 
> this is an interesting idea. We haven't seen a real usage which wants
> such MSI emulation on IMS for variant drivers. but if the code is
> simple enough to demonstrate the 1st user of IMS it might not be
> a bad choice. There are additional trap-emulation required in the
> device MMIO bar (mostly copying MSI permission entry which contains
> PASID info to the corresponding IMS entry). At a glance that area
> is 4k-aligned so should be doable.
> 

misread the spec. the MSI-X permission table, which provides
auxiliary data to the MSI-X table, is not 4k-aligned. It sits in the 1st
4k page together with many other registers. emulation of them
could be simple with a native read/write handler, but I'm not sure
whether any of them sit in a hot path where the trap would affect
perf...

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-02  3:14     ` Tian, Kevin
@ 2023-11-02 21:13       ` Alex Williamson
  2023-11-03  7:23         ` Tian, Kevin
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2023-11-02 21:13 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Thu, 2 Nov 2023 03:14:09 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Tian, Kevin
> > Sent: Thursday, November 2, 2023 10:52 AM
> >   
> > >
> > > Without an in-tree user of this code, we're just chopping up code for
> > > no real purpose.  There's no reason that a variant driver requiring IMS
> > > couldn't initially implement their own SET_IRQS ioctl.  Doing that  
> > 
> > this is an interesting idea. We haven't seen a real usage which wants
> > such MSI emulation on IMS for variant drivers. but if the code is
> > simple enough to demonstrate the 1st user of IMS it might not be
> > a bad choice. There are additional trap-emulation required in the
> > device MMIO bar (mostly copying MSI permission entry which contains
> > PASID info to the corresponding IMS entry). At a glance that area
> > is 4k-aligned so should be doable.
> >   
> 
> misread the spec. the MSI-X permission table which provides
> auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> 4k page together with many other registers. emulation of them
> could be simple with a native read/write handler but not sure
> whether any of them may sit in a hot path to affect perf due to
> trap...

I'm not sure if you're referring to a specific device spec or the PCI
spec, but the PCI spec has long included an implementation note
suggesting alignment of the MSI-X vector table and pba and separation
from CSRs, and I see this is now even more strongly worded in the 6.0
spec.

Note though that for QEMU, these are emulated in the VMM and not
written through to the device.  The result of writes to the vector
table in the VMM are translated to vector use/unuse operations, which
we see at the kernel level through SET_IRQS ioctl calls.  Are you
expecting to get PASID information written by the guest through the
emulated vector table?  That would entail something more than a simple
IMS backend to MSI-X frontend.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-02 21:13       ` Alex Williamson
@ 2023-11-03  7:23         ` Tian, Kevin
  2023-11-03 15:51           ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-11-03  7:23 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, November 3, 2023 5:14 AM
> 
> On Thu, 2 Nov 2023 03:14:09 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Tian, Kevin
> > > Sent: Thursday, November 2, 2023 10:52 AM
> > >
> > > >
> > > > Without an in-tree user of this code, we're just chopping up code for
> > > > no real purpose.  There's no reason that a variant driver requiring IMS
> > > > couldn't initially implement their own SET_IRQS ioctl.  Doing that
> > >
> > > this is an interesting idea. We haven't seen a real usage which wants
> > > such MSI emulation on IMS for variant drivers. but if the code is
> > > simple enough to demonstrate the 1st user of IMS it might not be
> > > a bad choice. There are additional trap-emulation required in the
> > > device MMIO bar (mostly copying MSI permission entry which contains
> > > PASID info to the corresponding IMS entry). At a glance that area
> > > is 4k-aligned so should be doable.
> > >
> >
> > misread the spec. the MSI-X permission table which provides
> > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > 4k page together with many other registers. emulation of them
> > could be simple with a native read/write handler but not sure
> > whether any of them may sit in a hot path to affect perf due to
> > trap...
> 
> I'm not sure if you're referring to a specific device spec or the PCI
> spec, but the PCI spec has long included an implementation note
> suggesting alignment of the MSI-X vector table and pba and separation
> from CSRs, and I see this is now even more strongly worded in the 6.0
> spec.
> 
> Note though that for QEMU, these are emulated in the VMM and not
> written through to the device.  The result of writes to the vector
> table in the VMM are translated to vector use/unuse operations, which
> we see at the kernel level through SET_IRQS ioctl calls.  Are you
> expecting to get PASID information written by the guest through the
> emulated vector table?  That would entail something more than a simple
> IMS backend to MSI-X frontend.  Thanks,
> 

I was referring to the IDXD device spec. Basically it allows a process to
submit a descriptor which contains a completion interrupt handle.
The handle is the index of an MSI-X entry or IMS entry allocated by
the idxd driver. To mark the association between an application and
its handles the driver records the PASID of the application
in an auxiliary structure for MSI-X (called the MSI-X permission table)
or directly in the IMS entry. This additional info includes whether
an MSI-X/IMS entry has PASID enabled and, if so, the PASID
value to be checked against the descriptor.

As you said virtualizing MSI-X table itself is via SET_IRQS and it's
4k aligned. Then we also need to capture guest updates to the MSI-X
permission table and copy the PASID information into the
corresponding IMS entry when using the IMS backend. It's MSI-X
permission table not 4k aligned then trapping it will affect adjacent
registers.

My quick check in idxd spec doesn't reveal an real impact in perf
critical path. Most registers are configuration/control registers
accessed at driver init time and a few interrupt registers related
to errors or administrative purpose.
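The trap-and-mirror flow described above could look roughly like the
sketch below. Note this is illustrative only: the structure layouts,
field names and bit positions are invented for the example and are
not taken from the IDXD specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layouts -- widths and bit positions are illustrative,
 * not from the IDXD spec. */
struct msix_perm_entry {
	uint32_t pasid_en : 1;
	uint32_t rsvd     : 11;
	uint32_t pasid    : 20;
};

struct ims_entry {
	uint64_t addr;
	uint32_t data;
	uint32_t pasid_ctl;	/* bit 0: PASID enable, bits 31:12: PASID */
};

/*
 * On a trapped guest write to the MSI-X permission table, mirror the
 * PASID information into the IMS entry backing that guest vector.
 */
static void mirror_pasid(const struct msix_perm_entry *pe,
			 struct ims_entry *ims)
{
	ims->pasid_ctl = (pe->pasid_en ? 1u : 0u) |
			 ((uint32_t)pe->pasid << 12);
}
```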

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-03  7:23         ` Tian, Kevin
@ 2023-11-03 15:51           ` Alex Williamson
  2023-11-07  8:29             ` Tian, Kevin
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2023-11-03 15:51 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Fri, 3 Nov 2023 07:23:13 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, November 3, 2023 5:14 AM
> > 
> > On Thu, 2 Nov 2023 03:14:09 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Tian, Kevin
> > > > Sent: Thursday, November 2, 2023 10:52 AM
> > > >  
> > > > >
> > > > > Without an in-tree user of this code, we're just chopping up code for
> > > > > no real purpose.  There's no reason that a variant driver requiring IMS
> > > > > couldn't initially implement their own SET_IRQS ioctl.  Doing that  
> > > >
> > > > this is an interesting idea. We haven't seen a real usage which wants
> > > > such MSI emulation on IMS for variant drivers. but if the code is
> > > > simple enough to demonstrate the 1st user of IMS it might not be
> > > > a bad choice. There are additional trap-emulation required in the
> > > > device MMIO bar (mostly copying MSI permission entry which contains
> > > > PASID info to the corresponding IMS entry). At a glance that area
> > > > is 4k-aligned so should be doable.
> > > >  
> > >
> > > misread the spec. the MSI-X permission table which provides
> > > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > > 4k page together with many other registers. emulation of them
> > > could be simple with a native read/write handler but not sure
> > > whether any of them may sit in a hot path to affect perf due to
> > > trap...  
> > 
> > I'm not sure if you're referring to a specific device spec or the PCI
> > spec, but the PCI spec has long included an implementation note
> > suggesting alignment of the MSI-X vector table and pba and separation
> > from CSRs, and I see this is now even more strongly worded in the 6.0
> > spec.
> > 
> > Note though that for QEMU, these are emulated in the VMM and not
> > written through to the device.  The result of writes to the vector
> > table in the VMM are translated to vector use/unuse operations, which
> > we see at the kernel level through SET_IRQS ioctl calls.  Are you
> > expecting to get PASID information written by the guest through the
> > emulated vector table?  That would entail something more than a simple
> > IMS backend to MSI-X frontend.  Thanks,
> >   
> 
> I was referring to IDXD device spec. Basically it allows a process to
> submit a descriptor which contains a completion interrupt handle.
> The handle is the index of a MSI-X entry or IMS entry allocated by
> the idxd driver. To mark the association between application and
> related handles the driver records the PASID of the application
> in an auxiliary structure for MSI-X (called MSI-X permission table)
> or directly in the IMS entry. This additional info includes whether
> an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
> value to be checked against the descriptor.
> 
> As you said virtualizing MSI-X table itself is via SET_IRQS and it's
> 4k aligned. Then we also need to capture guest updates to the MSI-X
> permission table and copy the PASID information into the
> corresponding IMS entry when using the IMS backend. It's MSI-X
> permission table not 4k aligned then trapping it will affect adjacent
> registers.
> 
> My quick check in idxd spec doesn't reveal an real impact in perf
> critical path. Most registers are configuration/control registers
> accessed at driver init time and a few interrupt registers related
> to errors or administrative purpose.

Right, it looks like you'll need to trap writes to the MSI-X
Permissions Table via a sparse mmap capability, avoiding assumptions
about whether it lives on the same page as the MSI-X vector table or PBA.
Ideally the hardware folks have considered this to avoid any conflict
with latency sensitive registers.
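As a rough illustration of the sparse mmap idea, the helper below
computes the mmap-able areas of a BAR while carving out the 4k page
that holds a (hypothetical) MSI-X Permissions Table; a variant driver
would advertise such areas via VFIO_REGION_INFO_CAP_SPARSE_MMAP so
accesses to the excluded page fault into the driver. The helper name
and offsets are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ull

struct area {
	uint64_t offset;
	uint64_t size;
};

/*
 * Given a BAR of bar_size bytes and the offset of the (hypothetical)
 * MSI-X Permissions Table within it, return the mmap-able areas that
 * skip the page containing the table -- the shape of what a variant
 * driver would report via VFIO_REGION_INFO_CAP_SPARSE_MMAP.
 */
static size_t sparse_areas(uint64_t bar_size, uint64_t pt_offset,
			   struct area out[2])
{
	uint64_t trap_start = pt_offset & ~(PAGE_SIZE - 1);
	uint64_t trap_end = trap_start + PAGE_SIZE;
	size_t n = 0;

	if (trap_start > 0)
		out[n++] = (struct area){ 0, trap_start };
	if (trap_end < bar_size)
		out[n++] = (struct area){ trap_end, bar_size - trap_end };
	return n;
}
```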

The variant driver would use this for collecting the metadata related
to the IMS interrupt, but this is all tangential to whether we
preemptively slice up vfio-pci-core's SET_IRQS ioctl or the iDXD driver
implements its own.

And just to be clear, I don't expect the iDXD variant driver to go to
extraordinary lengths to duplicate the core ioctl, we can certainly
refactor and export things where it makes sense, but I think it likely
makes more sense for the variant driver to implement the shell of the
ioctl rather than trying to multiplex the entire core ioctl with an ops
structure that's so intimately tied to the core implementation and
focused only on the MSI-X code paths.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-03 15:51           ` Alex Williamson
@ 2023-11-07  8:29             ` Tian, Kevin
  2023-11-07 19:48               ` Reinette Chatre
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-11-07  8:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, November 3, 2023 11:51 PM
> 
> On Fri, 3 Nov 2023 07:23:13 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, November 3, 2023 5:14 AM
> > >
> > > On Thu, 2 Nov 2023 03:14:09 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Tian, Kevin
> > > > > Sent: Thursday, November 2, 2023 10:52 AM
> > > > >
> > > > > >
> > > > > > Without an in-tree user of this code, we're just chopping up code for
> > > > > > no real purpose.  There's no reason that a variant driver requiring
> IMS
> > > > > > couldn't initially implement their own SET_IRQS ioctl.  Doing that
> > > > >
> > > > > this is an interesting idea. We haven't seen a real usage which wants
> > > > > such MSI emulation on IMS for variant drivers. but if the code is
> > > > > simple enough to demonstrate the 1st user of IMS it might not be
> > > > > a bad choice. There are additional trap-emulation required in the
> > > > > device MMIO bar (mostly copying MSI permission entry which
> contains
> > > > > PASID info to the corresponding IMS entry). At a glance that area
> > > > > is 4k-aligned so should be doable.
> > > > >
> > > >
> > > > misread the spec. the MSI-X permission table which provides
> > > > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > > > 4k page together with many other registers. emulation of them
> > > > could be simple with a native read/write handler but not sure
> > > > whether any of them may sit in a hot path to affect perf due to
> > > > trap...
> > >
> > > I'm not sure if you're referring to a specific device spec or the PCI
> > > spec, but the PCI spec has long included an implementation note
> > > suggesting alignment of the MSI-X vector table and pba and separation
> > > from CSRs, and I see this is now even more strongly worded in the 6.0
> > > spec.
> > >
> > > Note though that for QEMU, these are emulated in the VMM and not
> > > written through to the device.  The result of writes to the vector
> > > table in the VMM are translated to vector use/unuse operations, which
> > > we see at the kernel level through SET_IRQS ioctl calls.  Are you
> > > expecting to get PASID information written by the guest through the
> > > emulated vector table?  That would entail something more than a simple
> > > IMS backend to MSI-X frontend.  Thanks,
> > >
> >
> > I was referring to IDXD device spec. Basically it allows a process to
> > submit a descriptor which contains a completion interrupt handle.
> > The handle is the index of a MSI-X entry or IMS entry allocated by
> > the idxd driver. To mark the association between application and
> > related handles the driver records the PASID of the application
> > in an auxiliary structure for MSI-X (called MSI-X permission table)
> > or directly in the IMS entry. This additional info includes whether
> > an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
> > value to be checked against the descriptor.
> >
> > As you said virtualizing MSI-X table itself is via SET_IRQS and it's
> > 4k aligned. Then we also need to capture guest updates to the MSI-X
> > permission table and copy the PASID information into the
> > corresponding IMS entry when using the IMS backend. It's MSI-X
> > permission table not 4k aligned then trapping it will affect adjacent
> > registers.
> >
> > My quick check in idxd spec doesn't reveal an real impact in perf
> > critical path. Most registers are configuration/control registers
> > accessed at driver init time and a few interrupt registers related
> > to errors or administrative purpose.
> 
> Right, it looks like you'll need to trap writes to the MSI-X
> Permissions Table via a sparse mmap capability to avoid assumptions
> whether it lives on the same page as the MSI-X vector table or PBA.
> Ideally the hardware folks have considered this to avoid any conflict
> with latency sensitive registers.
> 
> The variant driver would use this for collecting the meta data relative
> to the IMS interrupt, but this is all tangential to whether we
> preemptively slice up vfio-pci-core's SET_IRQS ioctl or the iDXD driver
> implements its own.

Agree

> 
> And just to be clear, I don't expect the iDXD variant driver to go to
> extraordinary lengths to duplicate the core ioctl, we can certainly
> refactor and export things where it makes sense, but I think it likely
> makes more sense for the variant driver to implement the shell of the
> ioctl rather than trying to multiplex the entire core ioctl with an ops
> structure that's so intimately tied to the core implementation and
> focused only on the MSI-X code paths.  Thanks,
> 

Btw, I'll let Reinette decide whether she wants to implement such a
variant driver or wait until the idxd SIOV driver is ready to
demonstrate the usage. One concern with the variant driver approach
is the lack of a real-world usage (e.g. what does IMS bring when the
guest only wants MSI-X on an assigned PF?). Though it may provide a
shorter path to enabling the IMS backend support, it also comes with
a long-term maintenance burden.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-07  8:29             ` Tian, Kevin
@ 2023-11-07 19:48               ` Reinette Chatre
  2023-11-07 23:06                 ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Reinette Chatre @ 2023-11-07 19:48 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

Hi Alex and Kevin,

On 11/7/2023 12:29 AM, Tian, Kevin wrote:
>> From: Alex Williamson <alex.williamson@redhat.com>
>> Sent: Friday, November 3, 2023 11:51 PM
>>
>> On Fri, 3 Nov 2023 07:23:13 +0000
>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>
>>>> From: Alex Williamson <alex.williamson@redhat.com>
>>>> Sent: Friday, November 3, 2023 5:14 AM
>>>>
>>>> On Thu, 2 Nov 2023 03:14:09 +0000
>>>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>>>
>>>>>> From: Tian, Kevin
>>>>>> Sent: Thursday, November 2, 2023 10:52 AM
>>>>>>
>>>>>>>
>>>>>>> Without an in-tree user of this code, we're just chopping up code for
>>>>>>> no real purpose.  There's no reason that a variant driver requiring
>> IMS
>>>>>>> couldn't initially implement their own SET_IRQS ioctl.  Doing that
>>>>>>
>>>>>> this is an interesting idea. We haven't seen a real usage which wants
>>>>>> such MSI emulation on IMS for variant drivers. but if the code is
>>>>>> simple enough to demonstrate the 1st user of IMS it might not be
>>>>>> a bad choice. There are additional trap-emulation required in the
>>>>>> device MMIO bar (mostly copying MSI permission entry which
>> contains
>>>>>> PASID info to the corresponding IMS entry). At a glance that area
>>>>>> is 4k-aligned so should be doable.
>>>>>>
>>>>>
>>>>> misread the spec. the MSI-X permission table which provides
>>>>> auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
>>>>> 4k page together with many other registers. emulation of them
>>>>> could be simple with a native read/write handler but not sure
>>>>> whether any of them may sit in a hot path to affect perf due to
>>>>> trap...
>>>>
>>>> I'm not sure if you're referring to a specific device spec or the PCI
>>>> spec, but the PCI spec has long included an implementation note
>>>> suggesting alignment of the MSI-X vector table and pba and separation
>>>> from CSRs, and I see this is now even more strongly worded in the 6.0
>>>> spec.
>>>>
>>>> Note though that for QEMU, these are emulated in the VMM and not
>>>> written through to the device.  The result of writes to the vector
>>>> table in the VMM are translated to vector use/unuse operations, which
>>>> we see at the kernel level through SET_IRQS ioctl calls.  Are you
>>>> expecting to get PASID information written by the guest through the
>>>> emulated vector table?  That would entail something more than a simple
>>>> IMS backend to MSI-X frontend.  Thanks,
>>>>
>>>
>>> I was referring to IDXD device spec. Basically it allows a process to
>>> submit a descriptor which contains a completion interrupt handle.
>>> The handle is the index of a MSI-X entry or IMS entry allocated by
>>> the idxd driver. To mark the association between application and
>>> related handles the driver records the PASID of the application
>>> in an auxiliary structure for MSI-X (called MSI-X permission table)
>>> or directly in the IMS entry. This additional info includes whether
>>> an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
>>> value to be checked against the descriptor.
>>>
>>> As you said virtualizing MSI-X table itself is via SET_IRQS and it's
>>> 4k aligned. Then we also need to capture guest updates to the MSI-X
>>> permission table and copy the PASID information into the
>>> corresponding IMS entry when using the IMS backend. It's MSI-X
>>> permission table not 4k aligned then trapping it will affect adjacent
>>> registers.
>>>
>>> My quick check in idxd spec doesn't reveal an real impact in perf
>>> critical path. Most registers are configuration/control registers
>>> accessed at driver init time and a few interrupt registers related
>>> to errors or administrative purpose.
>>
>> Right, it looks like you'll need to trap writes to the MSI-X
>> Permissions Table via a sparse mmap capability to avoid assumptions
>> whether it lives on the same page as the MSI-X vector table or PBA.
>> Ideally the hardware folks have considered this to avoid any conflict
>> with latency sensitive registers.
>>
>> The variant driver would use this for collecting the meta data relative
>> to the IMS interrupt, but this is all tangential to whether we
>> preemptively slice up vfio-pci-core's SET_IRQS ioctl or the iDXD driver
>> implements its own.
> 
> Agree
> 
>>
>> And just to be clear, I don't expect the iDXD variant driver to go to
>> extraordinary lengths to duplicate the core ioctl, we can certainly
>> refactor and export things where it makes sense, but I think it likely
>> makes more sense for the variant driver to implement the shell of the
>> ioctl rather than trying to multiplex the entire core ioctl with an ops
>> structure that's so intimately tied to the core implementation and
>> focused only on the MSI-X code paths.  Thanks,
>>
> 
> btw I'll let Reinette to decide whether she wants to implement such
> a variant driver or waits until idxd siov driver is ready to demonstrate
> the usage. One concern with the variant driver approach is lacking
> of a real-world usage (e.g. what IMS brings when guest only wants
> MSI-X on an assigned PF). Though it may provide a shorter path
> to enable the IMS backend support, also comes with the long-term
> maintenance burden.

Thank you very much for your guidance and advice.

I'd be happy to implement what is required for this work. Unfortunately
I am not confident that I understand what is meant by "variant driver".

I initially understood "variant driver" to mean the planned IDXD virtual
device driver (the "IDXD VDCM" driver) that will assign IDXD resources
to guests with varying granularity and back the guest MSI-X interrupts
of these virtual devices with IMS interrupts on the host. From Kevin
I understand "variant driver" is a new sample driver for an IDXD
assigned PF, backing guest MSI-X interrupts with IMS interrupts on
the host.

The IDXD VDCM driver is in progress. If a new variant driver needs to be
created then I still need to do some research into how to accomplish it.
Even so, it is not clear to me who the users of this new driver would be.
If the requirement is to demonstrate this VFIO IMS usage then we could
perhaps wait until the IDXD VDCM driver is ready and thus not have to deal
with additional maintenance burden.

In the meantime there are items that I do understand better
and will work on right away:
- Do not have ops span the SET_IRQS ioctl()
- Use container_of() instead of opaque pointer
- Do not ignore INTx, consider the mdev sample driver when refactoring
  this code.
- Consider the Intel vgpu driver as user of new emulated interrupt
  interface.

Reinette

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-07 19:48               ` Reinette Chatre
@ 2023-11-07 23:06                 ` Alex Williamson
  2023-11-08  2:49                   ` Jason Gunthorpe
  2023-11-08 16:52                   ` Reinette Chatre
  0 siblings, 2 replies; 40+ messages in thread
From: Alex Williamson @ 2023-11-07 23:06 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Tian, Kevin, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Tue, 7 Nov 2023 11:48:28 -0800
Reinette Chatre <reinette.chatre@intel.com> wrote:

> Hi Alex and Kevin,
> 
> On 11/7/2023 12:29 AM, Tian, Kevin wrote:
> >> From: Alex Williamson <alex.williamson@redhat.com>
> >> Sent: Friday, November 3, 2023 11:51 PM
> >>
> >> On Fri, 3 Nov 2023 07:23:13 +0000
> >> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  
> >>>> From: Alex Williamson <alex.williamson@redhat.com>
> >>>> Sent: Friday, November 3, 2023 5:14 AM
> >>>>
> >>>> On Thu, 2 Nov 2023 03:14:09 +0000
> >>>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>>>  
> >>>>>> From: Tian, Kevin
> >>>>>> Sent: Thursday, November 2, 2023 10:52 AM
> >>>>>>  
> >>>>>>>
> >>>>>>> Without an in-tree user of this code, we're just chopping up code for
> >>>>>>> no real purpose.  There's no reason that a variant driver requiring  
> >> IMS  
> >>>>>>> couldn't initially implement their own SET_IRQS ioctl.  Doing that  
> >>>>>>
> >>>>>> this is an interesting idea. We haven't seen a real usage which wants
> >>>>>> such MSI emulation on IMS for variant drivers. but if the code is
> >>>>>> simple enough to demonstrate the 1st user of IMS it might not be
> >>>>>> a bad choice. There are additional trap-emulation required in the
> >>>>>> device MMIO bar (mostly copying MSI permission entry which  
> >> contains  
> >>>>>> PASID info to the corresponding IMS entry). At a glance that area
> >>>>>> is 4k-aligned so should be doable.
> >>>>>>  
> >>>>>
> >>>>> misread the spec. the MSI-X permission table which provides
> >>>>> auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> >>>>> 4k page together with many other registers. emulation of them
> >>>>> could be simple with a native read/write handler but not sure
> >>>>> whether any of them may sit in a hot path to affect perf due to
> >>>>> trap...  
> >>>>
> >>>> I'm not sure if you're referring to a specific device spec or the PCI
> >>>> spec, but the PCI spec has long included an implementation note
> >>>> suggesting alignment of the MSI-X vector table and pba and separation
> >>>> from CSRs, and I see this is now even more strongly worded in the 6.0
> >>>> spec.
> >>>>
> >>>> Note though that for QEMU, these are emulated in the VMM and not
> >>>> written through to the device.  The result of writes to the vector
> >>>> table in the VMM are translated to vector use/unuse operations, which
> >>>> we see at the kernel level through SET_IRQS ioctl calls.  Are you
> >>>> expecting to get PASID information written by the guest through the
> >>>> emulated vector table?  That would entail something more than a simple
> >>>> IMS backend to MSI-X frontend.  Thanks,
> >>>>  
> >>>
> >>> I was referring to IDXD device spec. Basically it allows a process to
> >>> submit a descriptor which contains a completion interrupt handle.
> >>> The handle is the index of a MSI-X entry or IMS entry allocated by
> >>> the idxd driver. To mark the association between application and
> >>> related handles the driver records the PASID of the application
> >>> in an auxiliary structure for MSI-X (called MSI-X permission table)
> >>> or directly in the IMS entry. This additional info includes whether
> >>> an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
> >>> value to be checked against the descriptor.
> >>>
> >>> As you said virtualizing MSI-X table itself is via SET_IRQS and it's
> >>> 4k aligned. Then we also need to capture guest updates to the MSI-X
> >>> permission table and copy the PASID information into the
> >>> corresponding IMS entry when using the IMS backend. It's MSI-X
> >>> permission table not 4k aligned then trapping it will affect adjacent
> >>> registers.
> >>>
> >>> My quick check in idxd spec doesn't reveal an real impact in perf
> >>> critical path. Most registers are configuration/control registers
> >>> accessed at driver init time and a few interrupt registers related
> >>> to errors or administrative purpose.  
> >>
> >> Right, it looks like you'll need to trap writes to the MSI-X
> >> Permissions Table via a sparse mmap capability to avoid assumptions
> >> whether it lives on the same page as the MSI-X vector table or PBA.
> >> Ideally the hardware folks have considered this to avoid any conflict
> >> with latency sensitive registers.
> >>
> >> The variant driver would use this for collecting the meta data relative
> >> to the IMS interrupt, but this is all tangential to whether we
> >> preemptively slice up vfio-pci-core's SET_IRQS ioctl or the iDXD driver
> >> implements its own.  
> > 
> > Agree
> >   
> >>
> >> And just to be clear, I don't expect the iDXD variant driver to go to
> >> extraordinary lengths to duplicate the core ioctl, we can certainly
> >> refactor and export things where it makes sense, but I think it likely
> >> makes more sense for the variant driver to implement the shell of the
> >> ioctl rather than trying to multiplex the entire core ioctl with an ops
> >> structure that's so intimately tied to the core implementation and
> >> focused only on the MSI-X code paths.  Thanks,
> >>  
> > 
> > btw I'll let Reinette to decide whether she wants to implement such
> > a variant driver or waits until idxd siov driver is ready to demonstrate
> > the usage. One concern with the variant driver approach is lacking
> > of a real-world usage (e.g. what IMS brings when guest only wants
> > MSI-X on an assigned PF). Though it may provide a shorter path
> > to enable the IMS backend support, also comes with the long-term
> > maintenance burden.  
> 
> Thank you very much for your guidance and advice.
> 
> I'd be happy to implement what is required for this work. Unfortunately
> I am not confident that I understand what is meant with "variant driver".
> 
> I initially understood "variant driver" to mean the planned IDXD virtual
> device driver (the "IDXD VDCM" driver) that will assign IDXD resources
> to guests with varying granularity and back the guest MSI-X interrupts
> of these virtual devices with IMS interrupts on the host. From Kevin
> I understand "variant driver" is a new sample driver for an IDXD
> assigned PF, backing guest MSI-X interrupts with IMS interrupts on
> the host.
> 
> The IDXD VDCM driver is in progress. If a new variant driver needs to be
> created then I still need to do some research in how to accomplish it.
> Even so, it is not clear to me who the users of this new driver would be.
> If the requirement is to demonstrate this VFIO IMS usage then we could
> perhaps wait until the IDXD VDCM driver is ready and thus not have to deal
> with additional maintenance burden.

A vfio-pci variant driver is specifically a driver that leverages
portions of vfio-pci-core for implementing vfio_device_ops and binds to
a PCI device.  It might actually be the wrong term here, but I jumped
to that since the series tries to generalize portions of one of the
vfio-pci-core code paths. You might very well be intending to use this
with something more like an mdev driver, which is fine.

That also sort of illustrates the point though that this series is
taking a pretty broad approach to slicing up vfio-pci-core's SET_IRQS
ioctl code path, enabling support for IMS backed interrupts, but in
effect complicating the whole thing without any actual consumer to
justify the complication.  Meanwhile I think the goal is to reduce
complication to a driver that doesn't exist yet.  So it currently seems
like a poor trade-off.

This driver that doesn't exist yet could implement its own SET_IRQS
ioctl that backs MSI-X with IMS as a starting point.  Presumably we
expect multiple drivers to require this behavior, so common code makes
sense, but the rest of us in the community can't really evaluate how
much it makes sense to slice the common code without seeing that
implementation and how it might leverage, if not directly use, the
existing core code.
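For reference, a variant driver's own SET_IRQS handler would consume
the same vfio_irq_set layout userspace already builds today. Below is
a minimal sketch of the userspace side; the UAPI structure is copied
locally purely for illustration (the real definition lives in
<linux/vfio.h>), and build_set_irqs() is a hypothetical helper, not an
existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the UAPI layout for illustration; real users should
 * include <linux/vfio.h> instead. */
struct vfio_irq_set {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t  data[];
};

#define VFIO_IRQ_SET_DATA_EVENTFD	(1 << 2)
#define VFIO_IRQ_SET_ACTION_TRIGGER	(1 << 5)
#define VFIO_PCI_MSIX_IRQ_INDEX		2

/*
 * Build the argument a VMM passes to VFIO_DEVICE_SET_IRQS when the
 * guest enables MSI-X vectors [start, start + count): one eventfd per
 * vector.  Caller frees the returned buffer.
 */
static struct vfio_irq_set *build_set_irqs(uint32_t start, uint32_t count,
					   const int32_t *eventfds)
{
	size_t sz = sizeof(struct vfio_irq_set) + count * sizeof(int32_t);
	struct vfio_irq_set *set = malloc(sz);

	set->argsz = sz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_MSIX_IRQ_INDEX;
	set->start = start;
	set->count = count;
	memcpy(set->data, eventfds, count * sizeof(int32_t));
	return set;
}
```

The variant driver's handler would then wire each eventfd to an IMS
interrupt rather than a host MSI-X vector, which is exactly the part
that is hard to judge without seeing the driver.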

The sample drivers come into play in that if we were to make the
vfio-pci-core SET_IRQS path usable in this generic way, then any driver
implementing SET_IRQS for a PCI device should be able to take advantage
of it, including those that already exist in samples/vfio-mdev/.  I
don't think anyone is requesting an iDXD sample driver, the real driver
should be sufficient.

> In the mean time there are items that I do understand better
> and will work on right away:
> - Do not have ops span the SET_IRQS ioctl()
> - Use container_of() instead of opaque pointer
> - Do not ignore INTx, consider the mdev sample driver when refactoring
>   this code.
> - Consider the Intel vgpu driver as user of new emulated interrupt
>   interface.

I think looking at kvmgt and the existing sample drivers to find
commonality that actually provides simplification would be a good
start, but I don't anticipate implementing IMS backed MSI-X in common
code unless we're at least imminently able to make use of it.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-07 23:06                 ` Alex Williamson
@ 2023-11-08  2:49                   ` Jason Gunthorpe
  2023-11-08  9:16                     ` Tian, Kevin
  2023-11-08 16:52                   ` Reinette Chatre
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2023-11-08  2:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Reinette Chatre, Tian, Kevin, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Tue, Nov 07, 2023 at 04:06:41PM -0700, Alex Williamson wrote:

> A vfio-pci variant driver is specifically a driver that leverages
> portions of vfio-pci-core for implementing vfio_device_ops and binds to
> a PCI device.  It might actually be the wrong term here, but I jumped
> to that since the series tries to generalize portions of one of the
> vfio-pci-core code paths. You might very well be intending to use this
> with something more like an mdev driver, which is fine.

IDXD will be a SIOV device and we need to have a serious talk about
how the SIOV device lifecycle will work.

> That also sort of illustrates the point though that this series is
> taking a pretty broad approach to slicing up vfio-pci-core's SET_IRQS
> ioctl code path, enabling support for IMS backed interrupts, but in
> effect complicating the whole thing without any actual consumer to
> justify the complication.  Meanwhile I think the goal is to reduce
> complication to a driver that doesn't exist yet.  So it currently seems
> like a poor trade-off.

I think we need to see some draft of the IDXD driver to really
understand this.

> This driver that doesn't exist yet could implement its own SET_IRQS
> ioctl that backs MSI-X with IMS as a starting point.  Presumably we
> expect multiple drivers to require this behavior, so common code makes
> sense, but the rest of us in the community can't really evaluate how
> much it makes sense to slice the common code without seeing that
> implementation and how it might leverage, if not directly use, the
> existing core code.

I've been seeing a general interest in taking something that is not
MSI-X (eg "IMS" for IDXD) and converting it into MSI-X for the vPCI
function. I think this will be a durable need in this space.

Ideally it will be overtaken by simply teaching the guest, vfio and
the hypervisor interrupt logic how to directly generate interrupts
with a guest controlled addr/data pair without requiring MSI-X
trapping. That is the fundamental reason why this has to be done this
convoluted way.

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-08  2:49                   ` Jason Gunthorpe
@ 2023-11-08  9:16                     ` Tian, Kevin
  0 siblings, 0 replies; 40+ messages in thread
From: Tian, Kevin @ 2023-11-08  9:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Chatre, Reinette, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, November 8, 2023 10:50 AM
> 
> On Tue, Nov 07, 2023 at 04:06:41PM -0700, Alex Williamson wrote:
> 
> > This driver that doesn't exist yet could implement its own SET_IRQS
> > ioctl that backs MSI-X with IMS as a starting point.  Presumably we
> > expect multiple drivers to require this behavior, so common code makes
> > sense, but the rest of us in the community can't really evaluate how
> > much it makes sense to slice the common code without seeing that
> > implementation and how it might leverage, if not directly use, the
> > existing core code.
> 
> I've been seeing a general interest in taking something that is not
> MSI-X (e.g., "IMS" for IDXD) and converting it into MSI-X for the vPCI
> function. I think this will be a durable need in this space.
> 
> Ideally it will be overtaken by simply teaching the guest, vfio and
> the hypervisor interrupt logic how to directly generate interrupts
> with a guest controlled addr/data pair without requiring MSI-X
> trapping. That is the fundamental reason why this has to be done this
> convoluted way.
> 

Even with that, a legacy guest which doesn't support such an
enlightened way still needs this convoluted path. 😊

And for SIOV the trap cannot be eliminated anyway, given that the
interrupt storage is shared by all vdevs.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)
  2023-11-07 23:06                 ` Alex Williamson
  2023-11-08  2:49                   ` Jason Gunthorpe
@ 2023-11-08 16:52                   ` Reinette Chatre
  1 sibling, 0 replies; 40+ messages in thread
From: Reinette Chatre @ 2023-11-08 16:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, jgg@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kvm@vger.kernel.org,
	Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
	tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

Hi Alex,

On 11/7/2023 3:06 PM, Alex Williamson wrote:
> That also sort of illustrates the point though that this series is
> taking a pretty broad approach to slicing up vfio-pci-core's SET_IRQS
> ioctl code path, enabling support for IMS backed interrupts, but in
> effect complicating the whole thing without any actual consumer to
> justify the complication.  Meanwhile I think the goal is to reduce
> complication to a driver that doesn't exist yet.  So it currently seems
> like a poor trade-off.
> 
> This driver that doesn't exist yet could implement its own SET_IRQS
> ioctl that backs MSI-X with IMS as a starting point.  Presumably we
> expect multiple drivers to require this behavior, so common code makes
> sense, but the rest of us in the community can't really evaluate how
> much it makes sense to slice the common code without seeing that
> implementation and how it might leverage, if not directly use, the
> existing core code.

I understand. I'm hearing the same from you and Jason. I plan to
work on addressing your feedback but will only share it when it can be
accompanied by a draft of the IDXD VDCM driver. Please let me know
if you prefer a different approach.

Reinette

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-11-08 16:52 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-27 17:00 [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 01/26] PCI/MSI: Provide stubs for IMS functions Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 02/26] vfio/pci: Move PCI specific check from wrapper to PCI function Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 03/26] vfio/pci: Use unsigned int instead of unsigned Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 04/26] vfio/pci: Make core interrupt callbacks accessible to all virtual devices Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 05/26] vfio/pci: Split PCI interrupt management into front and backend Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 06/26] vfio/pci: Separate MSI and MSI-X handling Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 07/26] vfio/pci: Move interrupt eventfd to interrupt context Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 08/26] vfio/pci: Move mutex acquisition into function Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 09/26] vfio/pci: Move per-interrupt contexts to generic interrupt struct Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 10/26] vfio/pci: Move IRQ type to generic interrupt context Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 11/26] vfio/pci: Provide interrupt context to irq_is() and is_irq_none() Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 12/26] vfio/pci: Provide interrupt context to generic ops Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 13/26] vfio/pci: Provide interrupt context to vfio_msi_enable() and vfio_msi_disable() Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 14/26] vfio/pci: Let interrupt management backend interpret interrupt index Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 15/26] vfio/pci: Move generic code to frontend Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 16/26] vfio/pci: Split interrupt context initialization Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 17/26] vfio/pci: Make vfio_pci_set_irqs_ioctl() available Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 18/26] vfio/pci: Preserve per-interrupt contexts Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 19/26] vfio/pci: Store Linux IRQ number in per-interrupt context Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 20/26] vfio/pci: Separate frontend and backend code during interrupt enable/disable Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 21/26] vfio/pci: Replace backend specific calls with callbacks Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 22/26] vfio/pci: Introduce backend specific context initializer Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 23/26] vfio/pci: Support emulated interrupts Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 24/26] vfio/pci: Add core IMS support Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 25/26] vfio/pci: Add accessor for IMS index Reinette Chatre
2023-10-27 17:00 ` [RFC PATCH V3 26/26] vfio/pci: Support IMS cookie modification Reinette Chatre
2023-10-31  7:31 ` [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS) Tian, Kevin
2023-11-01 18:07   ` Alex Williamson
2023-11-02  2:51     ` Tian, Kevin
2023-11-02  3:14     ` Tian, Kevin
2023-11-02 21:13       ` Alex Williamson
2023-11-03  7:23         ` Tian, Kevin
2023-11-03 15:51           ` Alex Williamson
2023-11-07  8:29             ` Tian, Kevin
2023-11-07 19:48               ` Reinette Chatre
2023-11-07 23:06                 ` Alex Williamson
2023-11-08  2:49                   ` Jason Gunthorpe
2023-11-08  9:16                     ` Tian, Kevin
2023-11-08 16:52                   ` Reinette Chatre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox