qemu-devel.nongnu.org archive mirror
* [PATCH 00/11] SVM API declaration for emulated devices
@ 2025-05-20  7:18 CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 02/11] pcie: Helper functions to check if PASID is enabled CLEMENT MATHIEU--DRIF
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

This patch set is part of a group of series that add SVM support to VT-d.

Here we focus on introducing a common PCI-level API for ATS and PRI to be
used by virtual devices.

The API introduced in this series is mainly based on the PCIe Gen 5 spec.

What is ATS?
''''''''''''

ATS (Address Translation Services) is a PCIe-level protocol that
enables PCIe devices to query an IOMMU for virtual-to-physical
address translations in a specific address space (such as a userspace
process address space). When a device receives translation responses
from an IOMMU, it may decide to store them in an internal cache,
often known as "ATC" (Address Translation Cache) or "Device IOTLB".
To keep page tables and caches consistent, the IOMMU is allowed to
send asynchronous invalidation requests to its client devices.
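
As an illustration of the caching behaviour described above, here is a
minimal sketch of a device-side ATC. It is not part of this series: all
Sketch* names, the direct-mapped layout and the 4 KiB page size are
assumptions made for the example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ATC; none of these names exist in QEMU. */
#define SKETCH_ATC_ENTRIES 8
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_SIZE   (1ULL << SKETCH_PAGE_SHIFT)

typedef struct SketchAtcEntry {
    bool valid;
    uint32_t pasid; /* address space the translation belongs to */
    uint64_t vpn;   /* virtual page number */
    uint64_t ppn;   /* physical page number */
} SketchAtcEntry;

typedef struct SketchAtc {
    SketchAtcEntry entries[SKETCH_ATC_ENTRIES];
} SketchAtc;

/* Store a translation response in a direct-mapped slot. */
static void sketch_atc_insert(SketchAtc *atc, uint32_t pasid,
                              uint64_t vaddr, uint64_t paddr)
{
    uint64_t vpn = vaddr >> SKETCH_PAGE_SHIFT;
    SketchAtcEntry *e = &atc->entries[vpn % SKETCH_ATC_ENTRIES];

    e->valid = true;
    e->pasid = pasid;
    e->vpn = vpn;
    e->ppn = paddr >> SKETCH_PAGE_SHIFT;
}

/* Return true and fill *paddr when the translation is cached. */
static bool sketch_atc_lookup(const SketchAtc *atc, uint32_t pasid,
                              uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpn = vaddr >> SKETCH_PAGE_SHIFT;
    const SketchAtcEntry *e = &atc->entries[vpn % SKETCH_ATC_ENTRIES];

    assert(paddr != NULL);
    if (!e->valid || e->pasid != pasid || e->vpn != vpn) {
        return false;
    }
    *paddr = (e->ppn << SKETCH_PAGE_SHIFT) | (vaddr & (SKETCH_PAGE_SIZE - 1));
    return true;
}

/* Handle an invalidation request covering [vaddr, vaddr + length). */
static void sketch_atc_invalidate(SketchAtc *atc, uint32_t pasid,
                                  uint64_t vaddr, uint64_t length)
{
    for (size_t i = 0; i < SKETCH_ATC_ENTRIES; i++) {
        SketchAtcEntry *e = &atc->entries[i];
        uint64_t start = e->vpn << SKETCH_PAGE_SHIFT;

        if (e->valid && e->pasid == pasid &&
            start < vaddr + length && vaddr < start + SKETCH_PAGE_SIZE) {
            e->valid = false;
        }
    }
}
```

A lookup that misses would trigger a new translation request to the
IOMMU; an invalidation simply drops the overlapping entries so that the
next access asks the IOMMU again.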

What is PRI?
''''''''''''

PRI (Page Request Interface) is a PCIe-level protocol that
enables PCIe devices to request page fault resolution from
the kernel through an IOMMU. PRI and ATS are the two
cornerstones of a technology called SVM (Shared Virtual
Memory) or SVA (Shared Virtual Addressing), which allows
PCIe devices to read from and write to the memory of
userspace applications without requiring page pinning.
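
To make the interplay between ATS and PRI more concrete, the toy model
below shows the "translate, request the page on a miss, retry" loop
that removes the need for pinning. It is only a sketch: the Sketch*
names are hypothetical, and the fault handler is a stand-in for what
the guest kernel would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in QEMU. */
#define SKETCH_NUM_PAGES 4

typedef struct SketchAddressSpace {
    bool present[SKETCH_NUM_PAGES]; /* is the page resident? */
    unsigned int page_requests;     /* number of page requests served */
} SketchAddressSpace;

/* ATS step: the translation succeeds only if the page is resident. */
static bool sketch_translate(const SketchAddressSpace *as, unsigned int page)
{
    assert(page < SKETCH_NUM_PAGES);
    return as->present[page];
}

/* PRI step: stands in for the kernel resolving the page fault. */
static void sketch_page_request(SketchAddressSpace *as, unsigned int page)
{
    assert(page < SKETCH_NUM_PAGES);
    as->present[page] = true;
    as->page_requests++;
}

/* Device access without pinning: translate, fault in on a miss, retry. */
static bool sketch_device_access(SketchAddressSpace *as, unsigned int page)
{
    if (!sketch_translate(as, page)) {
        sketch_page_request(as, page);
    }
    return sketch_translate(as, page);
}
```

The first access to a non-resident page costs one page request; later
accesses to the same page translate directly.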

Here is a link to our GitHub repository that contains:
    - QEMU with all the patches for SVM in VT-d
        - ATS
        - PRI
        - Device IOTLB invalidations
        - Requests with pre-translated addresses
    - A demo device
    - A simple driver for the demo device
    - A userspace program (for testing and demonstration purposes)

https://github.com/BullSequana/Qemu-in-guest-SVM-demo

Clement Mathieu--Drif (11):
  pcie: Add helper to declare PASID capability for a pcie device
  pcie: Helper functions to check if PASID is enabled
  pcie: Helper function to check if ATS is enabled
  pcie: Add a helper to declare the PRI capability for a pcie device
  pcie: Helper functions to check if PRI is enabled
  pci: Cache the bus mastering status in the device
  pci: Add an API to get IOMMU's min page size and virtual address width
  memory: Store user data pointer in the IOMMU notifiers
  pci: Add a pci-level initialization function for IOMMU notifiers
  pci: Add a pci-level API for ATS
  pci: Add a PCI-level API for PRI

 hw/pci/pci.c                | 204 +++++++++++++++++++++--
 hw/pci/pcie.c               |  78 +++++++++
 include/hw/pci/pci.h        | 315 ++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci_device.h |   1 +
 include/hw/pci/pcie.h       |  13 +-
 include/hw/pci/pcie_regs.h  |   8 +
 include/system/memory.h     |   1 +
 7 files changed, 609 insertions(+), 11 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 01/11] pcie: Add helper to declare PASID capability for a pcie device
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 02/11] pcie: Helper functions to check if PASID is enabled CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:18 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 03/11] pcie: Helper function to check if ATS is enabled CLEMENT MATHIEU--DRIF
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
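Note for readers: the sketch below reproduces, outside of QEMU, how
pcie_pasid_init() packs the PASID capability register. The SKETCH_*
macros are local stand-ins; the EXEC and PRIV bit values are assumed
to match the usual PCI register definitions and are not taken from
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the constants used by pcie_pasid_init(). */
#define SKETCH_PASID_MAX_WIDTH   20
#define SKETCH_PASID_WIDTH_SHIFT 8
#define SKETCH_PASID_CAP_EXEC    0x02 /* assumed value */
#define SKETCH_PASID_CAP_PRIV    0x04 /* assumed value */

/* Build the 16-bit PASID capability register value. */
static uint16_t sketch_pasid_cap_reg(uint8_t pasid_width, bool exec_perm,
                                     bool priv_mod)
{
    uint16_t reg;

    assert(pasid_width <= SKETCH_PASID_MAX_WIDTH);

    reg = (uint16_t)(pasid_width << SKETCH_PASID_WIDTH_SHIFT);
    reg |= exec_perm ? SKETCH_PASID_CAP_EXEC : 0;
    reg |= priv_mod ? SKETCH_PASID_CAP_PRIV : 0;
    return reg;
}
```

For example, a 20-bit PASID width with execute permission and without
privileged mode would yield 0x1402 under these assumptions.
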
 hw/pci/pcie.c              | 25 +++++++++++++++++++++++++
 include/hw/pci/pcie.h      |  6 +++++-
 include/hw/pci/pcie_regs.h |  5 +++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 1b12db6fa2..4f935ff420 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1214,3 +1214,28 @@ void pcie_acs_reset(PCIDevice *dev)
         pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
     }
 }
+
+/* PASID */
+void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
+                     bool exec_perm, bool priv_mod)
+{
+    static const uint16_t control_reg_rw_mask = 0x07;
+    uint16_t capability_reg;
+
+    assert(pasid_width <= PCI_EXT_CAP_PASID_MAX_WIDTH);
+
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_PASID, PCI_PASID_VER, offset,
+                        PCI_EXT_CAP_PASID_SIZEOF);
+
+    capability_reg = ((uint16_t)pasid_width) << PCI_PASID_CAP_WIDTH_SHIFT;
+    capability_reg |= exec_perm ? PCI_PASID_CAP_EXEC : 0;
+    capability_reg |= priv_mod  ? PCI_PASID_CAP_PRIV : 0;
+    pci_set_word(dev->config + offset + PCI_PASID_CAP, capability_reg);
+
+    /* Everything is disabled by default */
+    pci_set_word(dev->config + offset + PCI_PASID_CTRL, 0);
+
+    pci_set_word(dev->wmask + offset + PCI_PASID_CTRL, control_reg_rw_mask);
+
+    dev->exp.pasid_cap = offset;
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 70a5de09de..fe82e0a915 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -70,8 +70,9 @@ struct PCIExpressDevice {
     uint16_t aer_cap;
     PCIEAERLog aer_log;
 
-    /* Offset of ATS capability in config space */
+    /* Offset of ATS and PASID capabilities in config space */
     uint16_t ats_cap;
+    uint16_t pasid_cap;
 
     /* ACS */
     uint16_t acs_cap;
@@ -150,4 +151,7 @@ void pcie_cap_slot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
                              Error **errp);
 void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
                                      DeviceState *dev, Error **errp);
+
+void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
+                     bool exec_perm, bool priv_mod);
 #endif /* QEMU_PCIE_H */
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 9d3b6868dc..4d9cf4a29c 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -86,6 +86,11 @@ typedef enum PCIExpLinkWidth {
 #define PCI_ARI_VER                     1
 #define PCI_ARI_SIZEOF                  8
 
+/* PASID */
+#define PCI_PASID_VER                   1
+#define PCI_EXT_CAP_PASID_MAX_WIDTH     20
+#define PCI_PASID_CAP_WIDTH_SHIFT       8
+
 /* AER */
 #define PCI_ERR_VER                     2
 #define PCI_ERR_SIZEOF                  0x48
-- 
2.49.0



* [PATCH 02/11] pcie: Helper functions to check if PASID is enabled
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:18 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 01/11] pcie: Add helper to declare PASID capability for a pcie device CLEMENT MATHIEU--DRIF
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

pcie_pasid_enabled checks whether the PASID capability is present.
If so, it reads the control register in the configuration space to
determine whether the feature is enabled.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
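Note for readers: the check added here can be exercised on a plain
byte array standing in for the configuration space. The sketch below
is not QEMU code; the control register offset and the enable bit are
assumed values, and sketch_get_word() is a hypothetical replacement
for pci_get_word().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PASID_CTRL        0x06 /* assumed control register offset */
#define SKETCH_PASID_CTRL_ENABLE 0x01 /* assumed enable bit */

/* Read a little-endian 16-bit value, as config space stores it. */
static uint16_t sketch_get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* pasid_cap is the capability offset, or 0 if it was never declared. */
static bool sketch_pasid_enabled(const uint8_t *config, uint16_t pasid_cap)
{
    assert(config != NULL);
    if (!pasid_cap) {
        return false;
    }
    return (sketch_get_word(config + pasid_cap + SKETCH_PASID_CTRL) &
            SKETCH_PASID_CTRL_ENABLE) != 0;
}
```

A zero offset doubles as "capability absent", which is why the helper
can return early without touching the configuration space.
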
 hw/pci/pcie.c         | 9 +++++++++
 include/hw/pci/pcie.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 4f935ff420..db9756d861 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1239,3 +1239,12 @@ void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
 
     dev->exp.pasid_cap = offset;
 }
+
+bool pcie_pasid_enabled(const PCIDevice *dev)
+{
+    if (!pci_is_express(dev) || !dev->exp.pasid_cap) {
+        return false;
+    }
+    return (pci_get_word(dev->config + dev->exp.pasid_cap + PCI_PASID_CTRL) &
+                PCI_PASID_CTRL_ENABLE) != 0;
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index fe82e0a915..dff98ff2c6 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -154,4 +154,6 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
 
 void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
                      bool exec_perm, bool priv_mod);
+
+bool pcie_pasid_enabled(const PCIDevice *dev);
 #endif /* QEMU_PCIE_H */
-- 
2.49.0



* [PATCH 03/11] pcie: Helper function to check if ATS is enabled
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 02/11] pcie: Helper functions to check if PASID is enabled CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 01/11] pcie: Add helper to declare PASID capability for a pcie device CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:18 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 04/11] pcie: Add a helper to declare the PRI capability for a pcie device CLEMENT MATHIEU--DRIF
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

pcie_ats_enabled checks whether the ATS capability is present.
If so, it reads the control register in the configuration space to
determine whether the feature is enabled.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pcie.c         | 9 +++++++++
 include/hw/pci/pcie.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index db9756d861..36de709801 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1248,3 +1248,12 @@ bool pcie_pasid_enabled(const PCIDevice *dev)
     return (pci_get_word(dev->config + dev->exp.pasid_cap + PCI_PASID_CTRL) &
                 PCI_PASID_CTRL_ENABLE) != 0;
 }
+
+bool pcie_ats_enabled(const PCIDevice *dev)
+{
+    if (!pci_is_express(dev) || !dev->exp.ats_cap) {
+        return false;
+    }
+    return (pci_get_word(dev->config + dev->exp.ats_cap + PCI_ATS_CTRL) &
+                PCI_ATS_CTRL_ENABLE) != 0;
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index dff98ff2c6..497d0bc2d2 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -156,4 +156,5 @@ void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
                      bool exec_perm, bool priv_mod);
 
 bool pcie_pasid_enabled(const PCIDevice *dev);
+bool pcie_ats_enabled(const PCIDevice *dev);
 #endif /* QEMU_PCIE_H */
-- 
2.49.0



* [PATCH 04/11] pcie: Add a helper to declare the PRI capability for a pcie device
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (2 preceding siblings ...)
  2025-05-20  7:18 ` [PATCH 03/11] pcie: Helper function to check if ATS is enabled CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:18 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 05/11] pcie: Helper functions to check if PRI is enabled CLEMENT MATHIEU--DRIF
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
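Note for readers: pcie_pri_init() marks the control register as
writable (wmask) and the low status bits as write-1-to-clear
(w1cmask). The sketch below shows what these two masks mean for a
guest write. It is a simplified model written for this note, under
the assumption that the default config write path applies the masks
byte by byte; sketch_write_config() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Apply a guest write of len bytes at addr, honouring both masks. */
static void sketch_write_config(uint8_t *config, const uint8_t *wmask,
                                const uint8_t *w1cmask, uint32_t addr,
                                uint32_t val, int len)
{
    for (int i = 0; i < len; val >>= 8, i++) {
        uint8_t wm = wmask[addr + i];
        uint8_t w1c = w1cmask[addr + i];

        /* A bit cannot be both read-write and write-1-to-clear. */
        assert(!(wm & w1c));

        /* Read-write bits take the written value. */
        config[addr + i] = (uint8_t)((config[addr + i] & ~wm) | (val & wm));
        /* Write-1-to-clear bits are cleared where a 1 is written. */
        config[addr + i] &= (uint8_t)~(val & w1c);
    }
}
```

Bits covered by neither mask are read-only from the guest's point of
view, so a write leaves them untouched.
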
 hw/pci/pcie.c              | 26 ++++++++++++++++++++++++++
 include/hw/pci/pcie.h      |  5 ++++-
 include/hw/pci/pcie_regs.h |  3 +++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 36de709801..542172b3fa 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1240,6 +1240,32 @@ void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
     dev->exp.pasid_cap = offset;
 }
 
+/* PRI */
+void pcie_pri_init(PCIDevice *dev, uint16_t offset, uint32_t outstanding_pr_cap,
+                   bool prg_response_pasid_req)
+{
+    static const uint16_t control_reg_rw_mask = 0x3;
+    static const uint16_t status_reg_rw1_mask = 0x3;
+    static const uint32_t pr_alloc_reg_rw_mask = 0xffffffff;
+    uint16_t status_reg;
+
+    status_reg = prg_response_pasid_req ? PCI_PRI_STATUS_PASID : 0;
+    status_reg |= PCI_PRI_STATUS_STOPPED; /* Stopped by default */
+
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_PRI, PCI_PRI_VER, offset,
+                        PCI_EXT_CAP_PRI_SIZEOF);
+    /* Disabled by default */
+
+    pci_set_word(dev->config + offset + PCI_PRI_STATUS, status_reg);
+    pci_set_long(dev->config + offset + PCI_PRI_MAX_REQ, outstanding_pr_cap);
+
+    pci_set_word(dev->wmask + offset + PCI_PRI_CTRL, control_reg_rw_mask);
+    pci_set_word(dev->w1cmask + offset + PCI_PRI_STATUS, status_reg_rw1_mask);
+    pci_set_long(dev->wmask + offset + PCI_PRI_ALLOC_REQ, pr_alloc_reg_rw_mask);
+
+    dev->exp.pri_cap = offset;
+}
+
 bool pcie_pasid_enabled(const PCIDevice *dev)
 {
     if (!pci_is_express(dev) || !dev->exp.pasid_cap) {
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 497d0bc2d2..17f06cd5d6 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -70,9 +70,10 @@ struct PCIExpressDevice {
     uint16_t aer_cap;
     PCIEAERLog aer_log;
 
-    /* Offset of ATS and PASID capabilities in config space */
+    /* Offset of ATS, PRI and PASID capabilities in config space */
     uint16_t ats_cap;
     uint16_t pasid_cap;
+    uint16_t pri_cap;
 
     /* ACS */
     uint16_t acs_cap;
@@ -154,6 +155,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
 
 void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
                      bool exec_perm, bool priv_mod);
+void pcie_pri_init(PCIDevice *dev, uint16_t offset, uint32_t outstanding_pr_cap,
+                   bool prg_response_pasid_req);
 
 bool pcie_pasid_enabled(const PCIDevice *dev);
 bool pcie_ats_enabled(const PCIDevice *dev);
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 4d9cf4a29c..33a22229fe 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -91,6 +91,9 @@ typedef enum PCIExpLinkWidth {
 #define PCI_EXT_CAP_PASID_MAX_WIDTH     20
 #define PCI_PASID_CAP_WIDTH_SHIFT       8
 
+/* PRI */
+#define PCI_PRI_VER                     1
+
 /* AER */
 #define PCI_ERR_VER                     2
 #define PCI_ERR_SIZEOF                  0x48
-- 
2.49.0



* [PATCH 05/11] pcie: Helper functions to check if PRI is enabled
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (3 preceding siblings ...)
  2025-05-20  7:18 ` [PATCH 04/11] pcie: Add a helper to declare the PRI capability for a pcie device CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:18 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 06/11] pci: Cache the bus mastering status in the device CLEMENT MATHIEU--DRIF
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

pcie_pri_enabled can be used to check whether the PRI capability is
present and enabled on a PCIe device.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pcie.c         | 9 +++++++++
 include/hw/pci/pcie.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 542172b3fa..eaeb68894e 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1266,6 +1266,15 @@ void pcie_pri_init(PCIDevice *dev, uint16_t offset, uint32_t outstanding_pr_cap,
     dev->exp.pri_cap = offset;
 }
 
+bool pcie_pri_enabled(const PCIDevice *dev)
+{
+    if (!pci_is_express(dev) || !dev->exp.pri_cap) {
+        return false;
+    }
+    return (pci_get_word(dev->config + dev->exp.pri_cap + PCI_PRI_CTRL) &
+                PCI_PRI_CTRL_ENABLE) != 0;
+}
+
 bool pcie_pasid_enabled(const PCIDevice *dev)
 {
     if (!pci_is_express(dev) || !dev->exp.pasid_cap) {
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 17f06cd5d6..ff6ce08e13 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -158,6 +158,7 @@ void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
 void pcie_pri_init(PCIDevice *dev, uint16_t offset, uint32_t outstanding_pr_cap,
                    bool prg_response_pasid_req);
 
+bool pcie_pri_enabled(const PCIDevice *dev);
 bool pcie_pasid_enabled(const PCIDevice *dev);
 bool pcie_ats_enabled(const PCIDevice *dev);
 #endif /* QEMU_PCIE_H */
-- 
2.49.0



* [PATCH 06/11] pci: Cache the bus mastering status in the device
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (4 preceding siblings ...)
  2025-05-20  7:18 ` [PATCH 05/11] pcie: Helper functions to check if PRI is enabled CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:18 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:18 ` [PATCH 07/11] pci: Add an API to get IOMMU's min page size and virtual address width CLEMENT MATHIEU--DRIF
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

The cached is_master value is needed to know whether a device is
allowed to issue ATS/PRI requests, as these operations do not go
through the bus_master_enable_region memory region.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pci.c                | 23 +++++++++++++----------
 include/hw/pci/pci_device.h |  1 +
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index f5ab510697..1114ba8529 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -128,6 +128,12 @@ static GSequence *pci_acpi_index_list(void)
     return used_acpi_index_list;
 }
 
+static void pci_set_master(PCIDevice *d, bool enable)
+{
+    memory_region_set_enabled(&d->bus_master_enable_region, enable);
+    d->is_master = enable; /* cache the status */
+}
+
 static void pci_init_bus_master(PCIDevice *pci_dev)
 {
     AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
@@ -135,7 +141,7 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
     memory_region_init_alias(&pci_dev->bus_master_enable_region,
                              OBJECT(pci_dev), "bus master",
                              dma_as->root, 0, memory_region_size(dma_as->root));
-    memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
+    pci_set_master(pci_dev, false);
     memory_region_add_subregion(&pci_dev->bus_master_container_region, 0,
                                 &pci_dev->bus_master_enable_region);
 }
@@ -804,9 +810,8 @@ static int get_pci_config_device(QEMUFile *f, void *pv, size_t size,
         pci_bridge_update_mappings(PCI_BRIDGE(s));
     }
 
-    memory_region_set_enabled(&s->bus_master_enable_region,
-                              pci_get_word(s->config + PCI_COMMAND)
-                              & PCI_COMMAND_MASTER);
+    pci_set_master(s, pci_get_word(s->config + PCI_COMMAND)
+                      & PCI_COMMAND_MASTER);
 
     g_free(config);
     return 0;
@@ -1787,9 +1792,8 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
 
     if (ranges_overlap(addr, l, PCI_COMMAND, 2)) {
         pci_update_irq_disabled(d, was_irq_disabled);
-        memory_region_set_enabled(&d->bus_master_enable_region,
-                                  (pci_get_word(d->config + PCI_COMMAND)
-                                   & PCI_COMMAND_MASTER) && d->enabled);
+        pci_set_master(d, (pci_get_word(d->config + PCI_COMMAND) &
+                          PCI_COMMAND_MASTER) && d->enabled);
     }
 
     msi_write_config(d, addr, val_in, l);
@@ -3100,9 +3104,8 @@ void pci_set_enabled(PCIDevice *d, bool state)
 
     d->enabled = state;
     pci_update_mappings(d);
-    memory_region_set_enabled(&d->bus_master_enable_region,
-                              (pci_get_word(d->config + PCI_COMMAND)
-                               & PCI_COMMAND_MASTER) && d->enabled);
+    pci_set_master(d, (pci_get_word(d->config + PCI_COMMAND)
+                      & PCI_COMMAND_MASTER) && d->enabled);
     if (qdev_is_realized(&d->qdev)) {
         pci_device_reset(d);
     }
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index e41d95b0b0..eee0338568 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -90,6 +90,7 @@ struct PCIDevice {
     char name[64];
     PCIIORegion io_regions[PCI_NUM_REGIONS];
     AddressSpace bus_master_as;
+    bool is_master;
     MemoryRegion bus_master_container_region;
     MemoryRegion bus_master_enable_region;
 
-- 
2.49.0



* [PATCH 07/11] pci: Add an API to get IOMMU's min page size and virtual address width
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (5 preceding siblings ...)
  2025-05-20  7:18 ` [PATCH 06/11] pci: Cache the bus mastering status in the device CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:18 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:19 ` [PATCH 08/11] memory: Store user data pointer in the IOMMU notifiers CLEMENT MATHIEU--DRIF
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:18 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

Devices implementing ATS need the IOMMU's address width and minimum
page size in order to initialize their translation cache.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
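Note for readers: one possible use of the two values returned by
pci_iommu_get_iotlb_info() is to size the tags of a translation
cache. The sketch below is an assumption about how a device might do
that, not something this patch implements; both helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power-of-two page size. */
static unsigned int sketch_page_shift(uint32_t min_page_size)
{
    unsigned int shift = 0;

    assert(min_page_size != 0 &&
           (min_page_size & (min_page_size - 1)) == 0);
    while ((min_page_size >> shift) != 1) {
        shift++;
    }
    return shift;
}

/* Number of virtual page number bits a cache tag has to hold. */
static unsigned int sketch_vpn_bits(uint8_t addr_width,
                                    uint32_t min_page_size)
{
    unsigned int shift = sketch_page_shift(min_page_size);

    assert(addr_width >= shift);
    return addr_width - shift;
}
```

With a 48-bit address width and a 4096-byte minimum page size, a tag
would need 36 bits in this model.
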
 hw/pci/pci.c         | 17 +++++++++++++++++
 include/hw/pci/pci.h | 26 ++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 1114ba8529..fc4954ac81 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2970,6 +2970,23 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
     }
 }
 
+int pci_iommu_get_iotlb_info(PCIDevice *dev, uint8_t *addr_width,
+                             uint32_t *min_page_size)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->get_iotlb_info) {
+        iommu_bus->iommu_ops->get_iotlb_info(iommu_bus->iommu_opaque,
+                                             addr_width, min_page_size);
+        return 0;
+    }
+
+    return -ENODEV;
+}
+
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
 {
     /*
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c2fe6caa2c..d67ffe12db 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -429,6 +429,19 @@ typedef struct PCIIOMMUOps {
      * @devfn: device and function number of the PCI device.
      */
     void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
+    /**
+     * @get_iotlb_info: get properties required to initialize a device IOTLB.
+     *
+     * Callback required if devices are allowed to cache translations.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @addr_width: the address width of the IOMMU (output parameter).
+     *
+     * @min_page_size: the minimum page size of the IOMMU (output parameter).
+     */
+    void (*get_iotlb_info)(void *opaque, uint8_t *addr_width,
+                           uint32_t *min_page_size);
 } PCIIOMMUOps;
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
@@ -436,6 +449,19 @@ bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp);
 void pci_device_unset_iommu_device(PCIDevice *dev);
 
+/**
+ * pci_iommu_get_iotlb_info: get properties required to initialize a
+ * device IOTLB.
+ *
+ * Returns 0 on success, or a negative errno otherwise.
+ *
+ * @dev: the device that wants to get the information.
+ * @addr_width: the address width of the IOMMU (output parameter).
+ * @min_page_size: the minimum page size of the IOMMU (output parameter).
+ */
+int pci_iommu_get_iotlb_info(PCIDevice *dev, uint8_t *addr_width,
+                             uint32_t *min_page_size);
+
 /**
  * pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
  *
-- 
2.49.0



* [PATCH 08/11] memory: Store user data pointer in the IOMMU notifiers
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (6 preceding siblings ...)
  2025-05-20  7:18 ` [PATCH 07/11] pci: Add an API to get IOMMU's min page size and virtual address width CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:19 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:19 ` [PATCH 09/11] pci: Add a pci-level initialization function for " CLEMENT MATHIEU--DRIF
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:19 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

The opaque pointer lets developers of ATS-capable devices attach
their own state to a notifier.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
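Note for readers: the pattern enabled by the new field is the usual
"callback plus user data" idiom. The sketch below uses toy types
(SketchNotifier, SketchDevice) rather than the real IOMMUNotifier, so
every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchNotifier SketchNotifier;
typedef void (*SketchNotify)(SketchNotifier *n, unsigned long iova);

struct SketchNotifier {
    SketchNotify notify;
    void *opaque; /* user data, in the spirit of the field added here */
};

typedef struct SketchDevice {
    unsigned int invalidations; /* state tracked by the device */
    SketchNotifier notifier;
} SketchDevice;

/* Callback: recover the device state from the opaque pointer. */
static void sketch_on_invalidate(SketchNotifier *n, unsigned long iova)
{
    SketchDevice *dev = n->opaque;

    (void)iova; /* a real device would drop the matching cache entries */
    assert(dev != NULL);
    dev->invalidations++;
}

static void sketch_device_init(SketchDevice *dev)
{
    dev->invalidations = 0;
    dev->notifier.notify = sketch_on_invalidate;
    dev->notifier.opaque = dev;
}
```

Without the pointer, the callback would have to derive the device from
the notifier address (for instance with container_of), which only
works when the notifier is embedded in the device state.
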
 include/system/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/system/memory.h b/include/system/memory.h
index fbbf4cf911..fc35a0dcad 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -183,6 +183,7 @@ struct IOMMUNotifier {
     hwaddr start;
     hwaddr end;
     int iommu_idx;
+    void *opaque;
     QLIST_ENTRY(IOMMUNotifier) node;
 };
 typedef struct IOMMUNotifier IOMMUNotifier;
-- 
2.49.0



* [PATCH 09/11] pci: Add a pci-level initialization function for IOMMU notifiers
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (7 preceding siblings ...)
  2025-05-20  7:19 ` [PATCH 08/11] memory: Store user data pointer in the IOMMU notifiers CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:19 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:19 ` [PATCH 10/11] pci: Add a pci-level API for ATS CLEMENT MATHIEU--DRIF
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:19 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF

This is meant to be used by ATS-capable devices.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pci.c         | 17 +++++++++++++++++
 include/hw/pci/pci.h | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index fc4954ac81..dfa5a0259e 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2939,6 +2939,23 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     return &address_space_memory;
 }
 
+int pci_iommu_init_iotlb_notifier(PCIDevice *dev, IOMMUNotifier *n,
+                                  IOMMUNotify fn, void *opaque)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->init_iotlb_notifier) {
+        iommu_bus->iommu_ops->init_iotlb_notifier(bus, iommu_bus->iommu_opaque,
+                                                  devfn, n, fn, opaque);
+        return 0;
+    }
+
+    return -ENODEV;
+}
+
 bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d67ffe12db..f3016fd76f 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -442,6 +442,26 @@ typedef struct PCIIOMMUOps {
      */
     void (*get_iotlb_info)(void *opaque, uint8_t *addr_width,
                            uint32_t *min_page_size);
+    /**
+     * @init_iotlb_notifier: initialize an IOMMU notifier.
+     *
+     * Optional callback.
+     *
+     * @bus: the #PCIBus of the PCI device.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number of the PCI device.
+     *
+     * @n: the notifier to be initialized.
+     *
+     * @fn: the callback to be installed.
+     *
+     * @user_opaque: a user pointer that can be used to track a state.
+     */
+    void (*init_iotlb_notifier)(PCIBus *bus, void *opaque, int devfn,
+                                IOMMUNotifier *n, IOMMUNotify fn,
+                                void *user_opaque);
 } PCIIOMMUOps;
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
@@ -462,6 +482,19 @@ void pci_device_unset_iommu_device(PCIDevice *dev);
 int pci_iommu_get_iotlb_info(PCIDevice *dev, uint8_t *addr_width,
                              uint32_t *min_page_size);
 
+/**
+ * pci_iommu_init_iotlb_notifier: initialize an IOMMU notifier.
+ *
+ * This function is used by devices before registering an IOTLB notifier.
+ *
+ * @dev: the device.
+ * @n: the notifier to be initialized.
+ * @fn: the callback to be installed.
+ * @opaque: a user pointer that can be used to track a state.
+ */
+int pci_iommu_init_iotlb_notifier(PCIDevice *dev, IOMMUNotifier *n,
+                                  IOMMUNotify fn, void *opaque);
+
 /**
  * pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
  *
-- 
2.49.0



* [PATCH 10/11] pci: Add a pci-level API for ATS
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (8 preceding siblings ...)
  2025-05-20  7:19 ` [PATCH 09/11] pci: Add a pci-level initialization function for " CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:19 ` CLEMENT MATHIEU--DRIF
  2025-05-20  7:19 ` [PATCH 11/11] pci: Add a PCI-level API for PRI CLEMENT MATHIEU--DRIF
  2025-06-05  5:08 ` [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:19 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF, Ethan MILON

Devices implementing ATS can send translation requests using
pci_ats_request_translation(). Invalidation events are delivered
back to the device through the IOMMU notifier registered with
pci_iommu_register_iotlb_notifier() and removed with
pci_iommu_unregister_iotlb_notifier().

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
Co-authored-by: Ethan Milon <ethan.milon@eviden.com>
---
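Note: to illustrate the intended device-side flow (request a translation,
cache the answers in an ATC, drop them when an invalidation arrives), here
is a minimal standalone sketch. It does not use the QEMU headers: every
Demo* type and demo_* function is a hypothetical stand-in, and
demo_ats_request() only mimics the return convention of
pci_ats_request_translation() with a made-up "iova + 4 GiB" mapping and an
assumed 4 KiB page size. A real device would presumably call
pci_ats_request_translation() instead and run demo_atc_invalidate() from
its IOMMUNotifier callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h> /* ssize_t */

#define DEMO_PAGE_MASK 0xfffULL /* assumption: 4 KiB pages */
#define DEMO_ATC_SLOTS 8

/* Stand-in for IOMMUTLBEntry, reduced to what this sketch needs. */
typedef struct DemoTLBEntry {
    uint64_t iova;            /* untranslated page address */
    uint64_t translated_addr; /* translated page address */
    uint64_t addr_mask;       /* page size - 1 */
    bool valid;
} DemoTLBEntry;

typedef struct DemoATC {
    DemoTLBEntry slots[DEMO_ATC_SLOTS];
} DemoATC;

/*
 * Hypothetical stand-in for pci_ats_request_translation(): pretends that
 * the IOMMU maps every page at iova + 4 GiB.
 */
static ssize_t demo_ats_request(uint64_t addr, size_t length,
                                DemoTLBEntry *result, size_t result_length)
{
    uint64_t first, last;
    size_t n, i;

    if (result_length == 0) {
        return -ENOSPC;
    }
    if (length == 0) {
        return 0;
    }
    first = addr & ~DEMO_PAGE_MASK;
    last = (addr + length - 1) & ~DEMO_PAGE_MASK;
    n = (size_t)((last - first) / (DEMO_PAGE_MASK + 1)) + 1;
    if (n > result_length) {
        return -ENOMEM; /* result buffer too small for the whole range */
    }
    for (i = 0; i < n; i++) {
        result[i].iova = first + i * (DEMO_PAGE_MASK + 1);
        result[i].translated_addr = result[i].iova + (1ULL << 32);
        result[i].addr_mask = DEMO_PAGE_MASK;
        result[i].valid = true;
    }
    return (ssize_t)n;
}

/* Translate a range and keep the answers in the device-side cache. */
static ssize_t demo_atc_fill(DemoATC *atc, uint64_t addr, size_t length)
{
    DemoTLBEntry buf[4];
    ssize_t n, i;
    size_t s;

    assert(atc != NULL);
    n = demo_ats_request(addr, length, buf, 4);
    for (i = 0; i < n; i++) { /* skipped when n is a negative errno */
        for (s = 0; s < DEMO_ATC_SLOTS; s++) {
            if (!atc->slots[s].valid) {
                atc->slots[s] = buf[i];
                break; /* entries are simply not cached when full */
            }
        }
    }
    return n;
}

/* Look up a cached translation; returns false on an ATC miss. */
static bool demo_atc_lookup(const DemoATC *atc, uint64_t addr, uint64_t *out)
{
    size_t s;

    for (s = 0; s < DEMO_ATC_SLOTS; s++) {
        const DemoTLBEntry *e = &atc->slots[s];
        if (e->valid && (addr & ~e->addr_mask) == e->iova) {
            *out = e->translated_addr | (addr & e->addr_mask);
            return true;
        }
    }
    return false;
}

/* What an invalidation callback would do: drop overlapping entries. */
static void demo_atc_invalidate(DemoATC *atc, uint64_t iova,
                                uint64_t addr_mask)
{
    uint64_t end = iova + addr_mask;
    size_t s;

    for (s = 0; s < DEMO_ATC_SLOTS; s++) {
        DemoTLBEntry *e = &atc->slots[s];
        if (e->valid && e->iova <= end && e->iova + e->addr_mask >= iova) {
            e->valid = false;
        }
    }
}
```

The cache is deliberately naive (linear scan, no eviction); the point is
only the ordering: fill after a successful request, invalidate from the
notifier, and fall back to a new request on a miss.
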
 hw/pci/pci.c         |  81 ++++++++++++++++++++++++++++
 include/hw/pci/pci.h | 126 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 207 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index dfa5a0259e..0c63cb4bbe 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2987,6 +2987,87 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
     }
 }
 
+ssize_t pci_ats_request_translation(PCIDevice *dev, uint32_t pasid,
+                                    bool priv_req, bool exec_req,
+                                    hwaddr addr, size_t length,
+                                    bool no_write, IOMMUTLBEntry *result,
+                                    size_t result_length,
+                                    uint32_t *err_count)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    if (!dev->is_master ||
+            ((pasid != PCI_NO_PASID) && !pcie_pasid_enabled(dev))) {
+        return -EPERM;
+    }
+
+    if (result_length == 0) {
+        return -ENOSPC;
+    }
+
+    if (!pcie_ats_enabled(dev)) {
+        return -EPERM;
+    }
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->ats_request_translation) {
+        return iommu_bus->iommu_ops->ats_request_translation(bus,
+                                                     iommu_bus->iommu_opaque,
+                                                     devfn, pasid, priv_req,
+                                                     exec_req, addr, length,
+                                                     no_write, result,
+                                                     result_length, err_count);
+    }
+
+    return -ENODEV;
+}
+
+int pci_iommu_register_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
+                                      IOMMUNotifier *n)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    if ((pasid != PCI_NO_PASID) && !pcie_pasid_enabled(dev)) {
+        return -EPERM;
+    }
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->register_iotlb_notifier) {
+        iommu_bus->iommu_ops->register_iotlb_notifier(bus,
+                                           iommu_bus->iommu_opaque, devfn,
+                                           pasid, n);
+        return 0;
+    }
+
+    return -ENODEV;
+}
+
+int pci_iommu_unregister_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
+                                        IOMMUNotifier *n)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    if ((pasid != PCI_NO_PASID) && !pcie_pasid_enabled(dev)) {
+        return -EPERM;
+    }
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->unregister_iotlb_notifier) {
+        iommu_bus->iommu_ops->unregister_iotlb_notifier(bus,
+                                                        iommu_bus->iommu_opaque,
+                                                        devfn, pasid, n);
+        return 0;
+    }
+
+    return -ENODEV;
+}
+
 int pci_iommu_get_iotlb_info(PCIDevice *dev, uint8_t *addr_width,
                              uint32_t *min_page_size)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f3016fd76f..5d72607ed5 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -462,6 +462,80 @@ typedef struct PCIIOMMUOps {
     void (*init_iotlb_notifier)(PCIBus *bus, void *opaque, int devfn,
                                 IOMMUNotifier *n, IOMMUNotify fn,
                                 void *user_opaque);
+    /**
+     * @register_iotlb_notifier: setup an IOTLB invalidation notifier.
+     *
+     * Callback required if devices are allowed to cache translations.
+     *
+     * @bus: the #PCIBus of the PCI device.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number of the PCI device.
+     *
+     * @pasid: the pasid of the address space to watch.
+     *
+     * @n: the notifier to register.
+     */
+    void (*register_iotlb_notifier)(PCIBus *bus, void *opaque, int devfn,
+                                    uint32_t pasid, IOMMUNotifier *n);
+    /**
+     * @unregister_iotlb_notifier: remove an IOTLB invalidation notifier.
+     *
+     * Callback required if devices are allowed to cache translations.
+     *
+     * @bus: the #PCIBus of the PCI device.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number of the PCI device.
+     *
+     * @pasid: the pasid of the address space to stop watching.
+     *
+     * @n: the notifier to unregister.
+     */
+    void (*unregister_iotlb_notifier)(PCIBus *bus, void *opaque, int devfn,
+                                      uint32_t pasid, IOMMUNotifier *n);
+    /**
+     * @ats_request_translation: issue an ATS request.
+     *
+     * Callback required if devices are allowed to use the address
+     * translation service.
+     *
+     * @bus: the #PCIBus of the PCI device.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number of the PCI device.
+     *
+     * @pasid: the pasid of the address space to use for the request.
+     *
+     * @priv_req: privileged mode bit (PASID TLP).
+     *
+     * @exec_req: execute request bit (PASID TLP).
+     *
+     * @addr: start address of the memory range to be translated.
+     *
+     * @length: length of the memory range in bytes.
+     *
+     * @no_write: request a read-only translation (if supported).
+     *
+     * @result: buffer in which the TLB entries will be stored.
+     *
+     * @result_length: result buffer length.
+     *
+     * @err_count: number of untranslated subregions.
+     *
+     * Returns: the number of translations stored in the result buffer, or
+     * -ENOMEM if the buffer is not large enough.
+     */
+    ssize_t (*ats_request_translation)(PCIBus *bus, void *opaque, int devfn,
+                                       uint32_t pasid, bool priv_req,
+                                       bool exec_req, hwaddr addr,
+                                       size_t length, bool no_write,
+                                       IOMMUTLBEntry *result,
+                                       size_t result_length,
+                                       uint32_t *err_count);
 } PCIIOMMUOps;
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
@@ -495,6 +569,58 @@ int pci_iommu_get_iotlb_info(PCIDevice *dev, uint8_t *addr_width,
 int pci_iommu_init_iotlb_notifier(PCIDevice *dev, IOMMUNotifier *n,
                                   IOMMUNotify fn, void *opaque);
 
+/**
+ * pci_ats_request_translation: perform an ATS request.
+ *
+ * Returns the number of translations stored in @result in case of success,
+ * a negative error code otherwise.
+ * -ENOSPC is returned when @result_length is 0, -ENOMEM when the result
+ * buffer is not large enough to store all the translations.
+ *
+ * @dev: the ATS-capable PCI device.
+ * @pasid: the pasid of the address space in which the translation will be done.
+ * @priv_req: privileged mode bit (PASID TLP).
+ * @exec_req: execute request bit (PASID TLP).
+ * @addr: start address of the memory range to be translated.
+ * @length: length of the memory range in bytes.
+ * @no_write: request a read-only translation (if supported).
+ * @result: buffer in which the TLB entries will be stored.
+ * @result_length: result buffer length.
+ * @err_count: number of untranslated subregions.
+ */
+ssize_t pci_ats_request_translation(PCIDevice *dev, uint32_t pasid,
+                                    bool priv_req, bool exec_req,
+                                    hwaddr addr, size_t length,
+                                    bool no_write, IOMMUTLBEntry *result,
+                                    size_t result_length,
+                                    uint32_t *err_count);
+
+/**
+ * pci_iommu_register_iotlb_notifier: register a notifier for changes to
+ * IOMMU translation entries in a specific address space.
+ *
+ * Returns 0 on success, or a negative errno otherwise.
+ *
+ * @dev: the device that wants to get notified.
+ * @pasid: the pasid of the address space to track.
+ * @n: the notifier to register.
+ */
+int pci_iommu_register_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
+                                      IOMMUNotifier *n);
+
+/**
+ * pci_iommu_unregister_iotlb_notifier: unregister a notifier that has been
+ * registered with pci_iommu_register_iotlb_notifier.
+ *
+ * Returns 0 on success, or a negative errno otherwise.
+ *
+ * @dev: the device that wants to stop notifications.
+ * @pasid: the pasid of the address space to stop tracking.
+ * @n: the notifier to unregister.
+ */
+int pci_iommu_unregister_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
+                                        IOMMUNotifier *n);
+
 /**
  * pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
  *
-- 
2.49.0



* [PATCH 11/11] pci: Add a PCI-level API for PRI
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (9 preceding siblings ...)
  2025-05-20  7:19 ` [PATCH 10/11] pci: Add a pci-level API for ATS CLEMENT MATHIEU--DRIF
@ 2025-05-20  7:19 ` CLEMENT MATHIEU--DRIF
  2025-06-05  5:08 ` [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-05-20  7:19 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	CLEMENT MATHIEU--DRIF, Ethan MILON

A device can send a PRI request to the IOMMU using pci_pri_request_page.
The PRI response is sent back using the notifier managed with
pci_pri_register_notifier and pci_pri_unregister_notifier.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
Co-authored-by: Ethan Milon <ethan.milon@eviden.com>
---
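Note: the request/response pairing can be sketched outside of QEMU as
follows. The IOMMUPRI* types and PCI_PRI_PRGI_MASK are copied from this
patch so that the example compiles on its own; DemoDevice and all demo_*
functions are hypothetical. demo_iommu_respond() stands in for whatever
the IOMMU model does once the guest has handled the page request group.
In a real device the notifier would presumably be passed to
pci_pri_register_notifier() and the request sent with
pci_pri_request_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the patch so that the sketch builds without QEMU headers. */
typedef enum {
    IOMMU_PRI_RESP_SUCCESS,
    IOMMU_PRI_RESP_INVALID_REQUEST,
    IOMMU_PRI_RESP_FAILURE,
} IOMMUPRIResponseCode;

typedef struct IOMMUPRIResponse {
    IOMMUPRIResponseCode response_code;
    uint16_t prgi;
} IOMMUPRIResponse;

struct IOMMUPRINotifier;

typedef void (*IOMMUPRINotify)(struct IOMMUPRINotifier *notifier,
                               IOMMUPRIResponse *response);

typedef struct IOMMUPRINotifier {
    IOMMUPRINotify notify;
} IOMMUPRINotifier;

#define PCI_PRI_PRGI_MASK 0x1ffU

/* Hypothetical device state embedding the notifier. */
typedef struct DemoDevice {
    IOMMUPRINotifier pri_notifier;
    uint16_t next_prgi;      /* next page request group index to use */
    uint16_t last_done_prgi; /* group index of the last response */
    bool last_ok;            /* whether the last response was a success */
    unsigned completions;    /* number of responses received */
} DemoDevice;

/* PRI completion callback: recover the device from the embedded notifier. */
static void demo_pri_notify(IOMMUPRINotifier *n, IOMMUPRIResponse *resp)
{
    DemoDevice *d = (DemoDevice *)((char *)n -
                                   offsetof(DemoDevice, pri_notifier));

    d->last_done_prgi = resp->prgi;
    d->last_ok = (resp->response_code == IOMMU_PRI_RESP_SUCCESS);
    d->completions++;
}

static void demo_device_init(DemoDevice *d)
{
    d->pri_notifier.notify = demo_pri_notify;
    d->next_prgi = 0;
    d->last_done_prgi = 0;
    d->last_ok = false;
    d->completions = 0;
}

/* Allocate a group index; the value wraps within PCI_PRI_PRGI_MASK. */
static uint16_t demo_alloc_prgi(DemoDevice *d)
{
    uint16_t prgi = d->next_prgi;

    d->next_prgi = (uint16_t)((prgi + 1U) & PCI_PRI_PRGI_MASK);
    return prgi;
}

/* Stand-in for the IOMMU side answering a page request group. */
static void demo_iommu_respond(IOMMUPRINotifier *n, uint16_t prgi,
                               IOMMUPRIResponseCode code)
{
    IOMMUPRIResponse resp = {
        .response_code = code,
        .prgi = (uint16_t)(prgi & PCI_PRI_PRGI_MASK),
    };

    n->notify(n, &resp);
}
```

Embedding the notifier in the device state is just one way to get back to
the device from the callback, since IOMMUPRINotifier carries no opaque
pointer in this patch.
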
 hw/pci/pci.c         |  66 ++++++++++++++++++++++
 include/hw/pci/pci.h | 130 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 196 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 0c63cb4bbe..c6b5768f3a 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2987,6 +2987,72 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
     }
 }
 
+int pci_pri_request_page(PCIDevice *dev, uint32_t pasid, bool priv_req,
+                         bool exec_req, hwaddr addr, bool lpig,
+                         uint16_t prgi, bool is_read, bool is_write)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    if (!dev->is_master ||
+            ((pasid != PCI_NO_PASID) && !pcie_pasid_enabled(dev))) {
+        return -EPERM;
+    }
+
+    if (!pcie_pri_enabled(dev)) {
+        return -EPERM;
+    }
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->pri_request_page) {
+        return iommu_bus->iommu_ops->pri_request_page(bus,
+                                                     iommu_bus->iommu_opaque,
+                                                     devfn, pasid, priv_req,
+                                                     exec_req, addr, lpig, prgi,
+                                                     is_read, is_write);
+    }
+
+    return -ENODEV;
+}
+
+int pci_pri_register_notifier(PCIDevice *dev, uint32_t pasid,
+                              IOMMUPRINotifier *notifier)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    if (!dev->is_master ||
+            ((pasid != PCI_NO_PASID) && !pcie_pasid_enabled(dev))) {
+        return -EPERM;
+    }
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->pri_register_notifier) {
+        iommu_bus->iommu_ops->pri_register_notifier(bus,
+                                                    iommu_bus->iommu_opaque,
+                                                    devfn, pasid, notifier);
+        return 0;
+    }
+
+    return -ENODEV;
+}
+
+void pci_pri_unregister_notifier(PCIDevice *dev, uint32_t pasid)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->pri_unregister_notifier) {
+        iommu_bus->iommu_ops->pri_unregister_notifier(bus,
+                                                      iommu_bus->iommu_opaque,
+                                                      devfn, pasid);
+    }
+}
+
 ssize_t pci_ats_request_translation(PCIDevice *dev, uint32_t pasid,
                                     bool priv_req, bool exec_req,
                                     hwaddr addr, size_t length,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 5d72607ed5..a6854dad2b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -375,6 +375,28 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
+/* Page Request Interface */
+typedef enum {
+    IOMMU_PRI_RESP_SUCCESS,
+    IOMMU_PRI_RESP_INVALID_REQUEST,
+    IOMMU_PRI_RESP_FAILURE,
+} IOMMUPRIResponseCode;
+
+typedef struct IOMMUPRIResponse {
+    IOMMUPRIResponseCode response_code;
+    uint16_t prgi;
+} IOMMUPRIResponse;
+
+struct IOMMUPRINotifier;
+
+typedef void (*IOMMUPRINotify)(struct IOMMUPRINotifier *notifier,
+                               IOMMUPRIResponse *response);
+
+typedef struct IOMMUPRINotifier {
+    IOMMUPRINotify notify;
+} IOMMUPRINotifier;
+
+#define PCI_PRI_PRGI_MASK 0x1ffU
 
 /**
  * struct PCIIOMMUOps: callbacks structure for specific IOMMU handlers
@@ -536,6 +558,72 @@ typedef struct PCIIOMMUOps {
                                        IOMMUTLBEntry *result,
                                        size_t result_length,
                                        uint32_t *err_count);
+    /**
+     * @pri_register_notifier: setup the PRI completion callback.
+     *
+     * Callback required if devices are allowed to use the page request
+     * interface.
+     *
+     * @bus: the #PCIBus of the PCI device.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number of the PCI device.
+     *
+     * @pasid: the pasid of the address space to track.
+     *
+     * @notifier: the notifier to register.
+     */
+    void (*pri_register_notifier)(PCIBus *bus, void *opaque, int devfn,
+                                  uint32_t pasid, IOMMUPRINotifier *notifier);
+    /**
+     * @pri_unregister_notifier: remove the PRI completion callback.
+     *
+     * Callback required if devices are allowed to use the page request
+     * interface.
+     *
+     * @bus: the #PCIBus of the PCI device.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number of the PCI device.
+     *
+     * @pasid: the pasid of the address space to stop tracking.
+     */
+    void (*pri_unregister_notifier)(PCIBus *bus, void *opaque, int devfn,
+                                    uint32_t pasid);
+    /**
+     * @pri_request_page: issue a PRI request.
+     *
+     * Callback required if devices are allowed to use the page request
+     * interface.
+     *
+     * @bus: the #PCIBus of the PCI device.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number of the PCI device.
+     *
+     * @pasid: the pasid of the address space to use for the request.
+     *
+     * @priv_req: privileged mode bit (PASID TLP).
+     *
+     * @exec_req: execute request bit (PASID TLP).
+     *
+     * @addr: untranslated address of the requested page.
+     *
+     * @lpig: last page in group.
+     *
+     * @prgi: page request group index.
+     *
+     * @is_read: request read access.
+     *
+     * @is_write: request write access.
+     */
+    int (*pri_request_page)(PCIBus *bus, void *opaque, int devfn,
+                            uint32_t pasid, bool priv_req, bool exec_req,
+                            hwaddr addr, bool lpig, uint16_t prgi, bool is_read,
+                            bool is_write);
 } PCIIOMMUOps;
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
@@ -595,6 +683,48 @@ ssize_t pci_ats_request_translation(PCIDevice *dev, uint32_t pasid,
                                     size_t result_length,
                                     uint32_t *err_count);
 
+/**
+ * pci_pri_request_page: perform a PRI request.
+ *
+ * Returns 0 if the PRI request has been sent to the guest OS,
+ * an error code otherwise.
+ *
+ * @dev: the PRI-capable PCI device.
+ * @pasid: the pasid of the address space to use for the request.
+ * @priv_req: privileged mode bit (PASID TLP).
+ * @exec_req: execute request bit (PASID TLP).
+ * @addr: untranslated address of the requested page.
+ * @lpig: last page in group.
+ * @prgi: page request group index.
+ * @is_read: request read access.
+ * @is_write: request write access.
+ */
+int pci_pri_request_page(PCIDevice *dev, uint32_t pasid, bool priv_req,
+                         bool exec_req, hwaddr addr, bool lpig,
+                         uint16_t prgi, bool is_read, bool is_write);
+
+/**
+ * pci_pri_register_notifier: register the PRI callback for a given address
+ * space.
+ *
+ * Returns 0 on success, an error code otherwise.
+ *
+ * @dev: the PRI-capable PCI device.
+ * @pasid: the pasid of the address space to track.
+ * @notifier: the notifier to register.
+ */
+int pci_pri_register_notifier(PCIDevice *dev, uint32_t pasid,
+                              IOMMUPRINotifier *notifier);
+
+/**
+ * pci_pri_unregister_notifier: remove the PRI callback from a given address
+ * space.
+ *
+ * @dev: the PRI-capable PCI device.
+ * @pasid: the pasid of the address space to stop tracking.
+ */
+void pci_pri_unregister_notifier(PCIDevice *dev, uint32_t pasid);
+
 /**
  * pci_iommu_register_iotlb_notifier: register a notifier for changes to
  * IOMMU translation entries in a specific address space.
-- 
2.49.0



* Re: [PATCH 00/11] SVM API declaration for emulated devices
  2025-05-20  7:18 [PATCH 00/11] SVM API declaration for emulated devices CLEMENT MATHIEU--DRIF
                   ` (10 preceding siblings ...)
  2025-05-20  7:19 ` [PATCH 11/11] pci: Add a PCI-level API for PRI CLEMENT MATHIEU--DRIF
@ 2025-06-05  5:08 ` CLEMENT MATHIEU--DRIF
  11 siblings, 0 replies; 13+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-06-05  5:08 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, mst@redhat.com, marcel.apfelbaum@gmail.com,
	jason.chien@sifive.com, Tomasz Jeznach, dbarboza@ventanamicro.com

Just cc'ing the RISC-V team.

Thanks
 >cmd

On 20/05/2025 9:18 am, CLEMENT MATHIEU--DRIF wrote:
> This patch set belongs to a list of series that add SVM support in VT-d.
>
> Here we focus on introducing a common PCI-level API for ATS and PRI to be
> used by virtual devices.
>
> The API introduced in this series is mainly based on the PCIe Gen 5 spec.
>
> What is ATS?
> ''''''''''''
>
> ATS (Address Translation Service) is a PCIe-level protocol that
> enables PCIe devices to query an IOMMU for virtual to physical
> address translations in a specific address space (such as a userspace
> process address space). When a device receives translation responses
> from an IOMMU, it may decide to store them in an internal cache,
> often known as "ATC" (Address Translation Cache) or "Device IOTLB".
> To keep page tables and caches consistent, the IOMMU is allowed to
> send asynchronous invalidation requests to its client devices.
>
> What is PRI?
> ''''''''''''
>
> PRI (Page Request Interface) is a PCIe-level protocol that
> enables PCIe devices to request page fault resolutions to
> the kernel through an IOMMU. PRI combined with ATS are the
> 2 cornerstones of a technology called SVM (Shared Virtual
> Memory) or SVA (Shared Virtual Addressing) which allows
> PCIe devices to read from and write to the memory of
> userspace applications without requiring page pinning.
>
> Here is a link to our GitHub repository that contains:
>      - Qemu with all the patches for SVM in VT-d
>          - ATS
>          - PRI
>          - Device IOTLB invalidations
>          - Requests with already pre-translated addresses
>      - A demo device
>      - A simple driver for the demo device
>      - A userspace program (for testing and demonstration purposes)
>
> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>
> Clement Mathieu--Drif (11):
>    pcie: Add helper to declare PASID capability for a pcie device
>    pcie: Helper functions to check if PASID is enabled
>    pcie: Helper function to check if ATS is enabled
>    pcie: Add a helper to declare the PRI capability for a pcie device
>    pcie: Helper functions to check to check if PRI is enabled
>    pci: Cache the bus mastering status in the device
>    pci: Add an API to get IOMMU's min page size and virtual address width
>    memory: Store user data pointer in the IOMMU notifiers
>    pci: Add a pci-level initialization function for IOMMU notifiers
>    pci: Add a pci-level API for ATS
>    pci: Add a PCI-level API for PRI
>
>   hw/pci/pci.c                | 204 +++++++++++++++++++++--
>   hw/pci/pcie.c               |  78 +++++++++
>   include/hw/pci/pci.h        | 315 ++++++++++++++++++++++++++++++++++++
>   include/hw/pci/pci_device.h |   1 +
>   include/hw/pci/pcie.h       |  13 +-
>   include/hw/pci/pcie_regs.h  |   8 +
>   include/system/memory.h     |   1 +
>   7 files changed, 609 insertions(+), 11 deletions(-)
>


