qemu-devel.nongnu.org archive mirror
* [PATCH ats_vtd v5 00/22] ATS support for VT-d
@ 2024-06-03  5:59 CLEMENT MATHIEU--DRIF
  2024-07-01 20:02 ` Michael S. Tsirkin
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-06-03  5:59 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

This series belongs to a list of series that add SVM support for VT-d.

As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.

Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
API for ATS to be used by virtual devices.

This work is based on the VT-d specification version 4.1 (March 2023).
Here is a link to a GitHub repository where you can find the following elements:
    - QEMU with all the patches for SVM
        - ATS
        - PRI
        - Device IOTLB invalidations
        - Requests with already translated addresses
    - A demo device
    - A simple driver for the demo device
    - A userspace program (for testing and demonstration purposes)

https://github.com/BullSequana/Qemu-in-guest-SVM-demo

v2
    - handle huge pages better by detecting the page table level at which the translation errors occur
    - Changes after review by Zhenzhong Duan:
        - Set the access bit after checking permissions
        - helper for PASID and ATS: make the commit message more accurate ('present' replaced with 'enabled')
        - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the capability register
        - pci: do not check pci_bus_bypass_iommu after calling pci_device_get_iommu_bus_devfn
        - do not alter formatting of IOMMUTLBEntry declaration
        - vtd_iova_fl_check_canonical: directly use s->aw_bits instead of aw for the sake of clarity

v3
    - rebase on new version of Zhenzhong's flts implementation
    - fix the ATC lookup operation (check the mask before returning an entry)
    - add a unit test for the ATC
    - store a user pointer in the IOMMU notifiers to simplify the implementation of SVM devices
    - Changes after review by Zhenzhong:
        - store the input pasid instead of rid2pasid when returning an entry after a translation
        - split the ATC implementation and its unit tests

v4
    - Changes after internal review:
        - Fix the nowrite optimization: an ATS translation without the nowrite flag should not fail when the write permission is not set

v5
    - Changes after review by Philippe:
        - change the type of 'level' to unsigned in vtd_lookup_iotlb



Clément Mathieu--Drif (22):
  intel_iommu: fix FRCD construction macro.
  intel_iommu: make types match
  intel_iommu: return page walk level even when the translation fails
  intel_iommu: do not consider wait_desc as an invalid descriptor
  memory: add permissions in IOMMUAccessFlags
  pcie: add helper to declare PASID capability for a pcie device
  pcie: helper functions to check if PASID and ATS are enabled
  intel_iommu: declare supported PASID size
  pci: cache the bus mastering status in the device
  pci: add IOMMU operations to get address spaces and memory regions
    with PASID
  memory: store user data pointer in the IOMMU notifiers
  pci: add a pci-level initialization function for iommu notifiers
  intel_iommu: implement the get_address_space_pasid iommu operation
  intel_iommu: implement the get_memory_region_pasid iommu operation
  memory: Allow to store the PASID in IOMMUTLBEntry
  intel_iommu: fill the PASID field when creating an instance of
    IOMMUTLBEntry
  atc: generic ATC that can be used by PCIe devices that support SVM
  atc: add unit tests
  memory: add an API for ATS support
  pci: add a pci-level API for ATS
  intel_iommu: set the address mask even when a translation fails
  intel_iommu: add support for ATS

 hw/i386/intel_iommu.c                     | 142 +++++-
 hw/i386/intel_iommu_internal.h            |   6 +-
 hw/pci/pci.c                              | 127 +++++-
 hw/pci/pcie.c                             |  42 ++
 include/exec/memory.h                     |  51 ++-
 include/hw/i386/intel_iommu.h             |   2 +-
 include/hw/pci/pci.h                      | 101 +++++
 include/hw/pci/pci_device.h               |   1 +
 include/hw/pci/pcie.h                     |   9 +-
 include/hw/pci/pcie_regs.h                |   3 +
 include/standard-headers/linux/pci_regs.h |   1 +
 system/memory.c                           |  20 +
 tests/unit/meson.build                    |   1 +
 tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
 util/atc.c                                | 211 +++++++++
 util/atc.h                                | 117 +++++
 util/meson.build                          |   1 +
 17 files changed, 1330 insertions(+), 32 deletions(-)
 create mode 100644 tests/unit/test-atc.c
 create mode 100644 util/atc.c
 create mode 100644 util/atc.h

-- 
2.45.1

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-06-03  5:59 CLEMENT MATHIEU--DRIF
@ 2024-07-01 20:02 ` Michael S. Tsirkin
  2024-07-02  5:57   ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Michael S. Tsirkin @ 2024-07-01 20:02 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com

On Mon, Jun 03, 2024 at 05:59:38AM +0000, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> This series belongs to a list of series that add SVM support for VT-d.
> 
> As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
> 
> Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
> API for ATS to be used by virtual devices.
> 
> This work is based on the VT-d specification version 4.1 (March 2023).
> Here is a link to a GitHub repository where you can find the following elements :
>     - Qemu with all the patches for SVM
>         - ATS
>         - PRI
>         - Device IOTLB invalidations
>         - Requests with already translated addresses
>     - A demo device
>     - A simple driver for the demo device
>     - A userspace program (for testing and demonstration purposes)
> 
> https://github.com/BullSequana/Qemu-in-guest-SVM-demo

I will merge, but could you please resend this using git format-patch
for formatting?  The patches have trailing CRs and don't show which sha1
they are for, which makes re-applying them after each change painful.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 00/22] ATS support for VT-d
@ 2024-07-02  5:52 CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro CLEMENT MATHIEU--DRIF
                   ` (24 more replies)
  0 siblings, 25 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	Clement Mathieu--Drif

From: Clement Mathieu--Drif <cmdetu@gmail.com>

This series belongs to a list of series that add SVM support for VT-d.

As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.

Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
API for ATS to be used by virtual devices.

This work is based on the VT-d specification version 4.1 (March 2023).
Here is a link to a GitHub repository where you can find the following elements:
    - QEMU with all the patches for SVM
        - ATS
        - PRI
        - Device IOTLB invalidations
        - Requests with already translated addresses
    - A demo device
    - A simple driver for the demo device
    - A userspace program (for testing and demonstration purposes)

https://github.com/BullSequana/Qemu-in-guest-SVM-demo

v2
    - handle huge pages better by detecting the page table level at which the translation errors occur
    - Changes after review by Zhenzhong Duan:
        - Set the access bit after checking permissions
        - helper for PASID and ATS: make the commit message more accurate ('present' replaced with 'enabled')
        - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the capability register
        - pci: do not check pci_bus_bypass_iommu after calling pci_device_get_iommu_bus_devfn
        - do not alter formatting of IOMMUTLBEntry declaration
        - vtd_iova_fl_check_canonical: directly use s->aw_bits instead of aw for the sake of clarity

v3
    - rebase on new version of Zhenzhong's flts implementation
    - fix the ATC lookup operation (check the mask before returning an entry)
    - add a unit test for the ATC
    - store a user pointer in the IOMMU notifiers to simplify the implementation of SVM devices
    - Changes after review by Zhenzhong:
        - store the input pasid instead of rid2pasid when returning an entry after a translation
        - split the ATC implementation and its unit tests

v4
    - Changes after internal review:
        - Fix the nowrite optimization: an ATS translation without the nowrite flag should not fail when the write permission is not set

v5
    - Changes after review by Philippe:
        - change the type of 'level' to unsigned in vtd_lookup_iotlb

Clément Mathieu--Drif (22):
  intel_iommu: fix FRCD construction macro.
  intel_iommu: make types match
  intel_iommu: return page walk level even when the translation fails
  intel_iommu: do not consider wait_desc as an invalid descriptor
  memory: add permissions in IOMMUAccessFlags
  pcie: add helper to declare PASID capability for a pcie device
  pcie: helper functions to check if PASID and ATS are enabled
  intel_iommu: declare supported PASID size
  pci: cache the bus mastering status in the device
  pci: add IOMMU operations to get address spaces and memory regions
    with PASID
  memory: store user data pointer in the IOMMU notifiers
  pci: add a pci-level initialization function for iommu notifiers
  intel_iommu: implement the get_address_space_pasid iommu operation
  intel_iommu: implement the get_memory_region_pasid iommu operation
  memory: Allow to store the PASID in IOMMUTLBEntry
  intel_iommu: fill the PASID field when creating an instance of
    IOMMUTLBEntry
  atc: generic ATC that can be used by PCIe devices that support SVM
  atc: add unit tests
  memory: add an API for ATS support
  pci: add a pci-level API for ATS
  intel_iommu: set the address mask even when a translation fails
  intel_iommu: add support for ATS

 hw/i386/intel_iommu.c                     | 146 +++++-
 hw/i386/intel_iommu_internal.h            |   6 +-
 hw/pci/pci.c                              | 127 +++++-
 hw/pci/pcie.c                             |  42 ++
 include/exec/memory.h                     |  51 ++-
 include/hw/i386/intel_iommu.h             |   2 +-
 include/hw/pci/pci.h                      | 101 +++++
 include/hw/pci/pci_device.h               |   1 +
 include/hw/pci/pcie.h                     |   9 +-
 include/hw/pci/pcie_regs.h                |   3 +
 include/standard-headers/linux/pci_regs.h |   1 +
 system/memory.c                           |  20 +
 tests/unit/meson.build                    |   1 +
 tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
 util/atc.c                                | 211 +++++++++
 util/atc.h                                | 117 +++++
 util/meson.build                          |   1 +
 17 files changed, 1332 insertions(+), 34 deletions(-)
 create mode 100644 tests/unit/test-atc.c
 create mode 100644 util/atc.c
 create mode 100644 util/atc.h

-- 
2.45.2

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro.
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02 13:01   ` Yi Liu
  2024-07-02  5:52 ` [PATCH ats_vtd v5 02/22] intel_iommu: make types match CLEMENT MATHIEU--DRIF
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

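The constant must be an unsigned 64-bit value. Otherwise, when val is
an int, the shift is evaluated as a signed int and, once the result is
widened to 64 bits, sign extension overwrites the other fields when a
PASID is present.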

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
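As an illustration (a sketch, not code from this series), the effect of
the signed constant can be reproduced in isolation. The two VTD_FRCD_*
macros are copied from the patched header; the helpers frcd_old_style
and frcd_new_style are hypothetical names. The sketch assumes a target
with 32-bit int, where the old expression typically evaluates to
INT32_MIN, and uses that value directly so that the example itself
stays well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patched hw/i386/intel_iommu_internal.h. */
#define VTD_FRCD_PV(val)        (((val) & 0xffffULL) << 40)
#define VTD_FRCD_PP(val)        (((val) & 0x1ULL) << 31)

/*
 * Hypothetical helper modelling the old macro: the PP bit ends up in a
 * signed 32-bit value with only bit 31 set (assumption: 32-bit int).
 */
static uint64_t frcd_old_style(uint64_t pasid_value)
{
    int32_t pp = INT32_MIN;

    /* Converting a negative int32_t to uint64_t sets bits 32..63. */
    return VTD_FRCD_PV(pasid_value) | (uint64_t)pp;
}

/* Hypothetical helper using the fixed macro: only bit 31 is added. */
static uint64_t frcd_new_style(uint64_t pasid_value)
{
    return VTD_FRCD_PV(pasid_value) | VTD_FRCD_PP(1);
}
```

With a PASID value of 0x1234, the old style yields 0xffffffff80000000
(the PV field is lost) while the new style yields 0x0012340080000000.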
 hw/i386/intel_iommu_internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e8396575eb..b19f14ef63 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -272,7 +272,7 @@
 /* For the low 64-bit of 128-bit */
 #define VTD_FRCD_FI(val)        ((val) & ~0xfffULL)
 #define VTD_FRCD_PV(val)        (((val) & 0xffffULL) << 40)
-#define VTD_FRCD_PP(val)        (((val) & 0x1) << 31)
+#define VTD_FRCD_PP(val)        (((val) & 0x1ULL) << 31)
 #define VTD_FRCD_IR_IDX(val)    (((val) & 0xffffULL) << 48)
 
 /* DMA Remapping Fault Conditions */
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 02/22] intel_iommu: make types match
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02 13:20   ` Yi Liu
  2024-07-02  5:52 ` [PATCH ats_vtd v5 03/22] intel_iommu: return page walk level even when the translation fails CLEMENT MATHIEU--DRIF
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

The 'level' field in vtd_iotlb_key is an unsigned integer, so there is
no need to store level as an int in vtd_lookup_iotlb.

VTDIOTLBPageInvInfo.mask is used in bitwise operations with 64-bit
addresses, so make it a uint64_t.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
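As an illustration (a sketch, not code from this series), here is why a
narrow mask is a problem when it is combined with 64-bit values. The
helpers mask_gfn_u8 and mask_gfn_u64 are hypothetical, and the way the
mask is built from an address-mask order 'am' is an assumption made for
the example, not a copy of the expression used in intel_iommu.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: clear the low 'am' bits of gfn using an 8-bit mask. */
static uint64_t mask_gfn_u8(uint64_t gfn, unsigned am)
{
    uint8_t mask;

    assert(am < 64);
    /* The upper 56 bits of the mask are silently dropped here. */
    mask = (uint8_t)~((1ULL << am) - 1);
    return gfn & mask;
}

/* Hypothetical: same operation with a mask as wide as the address. */
static uint64_t mask_gfn_u64(uint64_t gfn, unsigned am)
{
    uint64_t mask;

    assert(am < 64);
    mask = ~((1ULL << am) - 1);
    return gfn & mask;
}
```

For gfn = 0x12345 and am = 4, the 8-bit variant returns 0x40 whereas
the 64-bit variant returns the expected 0x12340.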
 hw/i386/intel_iommu.c          | 2 +-
 hw/i386/intel_iommu_internal.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c3c0ecca71..c6474ae735 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -417,7 +417,7 @@ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
 {
     struct vtd_iotlb_key key;
     VTDIOTLBEntry *entry;
-    int level;
+    unsigned level;
 
     for (level = VTD_PT_LEVEL; level < VTD_PML4_LEVEL; level++) {
         key.gfn = vtd_get_iotlb_gfn(addr, level);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b19f14ef63..bd20746318 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -506,7 +506,7 @@ struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
     uint32_t pasid;
     uint64_t addr;
-    uint8_t mask;
+    uint64_t mask;
 };
 typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
 
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 03/22] intel_iommu: return page walk level even when the translation fails
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 02/22] intel_iommu: make types match CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-03 11:59   ` Yi Liu
  2024-07-02  5:52 ` [PATCH ats_vtd v5 05/22] memory: add permissions in IOMMUAccessFlags CLEMENT MATHIEU--DRIF
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

We use this information in vtd_do_iommu_translate to populate the
IOMMUTLBEntry and indicate the correct page mask. This prevents ATS
devices from sending many useless translation requests when a megapage
or gigapage iova is not mapped to a physical address.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
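As an illustration (a sketch, not code from this series), the benefit of
reporting the walk level can be quantified. The sketch assumes 4 KiB
base pages and 9 address bits per page table level; the names
addr_mask_for_level and requests_to_cover are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K   12  /* assumption: 4 KiB base page */
#define LEVEL_BITS      9   /* assumption: 512 entries per table */

/* Hypothetical: mask of the low address bits covered at 'level'. */
static uint64_t addr_mask_for_level(unsigned level)
{
    assert(level >= 1 && level <= 3);
    return (1ULL << (PAGE_SHIFT_4K + (level - 1) * LEVEL_BITS)) - 1;
}

/*
 * Hypothetical: number of translation requests a device needs to learn
 * that a whole region is unmapped, given the mask reported per response.
 */
static uint64_t requests_to_cover(uint64_t region_mask,
                                  uint64_t reported_mask)
{
    return (region_mask + 1) / (reported_mask + 1);
}
```

Under these assumptions, an unmapped 1 GiB range reported with a 4 KiB
mask costs 262144 requests, while reporting the level 3 mask lets the
device stop after a single request.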
 hw/i386/intel_iommu.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c6474ae735..98996ededc 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2096,9 +2096,9 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
                              uint32_t pasid)
 {
     dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
-    uint32_t level = vtd_get_iova_level(s, ce, pasid);
     uint32_t offset;
     uint64_t flpte;
+    *flpte_level = vtd_get_iova_level(s, ce, pasid);
 
     if (!vtd_iova_fl_check_canonical(s, iova, ce, pasid)) {
         error_report_once("%s: detected non canonical IOVA (iova=0x%" PRIx64 ","
@@ -2107,11 +2107,11 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
     }
 
     while (true) {
-        offset = vtd_iova_level_offset(iova, level);
+        offset = vtd_iova_level_offset(iova, *flpte_level);
         flpte = vtd_get_pte(addr, offset);
 
         if (flpte == (uint64_t)-1) {
-            if (level == vtd_get_iova_level(s, ce, pasid)) {
+            if (*flpte_level == vtd_get_iova_level(s, ce, pasid)) {
                 /* Invalid programming of context-entry */
                 return -VTD_FR_CONTEXT_ENTRY_INV;
             } else {
@@ -2128,11 +2128,11 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
         if (is_write && !(flpte & VTD_FL_RW_MASK)) {
             return -VTD_FR_WRITE;
         }
-        if (vtd_flpte_nonzero_rsvd(flpte, level)) {
+        if (vtd_flpte_nonzero_rsvd(flpte, *flpte_level)) {
             error_report_once("%s: detected flpte reserved non-zero "
                               "iova=0x%" PRIx64 ", level=0x%" PRIx32
                               "flpte=0x%" PRIx64 ", pasid=0x%" PRIX32 ")",
-                              __func__, iova, level, flpte, pasid);
+                              __func__, iova, *flpte_level, flpte, pasid);
             return -VTD_FR_PAGING_ENTRY_RSVD;
         }
 
@@ -2140,19 +2140,18 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
             return -VTD_FR_FS_BIT_UPDATE_FAILED;
         }
 
-        if (vtd_is_last_pte(flpte, level)) {
+        if (vtd_is_last_pte(flpte, *flpte_level)) {
             if (is_write &&
                 (vtd_set_flag_in_pte(addr, offset, flpte, VTD_FL_D) !=
                                                                     MEMTX_OK)) {
                     return -VTD_FR_FS_BIT_UPDATE_FAILED;
             }
             *flptep = flpte;
-            *flpte_level = level;
             return 0;
         }
 
         addr = vtd_get_pte_addr(flpte, aw_bits);
-        level--;
+        (*flpte_level)--;
     }
 }
 
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (3 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 05/22] memory: add permissions in IOMMUAccessFlags CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02 13:33   ` Yi Liu
  2024-07-02  5:52 ` [PATCH ats_vtd v5 06/22] pcie: add helper to declare PASID capability for a pcie device CLEMENT MATHIEU--DRIF
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
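As an illustration (a sketch, not code from this series), the decision
chain for a wait descriptor after this change can be modelled as below.
The bit positions of the IF, SW and FN flags, the order of the first
two branches and all names are assumptions made for the example; only
the new "FN set, SW and IF clear" case is taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag positions in the low quadword, for illustration only. */
#define WAIT_IF (1ULL << 4)   /* interrupt flag */
#define WAIT_SW (1ULL << 5)   /* status write */
#define WAIT_FN (1ULL << 6)   /* fence */

enum wait_action {
    WAIT_ACT_STATUS_WRITE,
    WAIT_ACT_INTERRUPT,
    WAIT_ACT_FENCE_ONLY,
    WAIT_ACT_INVALID,
};

/* Hypothetical classifier mirroring the if/else chain. */
static enum wait_action classify_wait_desc(uint64_t lo)
{
    if (lo & WAIT_SW) {
        return WAIT_ACT_STATUS_WRITE;
    } else if (lo & WAIT_IF) {
        return WAIT_ACT_INTERRUPT;
    } else if (lo & WAIT_FN) {
        /* SW = 0, IF = 0, FN = 1: valid, nothing else to do. */
        return WAIT_ACT_FENCE_ONLY;
    }
    return WAIT_ACT_INVALID;
}
```

Before the change, a descriptor with only the fence flag set would have
fallen through to the invalid case.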
 hw/i386/intel_iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 98996ededc..71cebe2fd3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3500,6 +3500,11 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
         /* Interrupt flag */
         vtd_generate_completion_event(s);
+    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
+        /*
+         * SW = 0, IF = 0, FN = 1
+         * Nothing to do as we process the events sequentially
+         */
     } else {
         error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
                           " (unknown type)", __func__, inv_desc->hi,
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 05/22] memory: add permissions in IOMMUAccessFlags
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (2 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 03/22] intel_iommu: return page walk level even when the translation fails CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor CLEMENT MATHIEU--DRIF
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

This will be necessary for devices implementing ATS.
We also define a new macro IOMMU_ACCESS_FLAG_FULL in addition to
IOMMU_ACCESS_FLAG to support more access flags.
IOMMU_ACCESS_FLAG is kept for convenience and backward compatibility.

The following flags are added (defined by the PCIe 5 specification):
    - Execute Requested
    - Privileged Mode Requested
    - Global
    - Untranslated Only

IOMMU_ACCESS_FLAG sets the additional flags to 0.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
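As an illustration (a sketch, not code from this series), the two macros
can be exercised outside of QEMU. The enum and macros are copied from
the diff in this message; the wrapper build_flags is a hypothetical
name added so that the macro can be called like a function.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the patched include/exec/memory.h. */
typedef enum {
    IOMMU_NONE = 0,
    IOMMU_RO   = 1,
    IOMMU_WO   = 2,
    IOMMU_RW   = 3,
    IOMMU_EXEC = 4,
    IOMMU_PRIV = 8,
    IOMMU_GLOBAL = 16,
    IOMMU_UNTRANSLATED_ONLY = 32,
} IOMMUAccessFlags;

#define IOMMU_ACCESS_FLAG(r, w)     (((r) ? IOMMU_RO : 0) | \
                                    ((w) ? IOMMU_WO : 0))
#define IOMMU_ACCESS_FLAG_FULL(r, w, x, p, g, uo) \
                                    (IOMMU_ACCESS_FLAG(r, w) | \
                                    ((x) ? IOMMU_EXEC : 0) | \
                                    ((p) ? IOMMU_PRIV : 0) | \
                                    ((g) ? IOMMU_GLOBAL : 0) | \
                                    ((uo) ? IOMMU_UNTRANSLATED_ONLY : 0))

/* Hypothetical wrapper around the full macro. */
static int build_flags(bool r, bool w, bool x, bool p, bool g, bool uo)
{
    return IOMMU_ACCESS_FLAG_FULL(r, w, x, p, g, uo);
}
```

A read plus execute request, build_flags(true, false, true, false,
false, false), evaluates to IOMMU_RO | IOMMU_EXEC, and passing false for
the four new arguments gives the same value as IOMMU_ACCESS_FLAG.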
 include/exec/memory.h | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1be58f694c..aa8e114e77 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -110,15 +110,34 @@ struct MemoryRegionSection {
 
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
 
-/* See address_space_translate: bit 0 is read, bit 1 is write.  */
+/*
+ * See address_space_translate:
+ *      - bit 0 : read
+ *      - bit 1 : write
+ *      - bit 2 : exec
+ *      - bit 3 : priv
+ *      - bit 4 : global
+ *      - bit 5 : untranslated only
+ */
 typedef enum {
     IOMMU_NONE = 0,
     IOMMU_RO   = 1,
     IOMMU_WO   = 2,
     IOMMU_RW   = 3,
+    IOMMU_EXEC = 4,
+    IOMMU_PRIV = 8,
+    IOMMU_GLOBAL = 16,
+    IOMMU_UNTRANSLATED_ONLY = 32,
 } IOMMUAccessFlags;
 
-#define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
+#define IOMMU_ACCESS_FLAG(r, w)     (((r) ? IOMMU_RO : 0) | \
+                                    ((w) ? IOMMU_WO : 0))
+#define IOMMU_ACCESS_FLAG_FULL(r, w, x, p, g, uo) \
+                                    (IOMMU_ACCESS_FLAG(r, w) | \
+                                    ((x) ? IOMMU_EXEC : 0) | \
+                                    ((p) ? IOMMU_PRIV : 0) | \
+                                    ((g) ? IOMMU_GLOBAL : 0) | \
+                                    ((uo) ? IOMMU_UNTRANSLATED_ONLY : 0))
 
 struct IOMMUTLBEntry {
     AddressSpace    *target_as;
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 06/22] pcie: add helper to declare PASID capability for a pcie device
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (4 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-03 12:04   ` Yi Liu
  2024-07-02  5:52 ` [PATCH ats_vtd v5 07/22] pcie: helper functions to check if PASID and ATS are enabled CLEMENT MATHIEU--DRIF
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
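As an illustration (a sketch, not code from this series), the value
written to the PASID capability register can be computed standalone.
The constants are the ones that appear in the headers touched by this
patch; the helpers pasid_capability_reg and pasid_width_from_reg are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in the headers touched by this patch. */
#define PCI_PASID_CAP_EXEC          0x0002
#define PCI_PASID_CAP_PRIV          0x0004
#define PCI_PASID_CAP_WIDTH         0x1f00
#define PCI_PASID_CAP_WIDTH_SHIFT   8
#define PCI_EXT_CAP_PASID_MAX_WIDTH 20

/* Hypothetical: compose the capability register value. */
static uint16_t pasid_capability_reg(uint8_t pasid_width, bool exec_perm,
                                     bool priv_mod)
{
    uint16_t reg;

    assert(pasid_width <= PCI_EXT_CAP_PASID_MAX_WIDTH);
    reg = (uint16_t)(pasid_width << PCI_PASID_CAP_WIDTH_SHIFT);
    reg |= exec_perm ? PCI_PASID_CAP_EXEC : 0;
    reg |= priv_mod ? PCI_PASID_CAP_PRIV : 0;
    return reg;
}

/* Hypothetical: read the width back out of the register value. */
static uint8_t pasid_width_from_reg(uint16_t reg)
{
    return (uint8_t)((reg & PCI_PASID_CAP_WIDTH) >>
                     PCI_PASID_CAP_WIDTH_SHIFT);
}
```

A 20-bit PASID with both optional permissions advertised gives 0x1406,
and the width field reads back as 20.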
 hw/pci/pcie.c                             | 24 +++++++++++++++++++++++
 include/hw/pci/pcie.h                     |  6 +++++-
 include/hw/pci/pcie_regs.h                |  3 +++
 include/standard-headers/linux/pci_regs.h |  1 +
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 4b2f0805c6..d6a052b616 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1177,3 +1177,27 @@ void pcie_acs_reset(PCIDevice *dev)
         pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
     }
 }
+
+/* PASID */
+void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
+                     bool exec_perm, bool priv_mod)
+{
+    assert(pasid_width <= PCI_EXT_CAP_PASID_MAX_WIDTH);
+    static const uint16_t control_reg_rw_mask = 0x07;
+    uint16_t capability_reg = pasid_width;
+
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_PASID, PCI_PASID_VER, offset,
+                        PCI_EXT_CAP_PASID_SIZEOF);
+
+    capability_reg <<= PCI_PASID_CAP_WIDTH_SHIFT;
+    capability_reg |= exec_perm ? PCI_PASID_CAP_EXEC : 0;
+    capability_reg |= priv_mod  ? PCI_PASID_CAP_PRIV : 0;
+    pci_set_word(dev->config + offset + PCI_PASID_CAP, capability_reg);
+
+    /* Everything is disabled by default */
+    pci_set_word(dev->config + offset + PCI_PASID_CTRL, 0);
+
+    pci_set_word(dev->wmask + offset + PCI_PASID_CTRL, control_reg_rw_mask);
+
+    dev->exp.pasid_cap = offset;
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 5eddb90976..b870958c99 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -72,8 +72,9 @@ struct PCIExpressDevice {
     uint16_t aer_cap;
     PCIEAERLog aer_log;
 
-    /* Offset of ATS capability in config space */
+    /* Offset of ATS and PASID capabilities in config space */
     uint16_t ats_cap;
+    uint16_t pasid_cap;
 
     /* ACS */
     uint16_t acs_cap;
@@ -150,4 +151,7 @@ void pcie_cap_slot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
                              Error **errp);
 void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
                                      DeviceState *dev, Error **errp);
+
+void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
+                     bool exec_perm, bool priv_mod);
 #endif /* QEMU_PCIE_H */
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 9d3b6868dc..0a86598f80 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -86,6 +86,9 @@ typedef enum PCIExpLinkWidth {
 #define PCI_ARI_VER                     1
 #define PCI_ARI_SIZEOF                  8
 
+/* PASID */
+#define PCI_PASID_VER                   1
+#define PCI_EXT_CAP_PASID_MAX_WIDTH     20
 /* AER */
 #define PCI_ERR_VER                     2
 #define PCI_ERR_SIZEOF                  0x48
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index a39193213f..406dce8e82 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -935,6 +935,7 @@
 #define  PCI_PASID_CAP_EXEC	0x0002	/* Exec permissions Supported */
 #define  PCI_PASID_CAP_PRIV	0x0004	/* Privilege Mode Supported */
 #define  PCI_PASID_CAP_WIDTH	0x1f00
+#define  PCI_PASID_CAP_WIDTH_SHIFT  8
 #define PCI_PASID_CTRL		0x06    /* PASID control register */
 #define  PCI_PASID_CTRL_ENABLE	0x0001	/* Enable bit */
 #define  PCI_PASID_CTRL_EXEC	0x0002	/* Exec permissions Enable */
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread
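
Editorial note: the register arithmetic in pcie_pasid_init() above can be tried outside QEMU. The sketch below reuses the constants from the patch. The helper names pasid_cap_value() and pasid_cap_width() are hypothetical and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the patch above. */
#define PCI_PASID_CAP_EXEC          0x0002
#define PCI_PASID_CAP_PRIV          0x0004
#define PCI_PASID_CAP_WIDTH         0x1f00
#define PCI_PASID_CAP_WIDTH_SHIFT   8
#define PCI_EXT_CAP_PASID_MAX_WIDTH 20

/*
 * Hypothetical helper: builds the PASID capability register value the
 * same way pcie_pasid_init() does, without touching any config space.
 */
static uint16_t pasid_cap_value(uint8_t pasid_width, bool exec_perm,
                                bool priv_mod)
{
    uint16_t reg;

    assert(pasid_width <= PCI_EXT_CAP_PASID_MAX_WIDTH);

    reg = (uint16_t)(pasid_width << PCI_PASID_CAP_WIDTH_SHIFT);
    reg |= exec_perm ? PCI_PASID_CAP_EXEC : 0;
    reg |= priv_mod ? PCI_PASID_CAP_PRIV : 0;
    return reg;
}

/* Hypothetical helper: extracts the width field from a register value. */
static uint8_t pasid_cap_width(uint16_t reg)
{
    return (uint8_t)((reg & PCI_PASID_CAP_WIDTH) >> PCI_PASID_CAP_WIDTH_SHIFT);
}
```

With these definitions, a 20-bit PASID width with execute permission and no privileged mode gives 0x1402, and reading the width field back returns 20.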

* [PATCH ats_vtd v5 08/22] intel_iommu: declare supported PASID size
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (6 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 07/22] pcie: helper functions to check if PASID and ATS are enabled CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 09/22] pci: cache the bus mastering status in the device CLEMENT MATHIEU--DRIF
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/i386/intel_iommu.c          | 2 +-
 hw/i386/intel_iommu_internal.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 71cebe2fd3..2a78fc823f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -5860,7 +5860,7 @@ static void vtd_cap_init(IntelIOMMUState *s)
     }
 
     if (s->pasid) {
-        s->ecap |= VTD_ECAP_PASID;
+        s->ecap |= VTD_ECAP_PASID | VTD_ECAP_PSS;
     }
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index bd20746318..117dc96d22 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -194,6 +194,7 @@
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_NEST               (1ULL << 26)
 #define VTD_ECAP_SRS                (1ULL << 31)
+#define VTD_ECAP_PSS                (19ULL << 35)
 #define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
 #define VTD_ECAP_SLTS               (1ULL << 46)
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread
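
Editorial note: the patch above gives no rationale for the value 19 in VTD_ECAP_PSS. A likely reading, assuming the usual VT-d convention that a PSS value of N advertises N+1 PASID bits, is that 19 declares 20-bit PASIDs. That matches PCI_EXT_CAP_PASID_MAX_WIDTH from patch 06. The sketch below illustrates this encoding. The shift and mask macro names and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch above. */
#define VTD_ECAP_PSS        (19ULL << 35)
#define VTD_ECAP_PASID      (1ULL << 40)

/* Hypothetical names for the field position (assumed to be 5 bits wide). */
#define VTD_ECAP_PSS_SHIFT  35
#define VTD_ECAP_PSS_MASK   0x1fULL

/* Encode a PASID width in bits as a PSS field value (width - 1). */
static uint64_t ecap_pss_for_width(unsigned width)
{
    assert(width >= 1 && width <= 32);
    return (uint64_t)(width - 1) << VTD_ECAP_PSS_SHIFT;
}

/* Decode the PASID width in bits advertised by an ECAP value. */
static unsigned ecap_pasid_width(uint64_t ecap)
{
    return (unsigned)((ecap >> VTD_ECAP_PSS_SHIFT) & VTD_ECAP_PSS_MASK) + 1;
}
```

Under that assumption, ecap_pss_for_width(20) is equal to VTD_ECAP_PSS, and decoding VTD_ECAP_PASID | VTD_ECAP_PSS yields a width of 20.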

* [PATCH ats_vtd v5 07/22] pcie: helper functions to check if PASID and ATS are enabled
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (5 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 06/22] pcie: add helper to declare PASID capability for a pcie device CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 08/22] intel_iommu: declare supported PASID size CLEMENT MATHIEU--DRIF
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

pcie_ats_enabled and pcie_pasid_enabled first check whether the
corresponding capability is present. If it is, they read the control
register in the configuration space to tell whether the feature is
enabled.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pcie.c         | 18 ++++++++++++++++++
 include/hw/pci/pcie.h |  3 +++
 2 files changed, 21 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index d6a052b616..4efd84fed5 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1201,3 +1201,21 @@ void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
 
     dev->exp.pasid_cap = offset;
 }
+
+bool pcie_pasid_enabled(const PCIDevice *dev)
+{
+    if (!pci_is_express(dev) || !dev->exp.pasid_cap) {
+        return false;
+    }
+    return (pci_get_word(dev->config + dev->exp.pasid_cap + PCI_PASID_CTRL) &
+                PCI_PASID_CTRL_ENABLE) != 0;
+}
+
+bool pcie_ats_enabled(const PCIDevice *dev)
+{
+    if (!pci_is_express(dev) || !dev->exp.ats_cap) {
+        return false;
+    }
+    return (pci_get_word(dev->config + dev->exp.ats_cap + PCI_ATS_CTRL) &
+                PCI_ATS_CTRL_ENABLE) != 0;
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b870958c99..0c127b29dc 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -154,4 +154,7 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
 
 void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
                      bool exec_perm, bool priv_mod);
+
+bool pcie_pasid_enabled(const PCIDevice *dev);
+bool pcie_ats_enabled(const PCIDevice *dev);
 #endif /* QEMU_PCIE_H */
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread
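
Editorial note: the two-step check described in the commit message above (is the capability present, then is the enable bit set) can be modelled on a plain byte array. This is a minimal sketch and not QEMU code. struct mock_dev, get_word_le() and mock_pasid_enabled() are hypothetical stand-ins for PCIDevice, pci_get_word() and pcie_pasid_enabled(). The register offsets come from the pci_regs.h context shown in patch 06.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values visible in the pci_regs.h hunk of patch 06. */
#define PCI_PASID_CTRL        0x06
#define PCI_PASID_CTRL_ENABLE 0x0001

/* Hypothetical stand-in for the parts of PCIDevice used by the helper. */
struct mock_dev {
    uint8_t config[4096];   /* PCIe extended config space */
    uint16_t pasid_cap;     /* capability offset, 0 when absent */
};

/* Config space is little-endian; read a 16-bit word byte by byte. */
static uint16_t get_word_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static bool mock_pasid_enabled(const struct mock_dev *dev)
{
    /* Step 1: a zero offset means the capability was never declared. */
    if (!dev->pasid_cap) {
        return false;
    }
    assert((size_t)dev->pasid_cap + PCI_PASID_CTRL + 2 <= sizeof(dev->config));

    /* Step 2: test the enable bit of the control register. */
    return (get_word_le(dev->config + dev->pasid_cap + PCI_PASID_CTRL) &
            PCI_PASID_CTRL_ENABLE) != 0;
}
```

A device whose pasid_cap is 0 reports false regardless of the config contents. Once the capability offset is set, the result follows bit 0 of the control register.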

* [PATCH ats_vtd v5 09/22] pci: cache the bus mastering status in the device
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (7 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 08/22] intel_iommu: declare supported PASID size CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 10/22] pci: add IOMMU operations to get address spaces and memory regions with PASID CLEMENT MATHIEU--DRIF
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pci.c                | 24 ++++++++++++++----------
 include/hw/pci/pci_device.h |  1 +
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c8a8aab306..51feede3cf 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -116,6 +116,12 @@ static GSequence *pci_acpi_index_list(void)
     return used_acpi_index_list;
 }
 
+static void pci_set_master(PCIDevice *d, bool enable)
+{
+    memory_region_set_enabled(&d->bus_master_enable_region, enable);
+    d->is_master = enable; /* cache the status */
+}
+
 static void pci_init_bus_master(PCIDevice *pci_dev)
 {
     AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
@@ -123,7 +129,7 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
     memory_region_init_alias(&pci_dev->bus_master_enable_region,
                              OBJECT(pci_dev), "bus master",
                              dma_as->root, 0, memory_region_size(dma_as->root));
-    memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
+    pci_set_master(pci_dev, false);
     memory_region_add_subregion(&pci_dev->bus_master_container_region, 0,
                                 &pci_dev->bus_master_enable_region);
 }
@@ -657,9 +663,8 @@ static int get_pci_config_device(QEMUFile *f, void *pv, size_t size,
         pci_bridge_update_mappings(PCI_BRIDGE(s));
     }
 
-    memory_region_set_enabled(&s->bus_master_enable_region,
-                              pci_get_word(s->config + PCI_COMMAND)
-                              & PCI_COMMAND_MASTER);
+    pci_set_master(s,
+                   pci_get_word(s->config + PCI_COMMAND) & PCI_COMMAND_MASTER);
 
     g_free(config);
     return 0;
@@ -1611,9 +1616,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
 
     if (ranges_overlap(addr, l, PCI_COMMAND, 2)) {
         pci_update_irq_disabled(d, was_irq_disabled);
-        memory_region_set_enabled(&d->bus_master_enable_region,
-                                  (pci_get_word(d->config + PCI_COMMAND)
-                                   & PCI_COMMAND_MASTER) && d->has_power);
+        pci_set_master(d,
+                      (pci_get_word(d->config + PCI_COMMAND) &
+                            PCI_COMMAND_MASTER) && d->has_power);
     }
 
     msi_write_config(d, addr, val_in, l);
@@ -2888,9 +2893,8 @@ void pci_set_power(PCIDevice *d, bool state)
 
     d->has_power = state;
     pci_update_mappings(d);
-    memory_region_set_enabled(&d->bus_master_enable_region,
-                              (pci_get_word(d->config + PCI_COMMAND)
-                               & PCI_COMMAND_MASTER) && d->has_power);
+    pci_set_master(d, (pci_get_word(d->config + PCI_COMMAND)
+                        & PCI_COMMAND_MASTER) && d->has_power);
     if (!d->has_power) {
         pci_device_reset(d);
     }
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b2..7fa501569a 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -87,6 +87,7 @@ struct PCIDevice {
     char name[64];
     PCIIORegion io_regions[PCI_NUM_REGIONS];
     AddressSpace bus_master_as;
+    bool is_master;
     MemoryRegion bus_master_container_region;
     MemoryRegion bus_master_enable_region;
 
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 11/22] memory: store user data pointer in the IOMMU notifiers
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (9 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 10/22] pci: add IOMMU operations to get address spaces and memory regions with PASID CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 12/22] pci: add a pci-level initialization function for iommu notifiers CLEMENT MATHIEU--DRIF
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

This lets developers of SVM devices attach their own state to a notifier.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 include/exec/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index aa8e114e77..bf91c4bed7 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -203,6 +203,7 @@ struct IOMMUNotifier {
     hwaddr start;
     hwaddr end;
     int iommu_idx;
+    void *opaque;
     QLIST_ENTRY(IOMMUNotifier) node;
 };
 typedef struct IOMMUNotifier IOMMUNotifier;
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 10/22] pci: add IOMMU operations to get address spaces and memory regions with PASID
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (8 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 09/22] pci: cache the bus mastering status in the device CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 11/22] memory: store user data pointer in the IOMMU notifiers CLEMENT MATHIEU--DRIF
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pci.c         | 19 +++++++++++++++++++
 include/hw/pci/pci.h | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 51feede3cf..3fe47d4002 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2747,6 +2747,25 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     return &address_space_memory;
 }
 
+AddressSpace *pci_device_iommu_address_space_pasid(PCIDevice *dev,
+                                                   uint32_t pasid)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    if (!dev->is_master || !pcie_pasid_enabled(dev) || pasid == PCI_NO_PASID) {
+        return NULL;
+    }
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->get_address_space_pasid) {
+        return iommu_bus->iommu_ops->get_address_space_pasid(bus,
+                                    iommu_bus->iommu_opaque, devfn, pasid);
+    }
+    return NULL;
+}
+
 bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index eb26cac810..ad7bd2ade5 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -385,6 +385,38 @@ typedef struct PCIIOMMUOps {
      * @devfn: device and function number
      */
     AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
+    /**
+     * @get_address_space_pasid: same as get_address_space but returns an
+     * address space with the requested PASID
+     *
+     * This callback is required for PASID-based operations
+     *
+     * @bus: the #PCIBus being accessed.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number
+     *
+     * @pasid: the pasid associated with the requested memory region
+     */
+    AddressSpace * (*get_address_space_pasid)(PCIBus *bus, void *opaque,
+                                              int devfn, uint32_t pasid);
+    /**
+     * @get_memory_region_pasid: get the iommu memory region for a given
+     * device and pasid
+     *
+     * @bus: the #PCIBus being accessed.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number
+     *
+     * @pasid: the pasid associated with the requested memory region
+     */
+    IOMMUMemoryRegion * (*get_memory_region_pasid)(PCIBus *bus,
+                                                   void *opaque,
+                                                   int devfn,
+                                                   uint32_t pasid);
     /**
      * @set_iommu_device: attach a HostIOMMUDevice to a vIOMMU
      *
@@ -420,6 +452,8 @@ typedef struct PCIIOMMUOps {
 } PCIIOMMUOps;
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
+AddressSpace *pci_device_iommu_address_space_pasid(PCIDevice *dev,
+                                                   uint32_t pasid);
 bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp);
 void pci_device_unset_iommu_device(PCIDevice *dev);
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 12/22] pci: add a pci-level initialization function for iommu notifiers
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (10 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 11/22] memory: store user data pointer in the IOMMU notifiers CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 13/22] intel_iommu: implement the get_address_space_pasid iommu operation CLEMENT MATHIEU--DRIF
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

We add a convenient way to initialize a device-IOTLB notifier.
This is meant to be used by ATS-capable devices.

pci_device_iommu_memory_region_pasid is introduced in this commit and
will be used in several other SVM-related functions exposed in
the PCI API.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pci.c         | 40 ++++++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci.h | 15 +++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 3fe47d4002..7a483dd05d 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2747,6 +2747,46 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     return &address_space_memory;
 }
 
+static IOMMUMemoryRegion *pci_device_iommu_memory_region_pasid(PCIDevice *dev,
+                                                               uint32_t pasid)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    /*
+     * This function is for internal use in the module,
+     * we can call it with PCI_NO_PASID
+     */
+    if (!dev->is_master ||
+            ((pasid != PCI_NO_PASID) && !pcie_pasid_enabled(dev))) {
+        return NULL;
+    }
+
+    pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+    if (iommu_bus && iommu_bus->iommu_ops->get_memory_region_pasid) {
+        return iommu_bus->iommu_ops->get_memory_region_pasid(bus,
+                                 iommu_bus->iommu_opaque, devfn, pasid);
+    }
+    return NULL;
+}
+
+bool pci_iommu_init_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
+                                   IOMMUNotifier *n, IOMMUNotify fn,
+                                   void *opaque)
+{
+    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
+                                                                        pasid);
+    if (!iommu_mr) {
+        return false;
+    }
+    iommu_notifier_init(n, fn, IOMMU_NOTIFIER_DEVIOTLB_EVENTS, 0, HWADDR_MAX,
+                        memory_region_iommu_attrs_to_index(iommu_mr,
+                                                       MEMTXATTRS_UNSPECIFIED));
+    n->opaque = opaque;
+    return true;
+}
+
 AddressSpace *pci_device_iommu_address_space_pasid(PCIDevice *dev,
                                                    uint32_t pasid)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index ad7bd2ade5..b2a9ed7782 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -458,6 +458,21 @@ bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp);
 void pci_device_unset_iommu_device(PCIDevice *dev);
 
+/**
+ * pci_iommu_init_iotlb_notifier: initialize an IOMMU notifier
+ *
+ * This function is used by devices before registering an IOTLB notifier
+ *
+ * @dev: the device
+ * @pasid: the pasid of the address space to watch
+ * @n: the notifier to initialize
+ * @fn: the callback to be installed
+ * @opaque: user pointer that can be used to store a state
+ */
+bool pci_iommu_init_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
+                                   IOMMUNotifier *n, IOMMUNotify fn,
+                                   void *opaque);
+
 /**
  * pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
  *
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread
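
Editorial note: patches 10 and 12 guard their lookups differently. pci_device_iommu_address_space_pasid() rejects PCI_NO_PASID, while the internal pci_device_iommu_memory_region_pasid() accepts it. The sketch below restates the two guards as pure predicates so they can be compared side by side. The function names are hypothetical, and the value used for PCI_NO_PASID (UINT32_MAX) is an assumption about the QEMU definition, which is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; the real definition is not shown in this thread. */
#define PCI_NO_PASID UINT32_MAX

/* Guard of pci_device_iommu_address_space_pasid() in patch 10. */
static bool as_pasid_lookup_allowed(bool is_master, bool pasid_enabled,
                                    uint32_t pasid)
{
    return is_master && pasid_enabled && pasid != PCI_NO_PASID;
}

/* Guard of pci_device_iommu_memory_region_pasid() in patch 12. */
static bool mr_pasid_lookup_allowed(bool is_master, bool pasid_enabled,
                                    uint32_t pasid)
{
    return is_master && (pasid == PCI_NO_PASID || pasid_enabled);
}

/* A few cases showing where the two guards agree and differ. */
static void guard_examples(void)
{
    /* Bus mastering off: both lookups are refused. */
    assert(!as_pasid_lookup_allowed(false, true, 1));
    assert(!mr_pasid_lookup_allowed(false, true, 1));

    /* No PASID: only the memory region lookup is allowed. */
    assert(!as_pasid_lookup_allowed(true, false, PCI_NO_PASID));
    assert(mr_pasid_lookup_allowed(true, false, PCI_NO_PASID));

    /* Real PASID while the PASID capability is disabled: both refuse. */
    assert(!as_pasid_lookup_allowed(true, false, 1));
    assert(!mr_pasid_lookup_allowed(true, false, 1));
}
```

In both patches, a NULL return (or false from pci_iommu_init_iotlb_notifier()) also covers the case where the IOMMU does not implement the corresponding callback, which this model leaves out.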

* [PATCH ats_vtd v5 13/22] intel_iommu: implement the get_address_space_pasid iommu operation
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (11 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 12/22] pci: add a pci-level initialization function for iommu notifiers CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 15/22] memory: Allow to store the PASID in IOMMUTLBEntry CLEMENT MATHIEU--DRIF
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/i386/intel_iommu.c         | 13 ++++++++++---
 include/hw/i386/intel_iommu.h |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2a78fc823f..e047d2ca83 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -5438,7 +5438,7 @@ static const MemoryRegionOps vtd_mem_ir_fault_ops = {
 };
 
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
-                                 int devfn, unsigned int pasid)
+                                 int devfn, uint32_t pasid)
 {
     /*
      * We can't simply use sid here since the bus number might not be
@@ -5995,19 +5995,26 @@ static void vtd_reset(DeviceState *dev)
     vtd_refresh_pasid_bind(s);
 }
 
-static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
+static AddressSpace *vtd_host_dma_iommu_pasid(PCIBus *bus, void *opaque,
+                                              int devfn, uint32_t pasid)
 {
     IntelIOMMUState *s = opaque;
     VTDAddressSpace *vtd_as;
 
     assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
 
-    vtd_as = vtd_find_add_as(s, bus, devfn, PCI_NO_PASID);
+    vtd_as = vtd_find_add_as(s, bus, devfn, pasid);
     return &vtd_as->as;
 }
 
+static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
+{
+    return vtd_host_dma_iommu_pasid(bus, opaque, devfn, PCI_NO_PASID);
+}
+
 static PCIIOMMUOps vtd_iommu_ops = {
     .get_address_space = vtd_host_dma_iommu,
+    .get_address_space_pasid = vtd_host_dma_iommu_pasid,
     .set_iommu_device = vtd_dev_set_iommu_device,
     .unset_iommu_device = vtd_dev_unset_iommu_device,
 };
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b32d711802..e334a3de6d 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -325,6 +325,6 @@ struct IntelIOMMUState {
  * create a new one if none exists
  */
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
-                                 int devfn, unsigned int pasid);
+                                 int devfn, uint32_t pasid);
 
 #endif
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 15/22] memory: Allow to store the PASID in IOMMUTLBEntry
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (12 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 13/22] intel_iommu: implement the get_address_space_pasid iommu operation CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 14/22] intel_iommu: implement the get_memory_region_pasid iommu operation CLEMENT MATHIEU--DRIF
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

This will be useful for devices that support ATS

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 include/exec/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index bf91c4bed7..003ee06610 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -145,6 +145,7 @@ struct IOMMUTLBEntry {
     hwaddr           translated_addr;
     hwaddr           addr_mask;  /* 0xfff = 4k translation */
     IOMMUAccessFlags perm;
+    uint32_t         pasid;
 };
 
 /*
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 14/22] intel_iommu: implement the get_memory_region_pasid iommu operation
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (13 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 15/22] memory: Allow to store the PASID in IOMMUTLBEntry CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 16/22] intel_iommu: fill the PASID field when creating an instance of IOMMUTLBEntry CLEMENT MATHIEU--DRIF
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/i386/intel_iommu.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e047d2ca83..2e4f535dd1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -6012,9 +6012,24 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return vtd_host_dma_iommu_pasid(bus, opaque, devfn, PCI_NO_PASID);
 }
 
+static IOMMUMemoryRegion *vtd_get_memory_region_pasid(PCIBus *bus,
+                                                      void *opaque,
+                                                      int devfn,
+                                                      uint32_t pasid)
+{
+    IntelIOMMUState *s = opaque;
+    VTDAddressSpace *vtd_as;
+
+    assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+
+    vtd_as = vtd_find_add_as(s, bus, devfn, pasid);
+    return &vtd_as->iommu;
+}
+
 static PCIIOMMUOps vtd_iommu_ops = {
     .get_address_space = vtd_host_dma_iommu,
     .get_address_space_pasid = vtd_host_dma_iommu_pasid,
+    .get_memory_region_pasid = vtd_get_memory_region_pasid,
     .set_iommu_device = vtd_dev_set_iommu_device,
     .unset_iommu_device = vtd_dev_unset_iommu_device,
 };
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 16/22] intel_iommu: fill the PASID field when creating an instance of IOMMUTLBEntry
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (14 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 14/22] intel_iommu: implement the get_memory_region_pasid iommu operation CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 17/22] atc: generic ATC that can be used by PCIe devices that support SVM CLEMENT MATHIEU--DRIF
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/i386/intel_iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2e4f535dd1..f77972130f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2210,6 +2210,9 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
 
     vtd_iommu_lock(s);
 
+    /* fill the pasid before getting rid2pasid */
+    entry->pasid = pasid;
+
     cc_entry = &vtd_as->context_cache_entry;
 
     /* Try to fetch pte form IOTLB, we don't need RID2PASID logic */
@@ -2328,6 +2331,7 @@ out:
     entry->translated_addr = vtd_get_pte_addr(pte, s->aw_bits) & page_mask;
     entry->addr_mask = ~page_mask;
     entry->perm = access_flags;
+    /* pasid already set */
     return true;
 
 error:
@@ -2336,6 +2340,7 @@ error:
     entry->translated_addr = 0;
     entry->addr_mask = 0;
     entry->perm = IOMMU_NONE;
+    entry->pasid = PCI_NO_PASID;
     return false;
 }
 
@@ -3697,6 +3702,7 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
             event.entry.target_as = &address_space_memory;
             event.entry.iova = addr;
             event.entry.perm = IOMMU_NONE;
+            event.entry.pasid = pasid;
             event.entry.addr_mask = size - 1;
             event.entry.translated_addr = 0;
             memory_region_notify_iommu(&vtd_as->iommu, 0, event);
@@ -4344,6 +4350,7 @@ static void do_invalidate_device_tlb(VTDAddressSpace *vtd_dev_as,
     event.entry.iova = addr;
     event.entry.perm = IOMMU_NONE;
     event.entry.translated_addr = 0;
+    event.entry.pasid = vtd_dev_as->pasid;
     memory_region_notify_iommu(&vtd_dev_as->iommu, 0, event);
 }
 
@@ -4920,6 +4927,7 @@ static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     IOMMUTLBEntry iotlb = {
         /* We'll fill in the rest later. */
         .target_as = &address_space_memory,
+        .pasid = vtd_as->pasid,
     };
     bool success;
 
@@ -4932,6 +4940,7 @@ static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
         iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
         iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
         iotlb.perm = IOMMU_RW;
+        iotlb.pasid = PCI_NO_PASID;
         success = true;
     }
 
-- 
2.45.2


* [PATCH ats_vtd v5 18/22] atc: add unit tests
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (16 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 17/22] atc: generic ATC that can be used by PCIe devices that support SVM CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 19/22] memory: add an API for ATS support CLEMENT MATHIEU--DRIF
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
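For illustration only (not part of this patch): the (page_size,
address_width) pairs accepted in test_creation_parameters follow from the
geometry that atc_new() derives. The standalone sketch below reproduces that
arithmetic. `struct atc_geometry` and `atc_geometry_compute` are hypothetical
names, and the `address_width <= log_page_size` guard is an extra assumption
that atc_new() itself does not make.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two values checked by check_creation(). */
struct atc_geometry {
    uint8_t levels;       /* number of page-table levels */
    uint8_t level_offset; /* index bits consumed by each level */
};

/* Return true and fill *geo if (page_size, address_width) is accepted. */
static bool atc_geometry_compute(uint64_t page_size, uint8_t address_width,
                                 struct atc_geometry *geo)
{
    uint8_t log_page_size = 0;
    uint8_t index_bits;

    /* page_size must be a non-zero power of two */
    if (page_size == 0 || (page_size & (page_size - 1)) != 0) {
        return false;
    }
    while ((page_size >> log_page_size) != 1) {
        log_page_size++;
    }
    /*
     * 8-byte entries: each level indexes log2(page_size / 8) bits.
     * The second test is an added guard (assumption, see above).
     */
    if (log_page_size <= 3 || address_width <= log_page_size) {
        return false;
    }
    geo->level_offset = log_page_size - 3;
    index_bits = address_width - log_page_size;
    /* the index bits must split evenly across the levels */
    if (index_bits % geo->level_offset != 0) {
        return false;
    }
    geo->levels = index_bits / geo->level_offset;
    return true;
}
```

With 4 KiB pages and a 48-bit address width this gives 4 levels of 9 bits
each, while 8 KiB pages with 48 bits are rejected because the 35 index bits
are not a multiple of 10.
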
 tests/unit/meson.build |   1 +
 tests/unit/test-atc.c  | 527 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 528 insertions(+)
 create mode 100644 tests/unit/test-atc.c

diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 26c109c968..d6c6c574de 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -47,6 +47,7 @@ tests = {
   'test-logging': [],
   'test-qapi-util': [],
   'test-interval-tree': [],
+  'test-atc': []
 }
 
 if have_system or have_tools
diff --git a/tests/unit/test-atc.c b/tests/unit/test-atc.c
new file mode 100644
index 0000000000..89378f7f63
--- /dev/null
+++ b/tests/unit/test-atc.c
@@ -0,0 +1,527 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "util/atc.h"
+
+static inline bool tlb_entry_equal(IOMMUTLBEntry *e1, IOMMUTLBEntry *e2)
+{
+    if (!e1 || !e2) {
+        return !e1 && !e2;
+    }
+    return e1->iova == e2->iova &&
+            e1->addr_mask == e2->addr_mask &&
+            e1->pasid == e2->pasid &&
+            e1->perm == e2->perm &&
+            e1->target_as == e2->target_as &&
+            e1->translated_addr == e2->translated_addr;
+}
+
+static void assert_lookup_equals(ATC *atc, IOMMUTLBEntry *target,
+                                 uint32_t pasid, hwaddr iova)
+{
+    IOMMUTLBEntry *result;
+    result = atc_lookup(atc, pasid, iova);
+    g_assert(tlb_entry_equal(result, target));
+}
+
+static void check_creation(uint64_t page_size, uint8_t address_width,
+                           uint8_t levels, uint8_t level_offset,
+                           bool should_work) {
+    ATC *atc = atc_new(page_size, address_width);
+    if (atc) {
+        if (atc->levels != levels || atc->level_offset != level_offset) {
+            g_assert(false); /* ATC created but invalid configuration : fail */
+        }
+        atc_destroy(atc);
+        g_assert(should_work);
+    } else {
+        g_assert(!should_work);
+    }
+}
+
+static void test_creation_parameters(void)
+{
+    check_creation(8, 39, 3, 9, false);
+    check_creation(4095, 39, 3, 9, false);
+    check_creation(4097, 39, 3, 9, false);
+    check_creation(8192, 48, 0, 0, false);
+
+    check_creation(4096, 38, 0, 0, false);
+    check_creation(4096, 39, 3, 9, true);
+    check_creation(4096, 40, 0, 0, false);
+    check_creation(4096, 47, 0, 0, false);
+    check_creation(4096, 48, 4, 9, true);
+    check_creation(4096, 49, 0, 0, false);
+    check_creation(4096, 56, 0, 0, false);
+    check_creation(4096, 57, 5, 9, true);
+    check_creation(4096, 58, 0, 0, false);
+
+    check_creation(16384, 35, 0, 0, false);
+    check_creation(16384, 36, 2, 11, true);
+    check_creation(16384, 37, 0, 0, false);
+    check_creation(16384, 46, 0, 0, false);
+    check_creation(16384, 47, 3, 11, true);
+    check_creation(16384, 48, 0, 0, false);
+    check_creation(16384, 57, 0, 0, false);
+    check_creation(16384, 58, 4, 11, true);
+    check_creation(16384, 59, 0, 0, false);
+}
+
+static void test_single_entry(void)
+{
+    IOMMUTLBEntry entry = {
+        .iova = 0x123456789000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 5,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xdeadbeefULL,
+    };
+
+    ATC *atc = atc_new(4096, 48);
+    g_assert(atc);
+
+    assert_lookup_equals(atc, NULL, entry.pasid,
+                         entry.iova + (entry.addr_mask / 2));
+
+    atc_create_address_space_cache(atc, entry.pasid);
+    g_assert(atc_update(atc, &entry) == 0);
+
+    assert_lookup_equals(atc, NULL, entry.pasid + 1,
+                         entry.iova + (entry.addr_mask / 2));
+    assert_lookup_equals(atc, &entry, entry.pasid,
+                         entry.iova + (entry.addr_mask / 2));
+
+    atc_destroy(atc);
+}
+
+static void test_single_entry_2(void)
+{
+    static uint64_t page_size = 4096;
+    IOMMUTLBEntry e1 = {
+        .iova = 0xabcdef200000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eedULL,
+    };
+
+    ATC *atc = atc_new(page_size , 48);
+    atc_create_address_space_cache(atc, e1.pasid);
+    atc_update(atc, &e1);
+
+    assert_lookup_equals(atc, NULL, e1.pasid, 0xabcdef201000ULL);
+
+    atc_destroy(atc);
+}
+
+static void test_page_boundaries(void)
+{
+    static const uint32_t pasid = 5;
+    static const hwaddr page_size = 4096;
+
+    /* 2 consecutive entries */
+    IOMMUTLBEntry e1 = {
+        .iova = 0x123456789000ULL,
+        .addr_mask = page_size - 1,
+        .pasid = pasid,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xdeadbeefULL,
+    };
+    IOMMUTLBEntry e2 = {
+        .iova = e1.iova + page_size,
+        .addr_mask = page_size - 1,
+        .pasid = pasid,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x900df00dULL,
+    };
+
+    ATC *atc = atc_new(page_size, 48);
+
+    atc_create_address_space_cache(atc, e1.pasid);
+    /* creating the address space twice should not be a problem */
+    atc_create_address_space_cache(atc, e1.pasid);
+
+    atc_update(atc, &e1);
+    atc_update(atc, &e2);
+
+    assert_lookup_equals(atc, NULL, e1.pasid, e1.iova - 1);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova + e1.addr_mask);
+    g_assert((e1.iova + e1.addr_mask + 1) == e2.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova + e2.addr_mask);
+    assert_lookup_equals(atc, NULL, e2.pasid, e2.iova + e2.addr_mask + 1);
+
+    assert_lookup_equals(atc, NULL, e1.pasid + 10, e1.iova);
+    assert_lookup_equals(atc, NULL, e2.pasid + 10, e2.iova);
+    atc_destroy(atc);
+}
+
+static void test_huge_page(void)
+{
+    static const uint32_t pasid = 5;
+    static const hwaddr page_size = 4096;
+    IOMMUTLBEntry e1 = {
+        .iova = 0x123456600000ULL,
+        .addr_mask = 0x1fffffULL,
+        .pasid = pasid,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xdeadbeefULL,
+    };
+    hwaddr addr;
+
+    ATC *atc = atc_new(page_size, 48);
+
+    atc_create_address_space_cache(atc, e1.pasid);
+    atc_update(atc, &e1);
+
+    for (addr = e1.iova; addr <= e1.iova + e1.addr_mask; addr += page_size) {
+        assert_lookup_equals(atc, &e1, e1.pasid, addr);
+    }
+    /* addr is now out of the huge page */
+    assert_lookup_equals(atc, NULL, e1.pasid, addr);
+    atc_destroy(atc);
+}
+
+static void test_pasid(void)
+{
+    hwaddr addr = 0xaaaaaaaaa000ULL;
+    IOMMUTLBEntry e1 = {
+        .iova = addr,
+        .addr_mask = 0xfffULL,
+        .pasid = 8,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xdeadbeefULL,
+    };
+    IOMMUTLBEntry e2 = {
+        .iova = addr,
+        .addr_mask = 0xfffULL,
+        .pasid = 2,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xb001ULL,
+    };
+    uint16_t i;
+
+    ATC *atc = atc_new(4096, 48);
+
+    atc_create_address_space_cache(atc, e1.pasid);
+    atc_create_address_space_cache(atc, e2.pasid);
+    atc_update(atc, &e1);
+    atc_update(atc, &e2);
+
+    for (i = 0; i <= MAX(e1.pasid, e2.pasid) + 1; ++i) {
+        if (i == e1.pasid || i == e2.pasid) {
+            continue;
+        }
+        assert_lookup_equals(atc, NULL, i, addr);
+    }
+    assert_lookup_equals(atc, &e1, e1.pasid, addr);
+    assert_lookup_equals(atc, &e2, e2.pasid, addr);
+    atc_destroy(atc);
+}
+
+static void test_large_address(void)
+{
+    IOMMUTLBEntry e1 = {
+        .iova = 0xaaaaaaaaa000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 8,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eeeeeedULL,
+    };
+    IOMMUTLBEntry e2 = {
+        .iova = 0x1f00baaaaabf000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = e1.pasid,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xdeadbeefULL,
+    };
+
+    ATC *atc = atc_new(4096, 57);
+
+    atc_create_address_space_cache(atc, e1.pasid);
+    atc_update(atc, &e1);
+    atc_update(atc, &e2);
+
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+    atc_destroy(atc);
+}
+
+static void test_bigger_page(void)
+{
+    IOMMUTLBEntry e1 = {
+        .iova = 0xaabbccdde000ULL,
+        .addr_mask = 0x1fffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eeeeeedULL,
+    };
+    hwaddr i;
+
+    ATC *atc = atc_new(8192, 43);
+
+    atc_create_address_space_cache(atc, e1.pasid);
+    atc_update(atc, &e1);
+
+    i = e1.iova & (~e1.addr_mask);
+    assert_lookup_equals(atc, NULL, e1.pasid, i - 1);
+    while (i <= e1.iova + e1.addr_mask) {
+        assert_lookup_equals(atc, &e1, e1.pasid, i);
+        ++i;
+    }
+    assert_lookup_equals(atc, NULL, e1.pasid, i);
+    atc_destroy(atc);
+}
+
+static void test_unknown_pasid(void)
+{
+    IOMMUTLBEntry e1 = {
+        .iova = 0xaabbccfff000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eeeeeedULL,
+    };
+
+    ATC *atc = atc_new(4096, 48);
+    g_assert(atc_update(atc, &e1) != 0);
+    assert_lookup_equals(atc, NULL, e1.pasid, e1.iova);
+    atc_destroy(atc);
+}
+
+static void test_invalidation(void)
+{
+    static uint64_t page_size = 4096;
+    IOMMUTLBEntry e1 = {
+        .iova = 0xaabbccddf000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eeeeeedULL,
+    };
+    IOMMUTLBEntry e2 = {
+        .iova = 0xffe00000ULL,
+        .addr_mask = 0x1fffffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xb000001ULL,
+    };
+    IOMMUTLBEntry e3;
+
+    ATC *atc = atc_new(page_size , 48);
+    atc_create_address_space_cache(atc, e1.pasid);
+
+    atc_update(atc, &e1);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    atc_invalidate(atc, &e1);
+    assert_lookup_equals(atc, NULL, e1.pasid, e1.iova);
+
+    atc_update(atc, &e1);
+    atc_update(atc, &e2);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+    atc_invalidate(atc, &e2);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, NULL, e2.pasid, e2.iova);
+
+    /* invalidate a huge page by invalidating a small region */
+    for (hwaddr addr = e2.iova; addr <= (e2.iova + e2.addr_mask);
+         addr += page_size) {
+        atc_update(atc, &e2);
+        assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+        e3 = (IOMMUTLBEntry){
+            .iova = addr,
+            .addr_mask = page_size - 1,
+            .pasid = e2.pasid,
+            .perm = IOMMU_RW,
+            .translated_addr = 0,
+        };
+        atc_invalidate(atc, &e3);
+        assert_lookup_equals(atc, NULL, e2.pasid, e2.iova);
+    }
+    atc_destroy(atc);
+}
+
+static void test_delete_address_space_cache(void)
+{
+    static uint64_t page_size = 4096;
+    IOMMUTLBEntry e1 = {
+        .iova = 0xaabbccddf000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eeeeeedULL,
+    };
+    IOMMUTLBEntry e2 = {
+        .iova = e1.iova,
+        .addr_mask = 0xfffULL,
+        .pasid = 2,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eeeeeedULL,
+    };
+
+    ATC *atc = atc_new(page_size , 48);
+    atc_create_address_space_cache(atc, e1.pasid);
+
+    atc_update(atc, &e1);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    atc_invalidate(atc, &e2); /* unknown pasid: this is a nop */
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+
+    atc_create_address_space_cache(atc, e2.pasid);
+    atc_update(atc, &e2);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+    atc_invalidate(atc, &e1);
+    /* e1 has been removed but e2 is still there */
+    assert_lookup_equals(atc, NULL, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+
+    atc_update(atc, &e1);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+
+    atc_delete_address_space_cache(atc, e2.pasid);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, NULL, e2.pasid, e2.iova);
+    atc_destroy(atc);
+}
+
+static void test_invalidate_entire_address_space(void)
+{
+    static uint64_t page_size = 4096;
+    IOMMUTLBEntry e1 = {
+        .iova = 0x1000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eedULL,
+    };
+    IOMMUTLBEntry e2 = {
+        .iova = 0xfffffffff000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xbeefULL,
+    };
+    IOMMUTLBEntry e3 = {
+        .iova = 0,
+        .addr_mask = 0xffffffffffffffffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0,
+    };
+
+    ATC *atc = atc_new(page_size , 48);
+    atc_create_address_space_cache(atc, e1.pasid);
+
+    atc_update(atc, &e1);
+    atc_update(atc, &e2);
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+    atc_invalidate(atc, &e3);
+    /* both e1 and e2 have been removed */
+    assert_lookup_equals(atc, NULL, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, NULL, e2.pasid, e2.iova);
+
+    atc_destroy(atc);
+}
+
+static void test_reset(void)
+{
+    static uint64_t page_size = 4096;
+    IOMMUTLBEntry e1 = {
+        .iova = 0x1000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 1,
+        .perm = IOMMU_RW,
+        .translated_addr = 0x5eedULL,
+    };
+    IOMMUTLBEntry e2 = {
+        .iova = 0xfffffffff000ULL,
+        .addr_mask = 0xfffULL,
+        .pasid = 2,
+        .perm = IOMMU_RW,
+        .translated_addr = 0xbeefULL,
+    };
+
+    ATC *atc = atc_new(page_size , 48);
+    atc_create_address_space_cache(atc, e1.pasid);
+    atc_create_address_space_cache(atc, e2.pasid);
+    atc_update(atc, &e1);
+    atc_update(atc, &e2);
+
+    assert_lookup_equals(atc, &e1, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, &e2, e2.pasid, e2.iova);
+
+    atc_reset(atc);
+
+    assert_lookup_equals(atc, NULL, e1.pasid, e1.iova);
+    assert_lookup_equals(atc, NULL, e2.pasid, e2.iova);
+    atc_destroy(atc);
+}
+
+static void test_get_max_number_of_pages(void)
+{
+    static uint64_t page_size = 4096;
+    hwaddr base = 0xc0fee000; /* aligned */
+    ATC *atc = atc_new(page_size , 48);
+    g_assert(atc_get_max_number_of_pages(atc, base, page_size / 2) == 1);
+    g_assert(atc_get_max_number_of_pages(atc, base, page_size) == 1);
+    g_assert(atc_get_max_number_of_pages(atc, base, page_size + 1) == 2);
+
+    g_assert(atc_get_max_number_of_pages(atc, base + 10, 1) == 1);
+    g_assert(atc_get_max_number_of_pages(atc, base + 10, page_size - 10) == 1);
+    g_assert(atc_get_max_number_of_pages(atc, base + 10,
+                                         page_size - 10 + 1) == 2);
+    g_assert(atc_get_max_number_of_pages(atc, base + 10,
+                                         page_size - 10 + 2) == 2);
+
+    g_assert(atc_get_max_number_of_pages(atc, base + page_size - 1, 1) == 1);
+    g_assert(atc_get_max_number_of_pages(atc, base + page_size - 1, 2) == 2);
+    g_assert(atc_get_max_number_of_pages(atc, base + page_size - 1, 3) == 2);
+
+    g_assert(atc_get_max_number_of_pages(atc, base + 10, page_size * 20) == 21);
+    g_assert(atc_get_max_number_of_pages(atc, base + 10,
+                                         (page_size * 20) + (page_size - 10))
+                                          == 21);
+    g_assert(atc_get_max_number_of_pages(atc, base + 10,
+                                         (page_size * 20) +
+                                         (page_size - 10 + 1)) == 22);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_func("/atc/test_creation_parameters", test_creation_parameters);
+    g_test_add_func("/atc/test_single_entry", test_single_entry);
+    g_test_add_func("/atc/test_single_entry_2", test_single_entry_2);
+    g_test_add_func("/atc/test_page_boundaries", test_page_boundaries);
+    g_test_add_func("/atc/test_huge_page", test_huge_page);
+    g_test_add_func("/atc/test_pasid", test_pasid);
+    g_test_add_func("/atc/test_large_address", test_large_address);
+    g_test_add_func("/atc/test_bigger_page", test_bigger_page);
+    g_test_add_func("/atc/test_unknown_pasid", test_unknown_pasid);
+    g_test_add_func("/atc/test_invalidation", test_invalidation);
+    g_test_add_func("/atc/test_delete_address_space_cache",
+                    test_delete_address_space_cache);
+    g_test_add_func("/atc/test_invalidate_entire_address_space",
+                    test_invalidate_entire_address_space);
+    g_test_add_func("/atc/test_reset", test_reset);
+    g_test_add_func("/atc/test_get_max_number_of_pages",
+                    test_get_max_number_of_pages);
+    return g_test_run();
+}
-- 
2.45.2


* [PATCH ats_vtd v5 17/22] atc: generic ATC that can be used by PCIe devices that support SVM
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (15 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 16/22] intel_iommu: fill the PASID field when creating an instance of IOMMUTLBEntry CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 18/22] atc: add unit tests CLEMENT MATHIEU--DRIF
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

As SVM-capable devices will need to cache translations, we provide
a first implementation.

This cache uses a two-level design based on hash tables.
The first level is indexed by a PASID and the second by a virtual address.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
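For illustration only (not part of this patch): a minimal standalone sketch
of the lookup described above, assuming a flat array in place of the two
GHashTable levels so that it builds without GLib. All `toy_*` names are
hypothetical. It shows how the key is recomputed with a wider mask at each
level, so that an entry for a huge page is found from any address inside it.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 16

/* Stand-in for IOMMUTLBEntry, reduced to the fields the lookup needs. */
struct toy_entry {
    uint32_t pasid;
    uint64_t iova;      /* base address, aligned on addr_mask + 1 */
    uint64_t addr_mask; /* page size - 1 */
    uint64_t translated_addr;
};

/* Flat array standing in for the PASID -> (iova -> entry) hash tables. */
struct toy_atc {
    struct toy_entry entries[TOY_MAX_ENTRIES];
    size_t count;
    uint64_t min_addr_mask;
    uint8_t levels;
    uint8_t level_offset;
};

static int toy_insert(struct toy_atc *atc, struct toy_entry e)
{
    if (atc->count == TOY_MAX_ENTRIES) {
        return -1;
    }
    atc->entries[atc->count++] = e;
    return 0;
}

/* Exact-match search on (pasid, key), like the two chained table lookups. */
static const struct toy_entry *toy_find(const struct toy_atc *atc,
                                        uint32_t pasid, uint64_t key)
{
    for (size_t i = 0; i < atc->count; i++) {
        if (atc->entries[i].pasid == pasid && atc->entries[i].iova == key) {
            return &atc->entries[i];
        }
    }
    return NULL;
}

/* Try each supported page size, from the smallest to the largest. */
static const struct toy_entry *toy_lookup(const struct toy_atc *atc,
                                          uint32_t pasid, uint64_t addr)
{
    uint64_t mask = atc->min_addr_mask;

    for (uint8_t level = 0; level < atc->levels; level++) {
        const struct toy_entry *e = toy_find(atc, pasid, addr & ~mask);
        if (e && e->addr_mask == mask) {
            return e;
        }
        /* widen the mask by one page-table level */
        mask = (mask << atc->level_offset) |
               ((UINT64_C(1) << atc->level_offset) - 1);
    }
    return NULL;
}
```

With min_addr_mask = 0xfff, levels = 4 and level_offset = 9, a 2 MiB entry
at 0x123456600000 misses at the first level (4 KiB key) and hits at the
second, where the mask has grown to 0x1fffff.
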
 util/atc.c       | 211 +++++++++++++++++++++++++++++++++++++++++++++++
 util/atc.h       | 117 ++++++++++++++++++++++++++
 util/meson.build |   1 +
 3 files changed, 329 insertions(+)
 create mode 100644 util/atc.c
 create mode 100644 util/atc.h

diff --git a/util/atc.c b/util/atc.c
new file mode 100644
index 0000000000..584ce045db
--- /dev/null
+++ b/util/atc.c
@@ -0,0 +1,211 @@
+/*
+ * QEMU emulation of an ATC
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "util/atc.h"
+
+
+#define PAGE_TABLE_ENTRY_SIZE 8
+
+/* a pasid is hashed using the identity function */
+static guint atc_pasid_key_hash(gconstpointer v)
+{
+    return (guint)(uintptr_t)v; /* pasid */
+}
+
+/* pasid equality */
+static gboolean atc_pasid_key_equal(gconstpointer v1, gconstpointer v2)
+{
+    return v1 == v2;
+}
+
+/* Hash function for IOTLB entries */
+static guint atc_addr_key_hash(gconstpointer v)
+{
+    hwaddr addr = (hwaddr)v;
+    return (guint)((addr >> 32) ^ (addr & 0xffffffffU));
+}
+
+/* Equality test for IOTLB entries */
+static gboolean atc_addr_key_equal(gconstpointer v1, gconstpointer v2)
+{
+    return (hwaddr)v1 == (hwaddr)v2;
+}
+
+static void atc_address_space_free(void *as)
+{
+    g_hash_table_unref(as);
+}
+
+/* return log2(val), or UINT8_MAX if val is not a power of 2 */
+static uint8_t ilog2(uint64_t val)
+{
+    uint8_t result = 0;
+    while (val != 1) {
+        if (val == 0 || (val & 1)) {
+            return UINT8_MAX;
+        }
+
+        val >>= 1;
+        result += 1;
+    }
+    return result;
+}
+
+ATC *atc_new(uint64_t page_size, uint8_t address_width)
+{
+    ATC *atc;
+    uint8_t log_page_size = ilog2(page_size);
+    /* number of address bits used to store all the intermediate indexes */
+    uint64_t addr_lookup_indexes_size;
+
+    if (log_page_size == UINT8_MAX) {
+        return NULL;
+    }
+    /*
+     * We only support page table entries of 8 (PAGE_TABLE_ENTRY_SIZE) bytes
+     * log2(page_size / 8) = log2(page_size) - 3
+     * is the level offset
+     */
+    if (log_page_size <= 3) {
+        return NULL;
+    }
+
+    atc = g_new0(ATC, 1);
+    atc->level_offset = log_page_size - 3;
+    /* at this point, we know that page_size is a power of 2 */
+    atc->min_addr_mask = page_size - 1;
+    addr_lookup_indexes_size = address_width - log_page_size;
+    if ((addr_lookup_indexes_size % atc->level_offset) != 0) {
+        goto error;
+    }
+    atc->levels = addr_lookup_indexes_size / atc->level_offset;
+    atc->page_size = page_size;
+    atc->address_spaces = g_hash_table_new_full(atc_pasid_key_hash,
+                                                atc_pasid_key_equal,
+                                                NULL, atc_address_space_free);
+    return atc;
+
+error:
+    g_free(atc);
+    return NULL;
+}
+
+static inline GHashTable *atc_get_address_space_cache(ATC *atc, uint32_t pasid)
+{
+    return g_hash_table_lookup(atc->address_spaces,
+                               (gconstpointer)(uintptr_t)pasid);
+}
+
+void atc_create_address_space_cache(ATC *atc, uint32_t pasid)
+{
+    GHashTable *as_cache;
+
+    as_cache = atc_get_address_space_cache(atc, pasid);
+    if (!as_cache) {
+        as_cache = g_hash_table_new_full(atc_addr_key_hash,
+                                         atc_addr_key_equal,
+                                         NULL, g_free);
+        g_hash_table_replace(atc->address_spaces,
+                             (gpointer)(uintptr_t)pasid, as_cache);
+    }
+}
+
+void atc_delete_address_space_cache(ATC *atc, uint32_t pasid)
+{
+    g_hash_table_remove(atc->address_spaces, (gpointer)(uintptr_t)pasid);
+}
+
+int atc_update(ATC *atc, IOMMUTLBEntry *entry)
+{
+    IOMMUTLBEntry *value;
+    GHashTable *as_cache = atc_get_address_space_cache(atc, entry->pasid);
+    if (!as_cache) {
+        return -ENODEV;
+    }
+    value = g_memdup2(entry, sizeof(*value));
+    g_hash_table_replace(as_cache, (gpointer)(entry->iova), value);
+    return 0;
+}
+
+IOMMUTLBEntry *atc_lookup(ATC *atc, uint32_t pasid, hwaddr addr)
+{
+    IOMMUTLBEntry *entry;
+    hwaddr mask = atc->min_addr_mask;
+    hwaddr key = addr & (~mask);
+    GHashTable *as_cache = atc_get_address_space_cache(atc, pasid);
+
+    if (!as_cache) {
+        return NULL;
+    }
+
+    /*
+     * Iterate over the possible page sizes and try to find a hit
+     */
+    for (uint8_t level = 0; level < atc->levels; ++level) {
+        entry = g_hash_table_lookup(as_cache, (gconstpointer)key);
+        if (entry && (mask == entry->addr_mask)) {
+            return entry;
+        }
+        mask = (mask << atc->level_offset) | ((1ULL << atc->level_offset) - 1);
+        key = addr & (~mask);
+    }
+
+    return NULL;
+}
+
+static gboolean atc_invalidate_entry_predicate(gpointer key, gpointer value,
+                                               gpointer user_data)
+{
+    IOMMUTLBEntry *entry = (IOMMUTLBEntry *)value;
+    IOMMUTLBEntry *target = (IOMMUTLBEntry *)user_data;
+    hwaddr target_mask = ~target->addr_mask;
+    hwaddr entry_mask = ~entry->addr_mask;
+    return ((target->iova & target_mask) == (entry->iova & target_mask)) ||
+           ((target->iova & entry_mask) == (entry->iova & entry_mask));
+}
+
+void atc_invalidate(ATC *atc, IOMMUTLBEntry *entry)
+{
+    GHashTable *as_cache = atc_get_address_space_cache(atc, entry->pasid);
+    if (!as_cache) {
+        return;
+    }
+    g_hash_table_foreach_remove(as_cache,
+                                atc_invalidate_entry_predicate,
+                                entry);
+}
+
+void atc_destroy(ATC *atc)
+{
+    g_hash_table_unref(atc->address_spaces);
+}
+
+size_t atc_get_max_number_of_pages(ATC *atc, hwaddr addr, size_t length)
+{
+    hwaddr page_mask = ~(atc->min_addr_mask);
+    size_t result = (length / atc->page_size);
+    if ((((addr & page_mask) + length - 1) & page_mask) !=
+        ((addr + length - 1) & page_mask)) {
+        result += 1;
+    }
+    return result + (length % atc->page_size != 0 ? 1 : 0);
+}
+
+void atc_reset(ATC *atc)
+{
+    g_hash_table_remove_all(atc->address_spaces);
+}
diff --git a/util/atc.h b/util/atc.h
new file mode 100644
index 0000000000..8be95f5cca
--- /dev/null
+++ b/util/atc.h
@@ -0,0 +1,117 @@
+/*
+ * QEMU emulation of an ATC
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef UTIL_ATC_H
+#define UTIL_ATC_H
+
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+
+typedef struct ATC {
+    GHashTable *address_spaces; /* Key : pasid, value : GHashTable */
+    hwaddr min_addr_mask;
+    uint64_t page_size;
+    uint8_t levels;
+    uint8_t level_offset;
+} ATC;
+
+/*
+ * atc_new: Create an ATC.
+ *
+ * Return an ATC or NULL if the creation failed
+ *
+ * @page_size: size of the smallest page, in bytes (must be a power of 2)
+ * @address_width: width of the virtual addresses used by the IOMMU (in bits)
+ */
+ATC *atc_new(uint64_t page_size, uint8_t address_width);
+
+/*
+ * atc_update: Insert or update an entry in the cache
+ *
+ * Return 0 if the operation succeeds, a negative error code otherwise
+ *
+ * The insertion will fail if the address space associated with this pasid
+ * has not been created with atc_create_address_space_cache
+ *
+ * @atc: the ATC to update
+ * @entry: the tlb entry to insert into the cache
+ */
+int atc_update(ATC *atc, IOMMUTLBEntry *entry);
+
+/*
+ * atc_create_address_space_cache: declare a new address space
+ * identified by a PASID
+ *
+ * @atc: the ATC to update
+ * @pasid: the pasid of the address space to be created
+ */
+void atc_create_address_space_cache(ATC *atc, uint32_t pasid);
+
+/*
+ * atc_delete_address_space_cache: delete an address space
+ * identified by a PASID
+ *
+ * @atc: the ATC to update
+ * @pasid: the pasid of the address space to be deleted
+ */
+void atc_delete_address_space_cache(ATC *atc, uint32_t pasid);
+
+/*
+ * atc_lookup: query the cache in a given address space
+ *
+ * @atc: the ATC to query
+ * @pasid: the pasid of the address space to query
+ * @addr: the virtual address to translate
+ */
+IOMMUTLBEntry *atc_lookup(ATC *atc, uint32_t pasid, hwaddr addr);
+
+/*
+ * atc_invalidate: invalidate an entry in the cache
+ *
+ * @atc: the ATC to update
+ * @entry: the entry to invalidate
+ */
+void atc_invalidate(ATC *atc, IOMMUTLBEntry *entry);
+
+/*
+ * atc_destroy: delete an ATC
+ *
+ * @atc: the cache to be deleted
+ */
+void atc_destroy(ATC *atc);
+
+/*
+ * atc_get_max_number_of_pages: get the number of pages a memory operation
+ * will access if all the pages concerned have the minimum size.
+ *
+ * This function can be used to determine the size of the result array to be
+ * allocated when issuing an ATS request.
+ *
+ * @atc: the cache
+ * @addr: start address
+ * @length: number of bytes accessed from addr
+ */
+size_t atc_get_max_number_of_pages(ATC *atc, hwaddr addr, size_t length);
+
+/*
+ * atc_reset: invalidates all the entries stored in the ATC
+ *
+ * @atc: the cache
+ */
+void atc_reset(ATC *atc);
+
+#endif
diff --git a/util/meson.build b/util/meson.build
index 72b505df11..2273f8176a 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -93,6 +93,7 @@ if have_block
   util_ss.add(files('hbitmap.c'))
   util_ss.add(files('hexdump.c'))
   util_ss.add(files('iova-tree.c'))
+  util_ss.add(files('atc.c'))
   util_ss.add(files('iov.c'))
   util_ss.add(files('nvdimm-utils.c'))
   util_ss.add(files('block-helpers.c'))
-- 
2.45.2


* [PATCH ats_vtd v5 19/22] memory: add an API for ATS support
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (17 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 18/22] atc: add unit tests CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-03 12:14   ` Yi Liu
  2024-07-02  5:52 ` [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS CLEMENT MATHIEU--DRIF
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

IOMMUs have to implement iommu_ats_request_translation to support ATS.

Devices can use IOMMU_TLB_ENTRY_TRANSLATION_ERROR to check the tlb
entries returned by a translation request.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
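For illustration only (not part of this patch): a minimal sketch of how a
device might scan the entries returned by a translation request.
`ToyTLBEntry` and `count_usable_entries` are hypothetical stand-ins. The enum
values mirror IOMMUAccessFlags and the macro body is copied from the hunk
below, so that the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IOMMUAccessFlags so the sketch builds outside QEMU. */
typedef enum {
    IOMMU_NONE = 0,
    IOMMU_RO   = 1,
    IOMMU_WO   = 2,
    IOMMU_RW   = 3,
} IOMMUAccessFlags;

/* Stand-in for IOMMUTLBEntry (target_as omitted). */
typedef struct {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
    IOMMUAccessFlags perm;
    uint32_t pasid;
} ToyTLBEntry;

/* Same expression as the macro added to include/exec/memory.h. */
#define IOMMU_TLB_ENTRY_TRANSLATION_ERROR(entry) ((((entry)->perm) & IOMMU_RW) \
                                                    == IOMMU_NONE)

/*
 * Hypothetical helper: return the number of leading entries that carry a
 * usable translation, stopping at the first entry flagged as an error.
 */
static size_t count_usable_entries(const ToyTLBEntry *result, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (IOMMU_TLB_ENTRY_TRANSLATION_ERROR(&result[i])) {
            break;
        }
    }
    return i;
}
```

An entry counts as an error only when it grants neither read nor write
access, so a read-only entry (IOMMU_RO) is still reported as usable.
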
 include/exec/memory.h | 26 ++++++++++++++++++++++++++
 system/memory.c       | 20 ++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 003ee06610..48555c87c6 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -148,6 +148,10 @@ struct IOMMUTLBEntry {
     uint32_t         pasid;
 };
 
+/* Check if an IOMMU TLB entry indicates a translation error */
+#define IOMMU_TLB_ENTRY_TRANSLATION_ERROR(entry) ((((entry)->perm) & IOMMU_RW) \
+                                                    == IOMMU_NONE)
+
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
@@ -571,6 +575,20 @@ struct IOMMUMemoryRegionClass {
      int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
                                   GList *iova_ranges,
                                   Error **errp);
+
+    /**
+     * @iommu_ats_request_translation:
+     * This method must be implemented if the IOMMU has ATS enabled
+     *
+     * @see pci_ats_request_translation_pasid
+     */
+    ssize_t (*iommu_ats_request_translation)(IOMMUMemoryRegion *iommu,
+                                             bool priv_req, bool exec_req,
+                                             hwaddr addr, size_t length,
+                                             bool no_write,
+                                             IOMMUTLBEntry *result,
+                                             size_t result_length,
+                                             uint32_t *err_count);
 };
 
 typedef struct RamDiscardListener RamDiscardListener;
@@ -1926,6 +1944,14 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
                                              IOMMUNotifier *n);
 
+ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
+                                                bool priv_req, bool exec_req,
+                                                hwaddr addr, size_t length,
+                                                bool no_write,
+                                                IOMMUTLBEntry *result,
+                                                size_t result_length,
+                                                uint32_t *err_count);
+
 /**
  * memory_region_iommu_get_attr: return an IOMMU attr if get_attr() is
  * defined on the IOMMU.
diff --git a/system/memory.c b/system/memory.c
index 74cd73ebc7..8268df7bf5 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2005,6 +2005,26 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
     memory_region_update_iommu_notify_flags(iommu_mr, NULL);
 }
 
+ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
+                                                    bool priv_req,
+                                                    bool exec_req,
+                                                    hwaddr addr, size_t length,
+                                                    bool no_write,
+                                                    IOMMUTLBEntry *result,
+                                                    size_t result_length,
+                                                    uint32_t *err_count)
+{
+    IOMMUMemoryRegionClass *imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
+
+    if (!imrc->iommu_ats_request_translation) {
+        return -ENODEV;
+    }
+
+    return imrc->iommu_ats_request_translation(iommu_mr, priv_req, exec_req,
+                                               addr, length, no_write, result,
+                                               result_length, err_count);
+}
+
 void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
                                     IOMMUTLBEvent *event)
 {
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (18 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 19/22] memory: add an API for ATS support CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-09 10:15   ` Minwoo Im
  2024-07-02  5:52 ` [PATCH ats_vtd v5 21/22] intel_iommu: set the address mask even when a translation fails CLEMENT MATHIEU--DRIF
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Devices implementing ATS can send translation requests using
pci_ats_request_translation_pasid.

The invalidation events are sent back to the device through the IOMMU
notifier managed with pci_register_iommu_tlb_event_notifier and
pci_unregister_iommu_tlb_event_notifier.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 7a483dd05d..93b816aff2 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
     }
 }
 
+ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
+                                          bool priv_req, bool exec_req,
+                                          hwaddr addr, size_t length,
+                                          bool no_write, IOMMUTLBEntry *result,
+                                          size_t result_length,
+                                          uint32_t *err_count)
+{
+    assert(result_length);
+    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
+                                                                        pasid);
+    if (!iommu_mr || !pcie_ats_enabled(dev)) {
+        return -EPERM;
+    }
+    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
+                                                       exec_req, addr, length,
+                                                       no_write, result,
+                                                       result_length,
+                                                       err_count);
+}
+
+int pci_register_iommu_tlb_event_notifier(PCIDevice *dev, uint32_t pasid,
+                                          IOMMUNotifier *n)
+{
+    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
+                                                                        pasid);
+    if (!iommu_mr) {
+        return -EPERM;
+    }
+    return memory_region_register_iommu_notifier(MEMORY_REGION(iommu_mr), n,
+                                                 &error_fatal);
+}
+
+int pci_unregister_iommu_tlb_event_notifier(PCIDevice *dev, uint32_t pasid,
+                                             IOMMUNotifier *n)
+{
+    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
+                                                                        pasid);
+    if (!iommu_mr) {
+        return -EPERM;
+    }
+    memory_region_unregister_iommu_notifier(MEMORY_REGION(iommu_mr), n);
+    return 0;
+}
+
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
 {
     /*
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index b2a9ed7782..d656f2656a 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -473,6 +473,58 @@ bool pci_iommu_init_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
                                    IOMMUNotifier *n, IOMMUNotify fn,
                                    void *opaque);
 
+/**
+ * pci_ats_request_translation_pasid: perform an ATS request
+ *
+ * Return the number of translations stored in @result in case of success,
+ * a negative error code otherwise.
+ * -ENOMEM is returned when the result buffer is not large enough to store
+ * all the translations
+ *
+ * @dev: the ATS-capable PCI device
+ * @pasid: the pasid of the address space in which the translation will be made
+ * @priv_req: privileged mode bit (PASID TLP)
+ * @exec_req: execute request bit (PASID TLP)
+ * @addr: start address of the memory range to be translated
+ * @length: length of the memory range in bytes
+ * @no_write: request a read-only access translation (if supported by the IOMMU)
+ * @result: buffer in which the TLB entries will be stored
+ * @result_length: result buffer length
+ * @err_count: number of untranslated subregions
+ */
+ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
+                                          bool priv_req, bool exec_req,
+                                          hwaddr addr, size_t length,
+                                          bool no_write, IOMMUTLBEntry *result,
+                                          size_t result_length,
+                                          uint32_t *err_count);
+
+/**
+ * pci_register_iommu_tlb_event_notifier: register a notifier for changes to
+ * IOMMU translation entries in a specific address space.
+ *
+ * Returns 0 on success, or a negative errno otherwise.
+ *
+ * @dev: the device that wants to get notified
+ * @pasid: the pasid of the address space to track
+ * @n: the notifier to register
+ */
+int pci_register_iommu_tlb_event_notifier(PCIDevice *dev, uint32_t pasid,
+                                          IOMMUNotifier *n);
+
+/**
+ * pci_unregister_iommu_tlb_event_notifier: unregister a notifier that has been
+ * registered with pci_register_iommu_tlb_event_notifier
+ *
+ * Returns 0 on success, or a negative errno otherwise.
+ *
+ * @dev: the device that wants to unsubscribe
+ * @pasid: the pasid of the address space to be untracked
+ * @n: the notifier to unregister
+ */
+int pci_unregister_iommu_tlb_event_notifier(PCIDevice *dev, uint32_t pasid,
+                                            IOMMUNotifier *n);
+
 /**
  * pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
  *
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread
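
Taken together, patches 19 and 20 give a caller three distinct failure paths before any translation happens: -EPERM when the device has no IOMMU region for the PASID or ATS is disabled, -ENODEV when the IOMMU does not implement the callback, and whatever the callback itself returns. The sketch below reproduces that layering with stand-in types so it builds outside QEMU; every Demo*/demo_* name is hypothetical, and the real functions take the full argument list shown in the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

typedef struct DemoIommu DemoIommu;
typedef ssize_t (*DemoAtsFn)(DemoIommu *iommu, size_t result_length);

/* Stand-in for the IOMMU class: the ATS callback is optional. */
struct DemoIommu {
    DemoAtsFn ats_request_translation; /* NULL when ATS is unsupported */
};

/* Stand-in for a PCI device as seen by the PCI-level helper. */
typedef struct DemoDevice {
    DemoIommu *iommu;   /* NULL when no IOMMU region exists for the PASID */
    bool ats_enabled;   /* mirrors what pcie_ats_enabled() would report */
} DemoDevice;

/* Memory-level wrapper: refuse with -ENODEV if the callback is missing. */
static ssize_t demo_iommu_request(DemoIommu *iommu, size_t result_length)
{
    if (!iommu->ats_request_translation) {
        return -ENODEV;
    }
    return iommu->ats_request_translation(iommu, result_length);
}

/* PCI-level wrapper: refuse with -EPERM before touching the IOMMU. */
static ssize_t demo_pci_request(DemoDevice *dev, size_t result_length)
{
    assert(result_length);
    if (!dev->iommu || !dev->ats_enabled) {
        return -EPERM;
    }
    return demo_iommu_request(dev->iommu, result_length);
}

/* Hypothetical callback that always reports one translated entry. */
static ssize_t demo_one_entry(DemoIommu *iommu, size_t result_length)
{
    (void)iommu;
    (void)result_length;
    return 1;
}
```

Keeping the two checks in separate layers lets a device distinguish "ATS is off for me" from "this IOMMU model cannot do ATS at all".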

* [PATCH ats_vtd v5 21/22] intel_iommu: set the address mask even when a translation fails
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (19 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02  5:52 ` [PATCH ats_vtd v5 22/22] intel_iommu: add support for ATS CLEMENT MATHIEU--DRIF
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Implements the behavior defined in section 10.2.3.5 of PCIe spec rev 5.
This is needed by devices that support ATS.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/i386/intel_iommu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f77972130f..9a1bce9ae2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2192,7 +2192,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     uint8_t bus_num = pci_bus_num(bus);
     VTDContextCacheEntry *cc_entry;
     uint64_t pte, page_mask;
-    uint32_t level, pasid = vtd_as->pasid;
+    uint32_t level = UINT32_MAX;
+    uint32_t pasid = vtd_as->pasid;
     uint16_t source_id = PCI_BUILD_BDF(bus_num, devfn);
     int ret_fr;
     bool is_fpd_set = false;
@@ -2338,7 +2339,12 @@ error:
     vtd_iommu_unlock(s);
     entry->iova = 0;
     entry->translated_addr = 0;
-    entry->addr_mask = 0;
+    /*
+     * Set the mask for ATS (the range must be present even when the
+     * translation fails : PCIe rev 5 10.2.3.5)
+     */
+    entry->addr_mask = (level != UINT32_MAX) ?
+                       (~vtd_pt_level_page_mask(level)) : (~VTD_PAGE_MASK_4K);
     entry->perm = IOMMU_NONE;
     entry->pasid = PCI_NO_PASID;
     return false;
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread
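
The address mask chosen on the error path in the patch above depends on the page-table level reached by the walk, with a 4 KiB fallback when no level is known. A standalone sketch of that arithmetic follows. It assumes the usual VT-d layout of a 12-bit page offset and 9 address bits per level, which is how vtd_pt_level_page_mask is understood to behave here; the DEMO_* constants and demo_* functions are hypothetical stand-ins, not QEMU symbols.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT_4K  12          /* assumed 4 KiB base page */
#define DEMO_LEVEL_BITS     9           /* assumed index bits per level */
#define DEMO_LEVEL_UNKNOWN  UINT32_MAX  /* same sentinel as in the patch */

/* Mask that keeps the page-frame bits for a mapping found at `level`. */
static uint64_t demo_level_page_mask(uint32_t level)
{
    assert(level >= 1 && level <= 4);
    uint32_t shift = DEMO_PAGE_SHIFT_4K + (level - 1) * DEMO_LEVEL_BITS;
    return ~((1ULL << shift) - 1);
}

/*
 * addr_mask reported for a failed translation: the size of the region
 * that could not be translated, or 4 KiB if the walk never got a level.
 */
static uint64_t demo_error_addr_mask(uint32_t level)
{
    if (level == DEMO_LEVEL_UNKNOWN) {
        return ~demo_level_page_mask(1);
    }
    return ~demo_level_page_mask(level);
}
```

Under these assumptions a failure at level 2 reports a 2 MiB range and a failure at level 3 a 1 GiB range, which lets an ATS requester skip the whole untranslated region in one step.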

* [PATCH ats_vtd v5 22/22] intel_iommu: add support for ATS
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (20 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 21/22] intel_iommu: set the address mask even when a translation fails CLEMENT MATHIEU--DRIF
@ 2024-07-02  5:52 ` CLEMENT MATHIEU--DRIF
  2024-07-02 12:16 ` [PATCH ats_vtd v5 00/22] ATS support for VT-d Michael S. Tsirkin
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:52 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, yi.l.liu@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com, mst@redhat.com,
	CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
 hw/i386/intel_iommu.c          | 75 ++++++++++++++++++++++++++++++++--
 hw/i386/intel_iommu_internal.h |  1 +
 2 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9a1bce9ae2..191d7cf0a9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -5405,12 +5405,10 @@ static void vtd_report_ir_illegal_access(VTDAddressSpace *vtd_as,
     bool is_fpd_set = false;
     VTDContextEntry ce;
 
-    assert(vtd_as->pasid != PCI_NO_PASID);
-
     /* Try out best to fetch FPD, we can't do anything more */
     if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
         is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
-        if (!is_fpd_set && s->root_scalable) {
+        if (!is_fpd_set && s->root_scalable && vtd_as->pasid != PCI_NO_PASID) {
             vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set, vtd_as->pasid);
         }
     }
@@ -6041,6 +6039,75 @@ static IOMMUMemoryRegion *vtd_get_memory_region_pasid(PCIBus *bus,
     return &vtd_as->iommu;
 }
 
+static IOMMUTLBEntry vtd_iommu_ats_do_translate(IOMMUMemoryRegion *iommu,
+                                                hwaddr addr,
+                                                IOMMUAccessFlags flags,
+                                                int iommu_idx)
+{
+    IOMMUTLBEntry entry;
+    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+
+    if (vtd_is_interrupt_addr(addr)) {
+        vtd_report_ir_illegal_access(vtd_as, addr, flags & IOMMU_WO);
+        entry.iova = 0;
+        entry.translated_addr = 0;
+        entry.addr_mask = ~VTD_PAGE_MASK_4K;
+        entry.perm = IOMMU_NONE;
+        entry.pasid = PCI_NO_PASID;
+    } else {
+        entry = vtd_iommu_translate(iommu, addr, flags, iommu_idx);
+    }
+    return entry;
+}
+
+static ssize_t vtd_iommu_ats_request_translation(IOMMUMemoryRegion *iommu,
+                                                 bool priv_req, bool exec_req,
+                                                 hwaddr addr, size_t length,
+                                                 bool no_write,
+                                                 IOMMUTLBEntry *result,
+                                                 size_t result_length,
+                                                 uint32_t *err_count)
+{
+    IOMMUAccessFlags flags = IOMMU_ACCESS_FLAG_FULL(true, !no_write, exec_req,
+                                                    priv_req, false, false);
+    ssize_t res_index = 0;
+    hwaddr target_address = addr + length;
+    IOMMUTLBEntry entry;
+
+    *err_count = 0;
+
+    while ((addr < target_address) && (res_index < result_length)) {
+        entry = vtd_iommu_ats_do_translate(iommu, addr, flags, 0);
+        if (!IOMMU_TLB_ENTRY_TRANSLATION_ERROR(&entry)) { /* Translation done */
+            if (no_write) {
+                /* The device should not use this entry for a write access */
+                entry.perm &= ~IOMMU_WO;
+            }
+            /*
+             * 4.1.2 : Global Mapping (G) : Remapping hardware provides a value
+             * of 0 in this field
+             */
+            entry.perm &= ~IOMMU_GLOBAL;
+        } else {
+            *err_count += 1;
+        }
+        result[res_index] = entry;
+        res_index += 1;
+        addr = (addr & (~entry.addr_mask)) + (entry.addr_mask + 1);
+    }
+
+    /* Buffer too small */
+    if (addr < target_address) {
+        return -ENOMEM;
+    }
+    return res_index;
+}
+
+static uint64_t vtd_get_min_page_size(IOMMUMemoryRegion *iommu)
+{
+    return VTD_PAGE_SIZE;
+}
+
 static PCIIOMMUOps vtd_iommu_ops = {
     .get_address_space = vtd_host_dma_iommu,
     .get_address_space_pasid = vtd_host_dma_iommu_pasid,
@@ -6246,6 +6313,8 @@ static void vtd_iommu_memory_region_class_init(ObjectClass *klass,
     imrc->translate = vtd_iommu_translate;
     imrc->notify_flag_changed = vtd_iommu_notify_flag_changed;
     imrc->replay = vtd_iommu_replay;
+    imrc->iommu_ats_request_translation = vtd_iommu_ats_request_translation;
+    imrc->get_min_page_size = vtd_get_min_page_size;
 }
 
 static const TypeInfo vtd_iommu_memory_region_info = {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 117dc96d22..d4831522ed 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -194,6 +194,7 @@
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_NEST               (1ULL << 26)
 #define VTD_ECAP_SRS                (1ULL << 31)
+#define VTD_ECAP_NWFS               (1ULL << 33)
 #define VTD_ECAP_PSS                (19ULL << 35)
 #define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 61+ messages in thread
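
The loop in vtd_iommu_ats_request_translation advances by one translated region per iteration, counts failed regions in err_count, and returns -ENOMEM if the result buffer fills up before the requested range is covered. The sketch below isolates that control flow with a fake translator so it can be run outside QEMU. DemoEntry, demo_translate, DEMO_UNMAPPED_START and demo_ats_request are hypothetical; the fake translator maps everything below DEMO_UNMAPPED_START as 4 KiB pages and fails above it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define DEMO_UNMAPPED_START 0x10000000ULL /* hypothetical unmapped boundary */

typedef struct DemoEntry {
    uint64_t iova;
    uint64_t addr_mask;
    int perm;            /* 0 means the translation failed */
} DemoEntry;

/* Fake translator: 4 KiB pages, read/write below the boundary only. */
static DemoEntry demo_translate(uint64_t addr)
{
    DemoEntry e = { .iova = addr & ~0xfffULL, .addr_mask = 0xfff, .perm = 3 };
    if (addr >= DEMO_UNMAPPED_START) {
        e.perm = 0;      /* the mask is still set, as patch 21 requires */
    }
    return e;
}

/* Same loop shape as the patch: one result entry per region. */
static ssize_t demo_ats_request(uint64_t addr, size_t length,
                                DemoEntry *result, size_t result_length,
                                uint32_t *err_count)
{
    size_t n = 0;
    uint64_t end = addr + length;

    assert(result_length);
    *err_count = 0;
    while (addr < end && n < result_length) {
        DemoEntry e = demo_translate(addr);
        if (e.perm == 0) {
            *err_count += 1;
        }
        result[n++] = e;
        /* Jump to the first byte after the region described by e. */
        addr = (addr & ~e.addr_mask) + (e.addr_mask + 1);
    }
    /* Buffer exhausted before the whole range was covered. */
    if (addr < end) {
        return -ENOMEM;
    }
    return (ssize_t)n;
}
```

Because the step size comes from addr_mask, the loop only terminates correctly if failed entries also carry a valid mask, which is what the previous patch in the series provides.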

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-01 20:02 ` Michael S. Tsirkin
@ 2024-07-02  5:57   ` CLEMENT MATHIEU--DRIF
  2024-07-02 12:15     ` Michael S. Tsirkin
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02  5:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com

[-- Attachment #1: Type: text/plain, Size: 6660 bytes --]



________________________________
From: Michael S. Tsirkin <mst@redhat.com>
Sent: 01 July 2024 22:02
To: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
Cc: qemu-devel@nongnu.org <qemu-devel@nongnu.org>; jasowang@redhat.com <jasowang@redhat.com>; zhenzhong.duan@intel.com <zhenzhong.duan@intel.com>; kevin.tian@intel.com <kevin.tian@intel.com>; yi.l.liu@intel.com <yi.l.liu@intel.com>; joao.m.martins@oracle.com <joao.m.martins@oracle.com>; peterx@redhat.com <peterx@redhat.com>
Subject: Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d


On Mon, Jun 03, 2024 at 05:59:38AM +0000, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>
> This series belongs to a list of series that add SVM support for VT-d.
>
> As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
>
> Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
> API for ATS to be used by virtual devices.
>
> This work is based on the VT-d specification version 4.1 (March 2023).
> Here is a link to a GitHub repository where you can find the following elements :
>     - Qemu with all the patches for SVM
>         - ATS
>         - PRI
>         - Device IOTLB invalidations
>         - Requests with already translated addresses
>     - A demo device
>     - A simple driver for the demo device
>     - A userspace program (for testing and demonstration purposes)
>
> https://github.com/BullSequana/Qemu-in-guest-SVM-demo

I will merge, but could you please resend this using git format-patch
for formatting?  The patches have trailing CRs and don't show which sha1
they are for, which makes re-applying them after each change painful.



Hi Michael,
I sent the series again without the trailing new line.
Tell me if it's better.

Is Zhenzhong's FLTS series merged? If not, it might be the cause of the sha1 problem you are facing.

Thanks
>cmd


> v2
>     - handle huge pages better by detecting the page table level at which the translation errors occur
>     - Changes after review by ZhenZhong Duan :
>       - Set the access bit after checking permissions
>       - helper for PASID and ATS : make the commit message more accurate ('present' replaced with 'enabled')
>       - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the capability register
>       - pci: do not check pci_bus_bypass_iommu after calling pci_device_get_iommu_bus_devfn
>       - do not alter formatting of IOMMUTLBEntry declaration
>       - vtd_iova_fl_check_canonical : directly use s->aw_bits instead of aw for the sake of clarity
>
> v3
>     - rebase on new version of Zhenzhong's flts implementation
>     - fix the atc lookup operation (check the mask before returning an entry)
>     - add a unit test for the ATC
>     - store a user pointer in the iommu notifiers to simplify the implementation of svm devices
>     Changes after review by Zhenzhong :
>       - store the input pasid instead of rid2pasid when returning an entry after a translation
>       - split the ATC implementation and its unit tests
>
> v4
>     Changes after internal review
>       - Fix the nowrite optimization, an ATS translation without the nowrite flag should not fail when the write permission is not set
>
> v5
>     Changes after review by Philippe :
>       - change the type of 'level' to unsigned in vtd_lookup_iotlb
>
>
>
> Clément Mathieu--Drif (22):
>   intel_iommu: fix FRCD construction macro.
>   intel_iommu: make types match
>   intel_iommu: return page walk level even when the translation fails
>   intel_iommu: do not consider wait_desc as an invalid descriptor
>   memory: add permissions in IOMMUAccessFlags
>   pcie: add helper to declare PASID capability for a pcie device
>   pcie: helper functions to check if PASID and ATS are enabled
>   intel_iommu: declare supported PASID size
>   pci: cache the bus mastering status in the device
>   pci: add IOMMU operations to get address spaces and memory regions
>     with PASID
>   memory: store user data pointer in the IOMMU notifiers
>   pci: add a pci-level initialization function for iommu notifiers
>   intel_iommu: implement the get_address_space_pasid iommu operation
>   intel_iommu: implement the get_memory_region_pasid iommu operation
>   memory: Allow to store the PASID in IOMMUTLBEntry
>   intel_iommu: fill the PASID field when creating an instance of
>     IOMMUTLBEntry
>   atc: generic ATC that can be used by PCIe devices that support SVM
>   atc: add unit tests
>   memory: add an API for ATS support
>   pci: add a pci-level API for ATS
>   intel_iommu: set the address mask even when a translation fails
>   intel_iommu: add support for ATS
>
>  hw/i386/intel_iommu.c                     | 142 +++++-
>  hw/i386/intel_iommu_internal.h            |   6 +-
>  hw/pci/pci.c                              | 127 +++++-
>  hw/pci/pcie.c                             |  42 ++
>  include/exec/memory.h                     |  51 ++-
>  include/hw/i386/intel_iommu.h             |   2 +-
>  include/hw/pci/pci.h                      | 101 +++++
>  include/hw/pci/pci_device.h               |   1 +
>  include/hw/pci/pcie.h                     |   9 +-
>  include/hw/pci/pcie_regs.h                |   3 +
>  include/standard-headers/linux/pci_regs.h |   1 +
>  system/memory.c                           |  20 +
>  tests/unit/meson.build                    |   1 +
>  tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>  util/atc.c                                | 211 +++++++++
>  util/atc.h                                | 117 +++++
>  util/meson.build                          |   1 +
>  17 files changed, 1330 insertions(+), 32 deletions(-)
>  create mode 100644 tests/unit/test-atc.c
>  create mode 100644 util/atc.c
>  create mode 100644 util/atc.h
>
> --
> 2.45.1


[-- Attachment #2: Type: text/html, Size: 13061 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02  5:57   ` CLEMENT MATHIEU--DRIF
@ 2024-07-02 12:15     ` Michael S. Tsirkin
  2024-07-02 13:42       ` Yi Liu
  0 siblings, 1 reply; 61+ messages in thread
From: Michael S. Tsirkin @ 2024-07-02 12:15 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com

On Tue, Jul 02, 2024 at 05:57:57AM +0000, CLEMENT MATHIEU--DRIF wrote:
> 
> 
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 01 July 2024 22:02
> To: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
> Cc: qemu-devel@nongnu.org <qemu-devel@nongnu.org>; jasowang@redhat.com
> <jasowang@redhat.com>; zhenzhong.duan@intel.com <zhenzhong.duan@intel.com>;
> kevin.tian@intel.com <kevin.tian@intel.com>; yi.l.liu@intel.com
> <yi.l.liu@intel.com>; joao.m.martins@oracle.com <joao.m.martins@oracle.com>;
> peterx@redhat.com <peterx@redhat.com>
> Subject: Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
>  
> 
> 
> On Mon, Jun 03, 2024 at 05:59:38AM +0000, CLEMENT MATHIEU--DRIF wrote:
> > From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >
> > This series belongs to a list of series that add SVM support for VT-d.
> >
> > As a starting point, we use the series called 'intel_iommu: Enable stage-1
> translation' (rfc2) by Zhenzhong Duan and Yi Liu.
> >
> > Here we focus on the implementation of ATS support in the IOMMU and on a
> PCI-level
> > API for ATS to be used by virtual devices.
> >
> > This work is based on the VT-d specification version 4.1 (March 2023).
> > Here is a link to a GitHub repository where you can find the following
> elements :
> >     - Qemu with all the patches for SVM
> >         - ATS
> >         - PRI
> >         - Device IOTLB invalidations
> >         - Requests with already translated addresses
> >     - A demo device
> >     - A simple driver for the demo device
> >     - A userspace program (for testing and demonstration purposes)
> >
> > https://github.com/BullSequana/Qemu-in-guest-SVM-demo
> 
> I will merge, but could you please resend this using git format-patch
> for formatting?  The patches have trailing CRs and don't show which sha1
> they are for, which makes re-applying them after each change painful.
> 
> 
> 
> Hi Michael,
> I sent the series again without the trailing new line.
> Tell me if it's better.
> 
> Is Zhenzhong's FLTS series merged? If not, it might be the cause of the sha1
> problem you are facing.

I don't think I have FLTS in any queue.

If your series has a dependency please specify this in
the cover letter.

Alternatively just include the dependency in the posting.





> Thanks
> >cmd
> 
> 
> > v2
> >     - handle huge pages better by detecting the page table level at which the
> translation errors occur
> >     - Changes after review by ZhenZhong Duan :
> >       - Set the access bit after checking permissions
> >       - helper for PASID and ATS : make the commit message more accurate
> ('present' replaced with 'enabled')
> >       - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of
> PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the
> capability register
> >       - pci: do not check pci_bus_bypass_iommu after calling
> pci_device_get_iommu_bus_devfn
> >       - do not alter formatting of IOMMUTLBEntry declaration
> >       - vtd_iova_fl_check_canonical : directly use s->aw_bits instead of aw
> for the sake of clarity
> >
> > v3
> >     - rebase on new version of Zhenzhong's flts implementation
> >     - fix the atc lookup operation (check the mask before returning an entry)
> >     - add a unit test for the ATC
> >     - store a user pointer in the iommu notifiers to simplify the
> implementation of svm devices
> >     Changes after review by Zhenzhong :
> >       - store the input pasid instead of rid2pasid when returning an entry
> after a translation
> >       - split the ATC implementation and its unit tests
> >
> > v4
> >     Changes after internal review
> >       - Fix the nowrite optimization, an ATS translation without the nowrite
> flag should not fail when the write permission is not set
> >
> > v5
> >     Changes after review by Philippe :
> >       - change the type of 'level' to unsigned in vtd_lookup_iotlb
> >
> >
> >
> > Clément Mathieu--Drif (22):
> >   intel_iommu: fix FRCD construction macro.
> >   intel_iommu: make types match
> >   intel_iommu: return page walk level even when the translation fails
> >   intel_iommu: do not consider wait_desc as an invalid descriptor
> >   memory: add permissions in IOMMUAccessFlags
> >   pcie: add helper to declare PASID capability for a pcie device
> >   pcie: helper functions to check if PASID and ATS are enabled
> >   intel_iommu: declare supported PASID size
> >   pci: cache the bus mastering status in the device
> >   pci: add IOMMU operations to get address spaces and memory regions
> >     with PASID
> >   memory: store user data pointer in the IOMMU notifiers
> >   pci: add a pci-level initialization function for iommu notifiers
> >   intel_iommu: implement the get_address_space_pasid iommu operation
> >   intel_iommu: implement the get_memory_region_pasid iommu operation
> >   memory: Allow to store the PASID in IOMMUTLBEntry
> >   intel_iommu: fill the PASID field when creating an instance of
> >     IOMMUTLBEntry
> >   atc: generic ATC that can be used by PCIe devices that support SVM
> >   atc: add unit tests
> >   memory: add an API for ATS support
> >   pci: add a pci-level API for ATS
> >   intel_iommu: set the address mask even when a translation fails
> >   intel_iommu: add support for ATS
> >
> >  hw/i386/intel_iommu.c                     | 142 +++++-
> >  hw/i386/intel_iommu_internal.h            |   6 +-
> >  hw/pci/pci.c                              | 127 +++++-
> >  hw/pci/pcie.c                             |  42 ++
> >  include/exec/memory.h                     |  51 ++-
> >  include/hw/i386/intel_iommu.h             |   2 +-
> >  include/hw/pci/pci.h                      | 101 +++++
> >  include/hw/pci/pci_device.h               |   1 +
> >  include/hw/pci/pcie.h                     |   9 +-
> >  include/hw/pci/pcie_regs.h                |   3 +
> >  include/standard-headers/linux/pci_regs.h |   1 +
> >  system/memory.c                           |  20 +
> >  tests/unit/meson.build                    |   1 +
> >  tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
> >  util/atc.c                                | 211 +++++++++
> >  util/atc.h                                | 117 +++++
> >  util/meson.build                          |   1 +
> >  17 files changed, 1330 insertions(+), 32 deletions(-)
> >  create mode 100644 tests/unit/test-atc.c
> >  create mode 100644 util/atc.c
> >  create mode 100644 util/atc.h
> >
> > --
> > 2.45.1
> 



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (21 preceding siblings ...)
  2024-07-02  5:52 ` [PATCH ats_vtd v5 22/22] intel_iommu: add support for ATS CLEMENT MATHIEU--DRIF
@ 2024-07-02 12:16 ` Michael S. Tsirkin
  2024-07-02 15:09   ` CLEMENT MATHIEU--DRIF
  2024-07-02 13:44 ` Yi Liu
  2024-07-03 12:32 ` Yi Liu
  24 siblings, 1 reply; 61+ messages in thread
From: Michael S. Tsirkin @ 2024-07-02 12:16 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	Clement Mathieu--Drif

On Tue, Jul 02, 2024 at 05:52:29AM +0000, CLEMENT MATHIEU--DRIF wrote:
> From: Clement Mathieu--Drif <cmdetu@gmail.com>
> 
> This series belongs to a list of series that add SVM support for VT-d.

I don't think you need ats_vtd as a tag, but if it's helpful
for someone, I don't mind. What you do need is "repost", so
people know how this relates to your previous v5 of the
same patchset.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro.
  2024-07-02  5:52 ` [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro CLEMENT MATHIEU--DRIF
@ 2024-07-02 13:01   ` Yi Liu
  2024-07-02 15:10     ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-02 13:01 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> The constant must be unsigned, otherwise the two's complement
> overrides the other fields when a PASID is present

Does it need a Fixes tag, since it overrides the other fields?

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> ---
>   hw/i386/intel_iommu_internal.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index e8396575eb..b19f14ef63 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -272,7 +272,7 @@
>   /* For the low 64-bit of 128-bit */
>   #define VTD_FRCD_FI(val)        ((val) & ~0xfffULL)
>   #define VTD_FRCD_PV(val)        (((val) & 0xffffULL) << 40)
> -#define VTD_FRCD_PP(val)        (((val) & 0x1) << 31)
> +#define VTD_FRCD_PP(val)        (((val) & 0x1ULL) << 31)
>   #define VTD_FRCD_IR_IDX(val)    (((val) & 0xffffULL) << 48)
>   
>   /* DMA Remapping Fault Conditions */
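
For readers following along, the effect described in the commit message can be
reproduced outside QEMU. The sketch below is not QEMU code. FRCD_PV and
FRCD_PP_FIXED mirror the macros shown in the hunk above, while
frcd_pp_old_model() is a hypothetical helper that models what the unpatched
macro typically produces when its argument has type int: shifting 1 into bit 31
of a 32-bit int usually yields INT32_MIN on two's complement targets, and
converting that negative value to 64 bits sign-extends it, so bits 63:32 of the
fault record are overwritten.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the macros in the hunk above. */
#define FRCD_PV(val)       (((val) & 0xffffULL) << 40)
#define FRCD_PP_FIXED(val) (((val) & 0x1ULL) << 31)

/*
 * Hypothetical model of the unpatched macro for an int argument.
 * The result of (1 << 31) does not fit in a 32-bit int; the usual
 * outcome is INT32_MIN. Converting INT32_MIN to uint64_t is well
 * defined and sets every bit from 31 to 63.
 */
static uint64_t frcd_pp_old_model(int val)
{
    int32_t bit = (val & 0x1) ? INT32_MIN : 0;
    return (uint64_t)bit;
}

/* Self-check: call from main() or from a test harness. */
static void frcd_pp_selfcheck(void)
{
    uint64_t pv = FRCD_PV(0x1234);

    /* Patched macro: only bit 31 is added, the PV field survives. */
    assert((pv | FRCD_PP_FIXED(1)) == (pv | 0x80000000ULL));
    /* Modelled old behaviour: the PV field is drowned by sign extension. */
    assert((pv | frcd_pp_old_model(1)) == 0xffffffff80000000ULL);
}
```

With the unsigned 64-bit constant, the whole expression is evaluated in 64 bits
and no sign extension can occur, whatever the type of the argument.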

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 02/22] intel_iommu: make types match
  2024-07-02  5:52 ` [PATCH ats_vtd v5 02/22] intel_iommu: make types match CLEMENT MATHIEU--DRIF
@ 2024-07-02 13:20   ` Yi Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Yi Liu @ 2024-07-02 13:20 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> The 'level' field in vtd_iotlb_key is an unsigned integer.
> We don't need to store level as an int in vtd_lookup_iotlb.
> 
> VTDIOTLBPageInvInfo.mask is used in binary operations with addresses.

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> ---
>   hw/i386/intel_iommu.c          | 2 +-
>   hw/i386/intel_iommu_internal.h | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index c3c0ecca71..c6474ae735 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -417,7 +417,7 @@ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
>   {
>       struct vtd_iotlb_key key;
>       VTDIOTLBEntry *entry;
> -    int level;
> +    unsigned level;
>   
>       for (level = VTD_PT_LEVEL; level < VTD_PML4_LEVEL; level++) {
>           key.gfn = vtd_get_iotlb_gfn(addr, level);
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index b19f14ef63..bd20746318 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -506,7 +506,7 @@ struct VTDIOTLBPageInvInfo {
>       uint16_t domain_id;
>       uint32_t pasid;
>       uint64_t addr;
> -    uint8_t mask;
> +    uint64_t mask;
>   };
>   typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
>   
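
As a side note for readers, here is a minimal sketch of why an 8-bit mask is
fragile when combined with 64-bit addresses or frame numbers. It is an
illustration only, not QEMU code: the helper names and the mask form
~((1 << am) - 1) are assumptions, since the patch itself only states that the
field is used in binary operations with addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the mask is stored in 8 bits, as before the patch. */
static uint64_t mask_gfn_8(uint64_t gfn, unsigned am)
{
    uint8_t mask = (uint8_t)~((1ULL << am) - 1);  /* high bits are lost */
    return gfn & mask;
}

/* Hypothetical helper: the mask keeps its full 64-bit width. */
static uint64_t mask_gfn_64(uint64_t gfn, unsigned am)
{
    uint64_t mask = ~((1ULL << am) - 1);
    return gfn & mask;
}

/* Self-check: call from main() or from a test harness. */
static void mask_width_selfcheck(void)
{
    /* Two unrelated frame numbers that share their low 8 bits. */
    uint64_t a = 0x12345, b = 0x99945;

    /* With the narrow mask they become indistinguishable. */
    assert(mask_gfn_8(a, 4) == mask_gfn_8(b, 4));
    /* With the wide mask they stay distinct. */
    assert(mask_gfn_64(a, 4) != mask_gfn_64(b, 4));
}
```

Under these assumptions, the narrow mask keeps only bits 7:4 of the frame
number, so comparisons based on it can match entries that are unrelated.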

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor
  2024-07-02  5:52 ` [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor CLEMENT MATHIEU--DRIF
@ 2024-07-02 13:33   ` Yi Liu
  2024-07-02 15:29     ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-02 13:33 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/i386/intel_iommu.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 98996ededc..71cebe2fd3 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3500,6 +3500,11 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>       } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
>           /* Interrupt flag */
>           vtd_generate_completion_event(s);
> +    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
> +        /*
> +         * SW = 0, IF = 0, FN = 1
> +         * Nothing to do as we process the events sequentially
> +         */

This code looks a bit odd. The SW field does not co-exist with IF, but either
SW or IF can co-exist with the FN flag, right? Have you actually seen a wait
descriptor that has only the FN flag set, with neither SW nor IF?

>       } else {
>           error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
>                             " (unknown type)", __func__, inv_desc->hi,
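
To make the flag combinations under discussion concrete, here is a small
stand-alone sketch of the decision being debated. It is not the QEMU
implementation: the WAIT_* names, the enum, and the bit positions (IF = bit 4,
SW = bit 5, FN = bit 6) are assumptions for illustration and should be checked
against the VT-d specification. Rejecting SW together with IF follows the
remark above that the two do not co-exist.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define WAIT_IF (1ULL << 4)   /* interrupt flag */
#define WAIT_SW (1ULL << 5)   /* status write */
#define WAIT_FN (1ULL << 6)   /* fence */

enum wait_action {
    WAIT_DO_STATUS_WRITE,
    WAIT_DO_INTERRUPT,
    WAIT_DO_NOTHING,    /* fence only: nothing to do if processing is sequential */
    WAIT_REJECT,
};

/* Hypothetical classifier mirroring the if/else chain in the hunk above. */
static enum wait_action classify_wait_desc(uint64_t lo)
{
    if ((lo & WAIT_SW) && (lo & WAIT_IF)) {
        return WAIT_REJECT;             /* SW and IF do not co-exist */
    }
    if (lo & WAIT_SW) {
        return WAIT_DO_STATUS_WRITE;    /* FN may or may not be set */
    }
    if (lo & WAIT_IF) {
        return WAIT_DO_INTERRUPT;       /* FN may or may not be set */
    }
    if (lo & WAIT_FN) {
        return WAIT_DO_NOTHING;         /* SW = 0, IF = 0, FN = 1 */
    }
    return WAIT_REJECT;                 /* no flag set at all */
}

/* Self-check: call from main() or from a test harness. */
static void wait_desc_selfcheck(void)
{
    assert(classify_wait_desc(WAIT_FN) == WAIT_DO_NOTHING);
    assert(classify_wait_desc(WAIT_SW | WAIT_FN) == WAIT_DO_STATUS_WRITE);
    assert(classify_wait_desc(0) == WAIT_REJECT);
}
```

In this model, the FN-only case is the only one the new branch changes:
without it, such a descriptor falls through to the error path.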

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02 12:15     ` Michael S. Tsirkin
@ 2024-07-02 13:42       ` Yi Liu
  2024-07-02 15:27         ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-02 13:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com

On 2024/7/2 20:15, Michael S. Tsirkin wrote:
> On Tue, Jul 02, 2024 at 05:57:57AM +0000, CLEMENT MATHIEU--DRIF wrote:
>>
>>
>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: 01 July 2024 22:02
>> To: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
>> Cc: qemu-devel@nongnu.org <qemu-devel@nongnu.org>; jasowang@redhat.com
>> <jasowang@redhat.com>; zhenzhong.duan@intel.com <zhenzhong.duan@intel.com>;
>> kevin.tian@intel.com <kevin.tian@intel.com>; yi.l.liu@intel.com
>> <yi.l.liu@intel.com>; joao.m.martins@oracle.com <joao.m.martins@oracle.com>;
>> peterx@redhat.com <peterx@redhat.com>
>> Subject: Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
>>   
>>
>> On Mon, Jun 03, 2024 at 05:59:38AM +0000, CLEMENT MATHIEU--DRIF wrote:
>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>
>>> This series belongs to a list of series that add SVM support for VT-d.
>>>
>>> As a starting point, we use the series called 'intel_iommu: Enable stage-1
>> translation' (rfc2) by Zhenzhong Duan and Yi Liu.
>>>
>>> Here we focus on the implementation of ATS support in the IOMMU and on a
>> PCI-level
>>> API for ATS to be used by virtual devices.
>>>
>>> This work is based on the VT-d specification version 4.1 (March 2023).
>>> Here is a link to a GitHub repository where you can find the following
>> elements :
>>>      - Qemu with all the patches for SVM
>>>          - ATS
>>>          - PRI
>>>          - Device IOTLB invalidations
>>>          - Requests with already translated addresses
>>>      - A demo device
>>>      - A simple driver for the demo device
>>>      - A userspace program (for testing and demonstration purposes)
>>>
>>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>>
>> I will merge, but could you please resend this using git format-patch
>> for formatting?  The patches have trailing CRs and don't show which sha1
>> they are for, which makes re-applying them after each change painful.
>>
>>
>>
>> Hi Michael,
>> I sent the series again without the trailing new line.
>> Tell me if it's better.
>>
>> Is Zhenzhong's FLTS series merged? If not, it might be the cause of the sha1
>> problem you are facing.
> 
> I don't think I have FLTS in any queue.
> 
> If your series has a dependency please specify this in
> the cover letter.
> 
> Alternatively just include the dependency in the posting.

It seems this is the dependency:

https://lore.kernel.org/qemu-devel/20240522062313.453317-1-zhenzhong.duan@intel.com/#t

> 
> 
> 
> 
>> Thanks
>>> cmd
>>
>>
>>> v2
>>>      - handle huge pages better by detecting the page table level at which the
>> translation errors occur
>>>      - Changes after review by ZhenZhong Duan :
>>>        - Set the access bit after checking permissions
>>>        - helper for PASID and ATS : make the commit message more accurate
>> ('present' replaced with 'enabled')
>>>        - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of
>> PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the
>> capability register
>>>        - pci: do not check pci_bus_bypass_iommu after calling
>> pci_device_get_iommu_bus_devfn
>>>        - do not alter formatting of IOMMUTLBEntry declaration
>>>        - vtd_iova_fl_check_canonical : directly use s->aw_bits instead of aw
>> for the sake of clarity
>>>
>>> v3
>>>      - rebase on new version of Zhenzhong's flts implementation
>>>      - fix the atc lookup operation (check the mask before returning an entry)
>>>      - add a unit test for the ATC
>>>      - store a user pointer in the iommu notifiers to simplify the
>> implementation of svm devices
>>>      Changes after review by Zhenzhong :
>>>        - store the input pasid instead of rid2pasid when returning an entry
>> after a translation
>>>        - split the ATC implementation and its unit tests
>>>
>>> v4
>>>      Changes after internal review
>>>        - Fix the nowrite optimization, an ATS translation without the nowrite
>> flag should not fail when the write permission is not set
>>>
>>> v5
>>>      Changes after review by Philippe :
>>>        - change the type of 'level' to unsigned in vtd_lookup_iotlb
>>>
>>>
>>>
>>> Clément Mathieu--Drif (22):
>>>    intel_iommu: fix FRCD construction macro.
>>>    intel_iommu: make types match
>>>    intel_iommu: return page walk level even when the translation fails
>>>    intel_iommu: do not consider wait_desc as an invalid descriptor
>>>    memory: add permissions in IOMMUAccessFlags
>>>    pcie: add helper to declare PASID capability for a pcie device
>>>    pcie: helper functions to check if PASID and ATS are enabled
>>>    intel_iommu: declare supported PASID size
>>>    pci: cache the bus mastering status in the device
>>>    pci: add IOMMU operations to get address spaces and memory regions
>>>      with PASID
>>>    memory: store user data pointer in the IOMMU notifiers
>>>    pci: add a pci-level initialization function for iommu notifiers
>>>    intel_iommu: implement the get_address_space_pasid iommu operation
>>>    intel_iommu: implement the get_memory_region_pasid iommu operation
>>>    memory: Allow to store the PASID in IOMMUTLBEntry
>>>    intel_iommu: fill the PASID field when creating an instance of
>>>      IOMMUTLBEntry
>>>    atc: generic ATC that can be used by PCIe devices that support SVM
>>>    atc: add unit tests
>>>    memory: add an API for ATS support
>>>    pci: add a pci-level API for ATS
>>>    intel_iommu: set the address mask even when a translation fails
>>>    intel_iommu: add support for ATS
>>>
>>>   hw/i386/intel_iommu.c                     | 142 +++++-
>>>   hw/i386/intel_iommu_internal.h            |   6 +-
>>>   hw/pci/pci.c                              | 127 +++++-
>>>   hw/pci/pcie.c                             |  42 ++
>>>   include/exec/memory.h                     |  51 ++-
>>>   include/hw/i386/intel_iommu.h             |   2 +-
>>>   include/hw/pci/pci.h                      | 101 +++++
>>>   include/hw/pci/pci_device.h               |   1 +
>>>   include/hw/pci/pcie.h                     |   9 +-
>>>   include/hw/pci/pcie_regs.h                |   3 +
>>>   include/standard-headers/linux/pci_regs.h |   1 +
>>>   system/memory.c                           |  20 +
>>>   tests/unit/meson.build                    |   1 +
>>>   tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>>>   util/atc.c                                | 211 +++++++++
>>>   util/atc.h                                | 117 +++++
>>>   util/meson.build                          |   1 +
>>>   17 files changed, 1330 insertions(+), 32 deletions(-)
>>>   create mode 100644 tests/unit/test-atc.c
>>>   create mode 100644 util/atc.c
>>>   create mode 100644 util/atc.h
>>>
>>> --
>>> 2.45.1
>>
> 

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (22 preceding siblings ...)
  2024-07-02 12:16 ` [PATCH ats_vtd v5 00/22] ATS support for VT-d Michael S. Tsirkin
@ 2024-07-02 13:44 ` Yi Liu
  2024-07-02 15:12   ` CLEMENT MATHIEU--DRIF
  2024-07-03 12:32 ` Yi Liu
  24 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-02 13:44 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com, Clement Mathieu--Drif

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clement Mathieu--Drif <cmdetu@gmail.com>
> 
> This series belongs to a list of series that add SVM support for VT-d.
> 
> As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
> 
> Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
> API for ATS to be used by virtual devices.
> 
> This work is based on the VT-d specification version 4.1 (March 2023).
> Here is a link to a GitHub repository where you can find the following elements :
>      - Qemu with all the patches for SVM
>          - ATS
>          - PRI
>          - Device IOTLB invalidations
>          - Requests with already translated addresses
>      - A demo device
>      - A simple driver for the demo device
>      - A userspace program (for testing and demonstration purposes)
> 
> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
> 
> v2
>      - handle huge pages better by detecting the page table level at which the translation errors occur
>      - Changes after review by ZhenZhong Duan :
>      	- Set the access bit after checking permissions
>      	- helper for PASID and ATS : make the commit message more accurate ('present' replaced with 'enabled')
>      	- pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the capability register
>      	- pci: do not check pci_bus_bypass_iommu after calling pci_device_get_iommu_bus_devfn
>      	- do not alter formatting of IOMMUTLBEntry declaration
>      	- vtd_iova_fl_check_canonical : directly use s->aw_bits instead of aw for the sake of clarity
> 
> v3
>      - rebase on new version of Zhenzhong's flts implementation
>      - fix the atc lookup operation (check the mask before returning an entry)
>      - add a unit test for the ATC
>      - store a user pointer in the iommu notifiers to simplify the implementation of svm devices
>      Changes after review by Zhenzhong :
>      	- store the input pasid instead of rid2pasid when returning an entry after a translation
>      	- split the ATC implementation and its unit tests
> 
> v4
>      Changes after internal review
>      	- Fix the nowrite optimization, an ATS translation without the nowrite flag should not fail when the write permission is not set
> 
> v5
>      Changes after review by Philippe :
>      	- change the type of 'level' to unsigned in vtd_lookup_iotlb

Hi CMD,

I saw two v5 in my inbox, are they the same? :)

> Clément Mathieu--Drif (22):
>    intel_iommu: fix FRCD construction macro.
>    intel_iommu: make types match
>    intel_iommu: return page walk level even when the translation fails
>    intel_iommu: do not consider wait_desc as an invalid descriptor
>    memory: add permissions in IOMMUAccessFlags
>    pcie: add helper to declare PASID capability for a pcie device
>    pcie: helper functions to check if PASID and ATS are enabled
>    intel_iommu: declare supported PASID size
>    pci: cache the bus mastering status in the device
>    pci: add IOMMU operations to get address spaces and memory regions
>      with PASID
>    memory: store user data pointer in the IOMMU notifiers
>    pci: add a pci-level initialization function for iommu notifiers
>    intel_iommu: implement the get_address_space_pasid iommu operation
>    intel_iommu: implement the get_memory_region_pasid iommu operation
>    memory: Allow to store the PASID in IOMMUTLBEntry
>    intel_iommu: fill the PASID field when creating an instance of
>      IOMMUTLBEntry
>    atc: generic ATC that can be used by PCIe devices that support SVM
>    atc: add unit tests
>    memory: add an API for ATS support
>    pci: add a pci-level API for ATS
>    intel_iommu: set the address mask even when a translation fails
>    intel_iommu: add support for ATS
> 
>   hw/i386/intel_iommu.c                     | 146 +++++-
>   hw/i386/intel_iommu_internal.h            |   6 +-
>   hw/pci/pci.c                              | 127 +++++-
>   hw/pci/pcie.c                             |  42 ++
>   include/exec/memory.h                     |  51 ++-
>   include/hw/i386/intel_iommu.h             |   2 +-
>   include/hw/pci/pci.h                      | 101 +++++
>   include/hw/pci/pci_device.h               |   1 +
>   include/hw/pci/pcie.h                     |   9 +-
>   include/hw/pci/pcie_regs.h                |   3 +
>   include/standard-headers/linux/pci_regs.h |   1 +
>   system/memory.c                           |  20 +
>   tests/unit/meson.build                    |   1 +
>   tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>   util/atc.c                                | 211 +++++++++
>   util/atc.h                                | 117 +++++
>   util/meson.build                          |   1 +
>   17 files changed, 1332 insertions(+), 34 deletions(-)
>   create mode 100644 tests/unit/test-atc.c
>   create mode 100644 util/atc.c
>   create mode 100644 util/atc.h
> 

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02 12:16 ` [PATCH ats_vtd v5 00/22] ATS support for VT-d Michael S. Tsirkin
@ 2024-07-02 15:09   ` CLEMENT MATHIEU--DRIF
  0 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02 15:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	Clement Mathieu--Drif


On 02/07/2024 14:16, Michael S. Tsirkin wrote:
> On Tue, Jul 02, 2024 at 05:52:29AM +0000, CLEMENT MATHIEU--DRIF wrote:
>> From: Clement Mathieu--Drif <cmdetu@gmail.com>
>>
>> This series belongs to a list of series that add SVM support for VT-d.
> I don't think you need ats_vtd as a tag, but if it's helpful
> for someone, I don't mind. What you do need is "repost", so
> people know how this relates to your previous v5 of the
> same patchset.
>
Ok fine, I will remove it in future versions, sorry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro.
  2024-07-02 13:01   ` Yi Liu
@ 2024-07-02 15:10     ` CLEMENT MATHIEU--DRIF
  0 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02 15:10 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com


On 02/07/2024 15:01, Yi Liu wrote:
> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>
>> The constant must be unsigned, otherwise the two's complement
>> overrides the other fields when a PASID is present
>
> Does it need a Fixes tag, since it overrides the other fields?
yes, will add the tag
>
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
>
>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>> ---
>>   hw/i386/intel_iommu_internal.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/intel_iommu_internal.h 
>> b/hw/i386/intel_iommu_internal.h
>> index e8396575eb..b19f14ef63 100644
>> --- a/hw/i386/intel_iommu_internal.h
>> +++ b/hw/i386/intel_iommu_internal.h
>> @@ -272,7 +272,7 @@
>>   /* For the low 64-bit of 128-bit */
>>   #define VTD_FRCD_FI(val)        ((val) & ~0xfffULL)
>>   #define VTD_FRCD_PV(val)        (((val) & 0xffffULL) << 40)
>> -#define VTD_FRCD_PP(val)        (((val) & 0x1) << 31)
>> +#define VTD_FRCD_PP(val)        (((val) & 0x1ULL) << 31)
>>   #define VTD_FRCD_IR_IDX(val)    (((val) & 0xffffULL) << 48)
>>
>>   /* DMA Remapping Fault Conditions */
>
> -- 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02 13:44 ` Yi Liu
@ 2024-07-02 15:12   ` CLEMENT MATHIEU--DRIF
  0 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02 15:12 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com, Clement Mathieu--Drif


On 02/07/2024 15:44, Yi Liu wrote:
> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>> From: Clement Mathieu--Drif <cmdetu@gmail.com>
>>
>> This series belongs to a list of series that add SVM support for VT-d.
>>
>> As a starting point, we use the series called 'intel_iommu: Enable
>> stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
>>
>> Here we focus on the implementation of ATS support in the IOMMU and
>> on a PCI-level
>> API for ATS to be used by virtual devices.
>>
>> This work is based on the VT-d specification version 4.1 (March 2023).
>> Here is a link to a GitHub repository where you can find the
>> following elements :
>>      - Qemu with all the patches for SVM
>>          - ATS
>>          - PRI
>>          - Device IOTLB invalidations
>>          - Requests with already translated addresses
>>      - A demo device
>>      - A simple driver for the demo device
>>      - A userspace program (for testing and demonstration purposes)
>>
>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>>
>>
>> v2
>>      - handle huge pages better by detecting the page table level at
>> which the translation errors occur
>>      - Changes after review by ZhenZhong Duan :
>>       - Set the access bit after checking permissions
>>       - helper for PASID and ATS : make the commit message more
>> accurate ('present' replaced with 'enabled')
>>       - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it
>> instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when
>> preparing the capability register
>>       - pci: do not check pci_bus_bypass_iommu after calling
>> pci_device_get_iommu_bus_devfn
>>       - do not alter formatting of IOMMUTLBEntry declaration
>>       - vtd_iova_fl_check_canonical : directly use s->aw_bits instead
>> of aw for the sake of clarity
>>
>> v3
>>      - rebase on new version of Zhenzhong's flts implementation
>>      - fix the atc lookup operation (check the mask before returning
>> an entry)
>>      - add a unit test for the ATC
>>      - store a user pointer in the iommu notifiers to simplify the
>> implementation of svm devices
>>      Changes after review by Zhenzhong :
>>       - store the input pasid instead of rid2pasid when returning an
>> entry after a translation
>>       - split the ATC implementation and its unit tests
>>
>> v4
>>      Changes after internal review
>>       - Fix the nowrite optimization, an ATS translation without the
>> nowrite flag should not fail when the write permission is not set
>>
>> v5
>>      Changes after review by Philippe :
>>       - change the type of 'level' to unsigned in vtd_lookup_iotlb
>
> Hi CMD,
>
> I saw two v5 in my inbox, are they the same? :)

Hi,

No, it's a resend following a request by Michael, sorry for that

>
>> Clément Mathieu--Drif (22):
>>    intel_iommu: fix FRCD construction macro.
>>    intel_iommu: make types match
>>    intel_iommu: return page walk level even when the translation fails
>>    intel_iommu: do not consider wait_desc as an invalid descriptor
>>    memory: add permissions in IOMMUAccessFlags
>>    pcie: add helper to declare PASID capability for a pcie device
>>    pcie: helper functions to check if PASID and ATS are enabled
>>    intel_iommu: declare supported PASID size
>>    pci: cache the bus mastering status in the device
>>    pci: add IOMMU operations to get address spaces and memory regions
>>      with PASID
>>    memory: store user data pointer in the IOMMU notifiers
>>    pci: add a pci-level initialization function for iommu notifiers
>>    intel_iommu: implement the get_address_space_pasid iommu operation
>>    intel_iommu: implement the get_memory_region_pasid iommu operation
>>    memory: Allow to store the PASID in IOMMUTLBEntry
>>    intel_iommu: fill the PASID field when creating an instance of
>>      IOMMUTLBEntry
>>    atc: generic ATC that can be used by PCIe devices that support SVM
>>    atc: add unit tests
>>    memory: add an API for ATS support
>>    pci: add a pci-level API for ATS
>>    intel_iommu: set the address mask even when a translation fails
>>    intel_iommu: add support for ATS
>>
>>   hw/i386/intel_iommu.c                     | 146 +++++-
>>   hw/i386/intel_iommu_internal.h            |   6 +-
>>   hw/pci/pci.c                              | 127 +++++-
>>   hw/pci/pcie.c                             |  42 ++
>>   include/exec/memory.h                     |  51 ++-
>>   include/hw/i386/intel_iommu.h             |   2 +-
>>   include/hw/pci/pci.h                      | 101 +++++
>>   include/hw/pci/pci_device.h               |   1 +
>>   include/hw/pci/pcie.h                     |   9 +-
>>   include/hw/pci/pcie_regs.h                |   3 +
>>   include/standard-headers/linux/pci_regs.h |   1 +
>>   system/memory.c                           |  20 +
>>   tests/unit/meson.build                    |   1 +
>>   tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>>   util/atc.c                                | 211 +++++++++
>>   util/atc.h                                | 117 +++++
>>   util/meson.build                          |   1 +
>>   17 files changed, 1332 insertions(+), 34 deletions(-)
>>   create mode 100644 tests/unit/test-atc.c
>>   create mode 100644 util/atc.c
>>   create mode 100644 util/atc.h
>>
>
> --
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02 13:42       ` Yi Liu
@ 2024-07-02 15:27         ` CLEMENT MATHIEU--DRIF
  2024-07-02 15:28           ` Michael S. Tsirkin
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02 15:27 UTC (permalink / raw)
  To: Yi Liu, Michael S. Tsirkin
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com


On 02/07/2024 15:42, Yi Liu wrote:
> On 2024/7/2 20:15, Michael S. Tsirkin wrote:
>> On Tue, Jul 02, 2024 at 05:57:57AM +0000, CLEMENT MATHIEU--DRIF wrote:
>>>
>>>
>>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>>>
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: 01 July 2024 22:02
>>> To: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
>>> Cc: qemu-devel@nongnu.org <qemu-devel@nongnu.org>; jasowang@redhat.com
>>> <jasowang@redhat.com>; zhenzhong.duan@intel.com
>>> <zhenzhong.duan@intel.com>;
>>> kevin.tian@intel.com <kevin.tian@intel.com>; yi.l.liu@intel.com
>>> <yi.l.liu@intel.com>; joao.m.martins@oracle.com
>>> <joao.m.martins@oracle.com>;
>>> peterx@redhat.com <peterx@redhat.com>
>>> Subject: Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
>>>
>>>
>>>
>>> On Mon, Jun 03, 2024 at 05:59:38AM +0000, CLEMENT MATHIEU--DRIF wrote:
>>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>>
>>>> This series belongs to a list of series that add SVM support for VT-d.
>>>>
>>>> As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
>>>>
>>>> Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
>>>> API for ATS to be used by virtual devices.
>>>>
>>>> This work is based on the VT-d specification version 4.1 (March 2023).
>>>> Here is a link to a GitHub repository where you can find the following elements :
>>>>      - Qemu with all the patches for SVM
>>>>          - ATS
>>>>          - PRI
>>>>          - Device IOTLB invalidations
>>>>          - Requests with already translated addresses
>>>>      - A demo device
>>>>      - A simple driver for the demo device
>>>>      - A userspace program (for testing and demonstration purposes)
>>>>
>>>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>>>
>>>
>>> I will merge, but could you please resend this using git format-patch
>>> for formatting?  The patches have trailing CRs and don't show which
>>> sha1
>>> they are for, which makes re-applying them after each change painful.
>>>
>>>
>>>
>>> Hi Michael,
>>> I sent the series again without the trailing new line.
>>> Tell me if it's better.
>>>
>>> Is Zhenzhong's FLTS series merged? If not, it might be the cause of the
>>> sha1 problem you are facing.
>>
>> I don't think I have FLTS in any queue.
>>
>> If your series has a dependency please specify this in
>> the cover letter.
>>
>> Alternatively just include the dependency in the posting.
>
> seems this is the dependency.
>
> https://lore.kernel.org/qemu-devel/20240522062313.453317-1-zhenzhong.duan@intel.com/#t
>
>
Sorry if I didn't make it clear.

As mentioned in the cover letter, this series is based on Zhenzhong's
and Yi's FLTS implementation which (AFAIK) has only been posted as an RFC
so far (keep me up to date please).

v5 is based on that branch :
https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_rfcv2

>>
>>
>>
>>
>>> Thanks
>>>> cmd
>>>
>>>
>>>> v2
>>>>      - handle huge pages better by detecting the page table level at which the translation errors occur
>>>>      - Changes after review by ZhenZhong Duan :
>>>>        - Set the access bit after checking permissions
>>>>        - helper for PASID and ATS : make the commit message more accurate ('present' replaced with 'enabled')
>>>>        - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the capability register
>>>>        - pci: do not check pci_bus_bypass_iommu after calling pci_device_get_iommu_bus_devfn
>>>>        - do not alter formatting of IOMMUTLBEntry declaration
>>>>        - vtd_iova_fl_check_canonical : directly use s->aw_bits instead of aw for the sake of clarity
>>>>
>>>> v3
>>>>      - rebase on new version of Zhenzhong's flts implementation
>>>>      - fix the atc lookup operation (check the mask before returning an entry)
>>>>      - add a unit test for the ATC
>>>>      - store a user pointer in the iommu notifiers to simplify the implementation of svm devices
>>>>      Changes after review by Zhenzhong :
>>>>        - store the input pasid instead of rid2pasid when returning an entry after a translation
>>>>        - split the ATC implementation and its unit tests
>>>>
>>>> v4
>>>>      Changes after internal review
>>>>        - Fix the nowrite optimization, an ATS translation without the nowrite flag should not fail when the write permission is not set
>>>>
>>>> v5
>>>>      Changes after review by Philippe :
>>>>        - change the type of 'level' to unsigned in vtd_lookup_iotlb
>>>>
>>>>
>>>> Clément Mathieu--Drif (22):
>>>>    intel_iommu: fix FRCD construction macro.
>>>>    intel_iommu: make types match
>>>>    intel_iommu: return page walk level even when the translation fails
>>>>    intel_iommu: do not consider wait_desc as an invalid descriptor
>>>>    memory: add permissions in IOMMUAccessFlags
>>>>    pcie: add helper to declare PASID capability for a pcie device
>>>>    pcie: helper functions to check if PASID and ATS are enabled
>>>>    intel_iommu: declare supported PASID size
>>>>    pci: cache the bus mastering status in the device
>>>>    pci: add IOMMU operations to get address spaces and memory regions
>>>>      with PASID
>>>>    memory: store user data pointer in the IOMMU notifiers
>>>>    pci: add a pci-level initialization function for iommu notifiers
>>>>    intel_iommu: implement the get_address_space_pasid iommu operation
>>>>    intel_iommu: implement the get_memory_region_pasid iommu operation
>>>>    memory: Allow to store the PASID in IOMMUTLBEntry
>>>>    intel_iommu: fill the PASID field when creating an instance of
>>>>      IOMMUTLBEntry
>>>>    atc: generic ATC that can be used by PCIe devices that support SVM
>>>>    atc: add unit tests
>>>>    memory: add an API for ATS support
>>>>    pci: add a pci-level API for ATS
>>>>    intel_iommu: set the address mask even when a translation fails
>>>>    intel_iommu: add support for ATS
>>>>
>>>>   hw/i386/intel_iommu.c                     | 142 +++++-
>>>>   hw/i386/intel_iommu_internal.h            |   6 +-
>>>>   hw/pci/pci.c                              | 127 +++++-
>>>>   hw/pci/pcie.c                             |  42 ++
>>>>   include/exec/memory.h                     |  51 ++-
>>>>   include/hw/i386/intel_iommu.h             |   2 +-
>>>>   include/hw/pci/pci.h                      | 101 +++++
>>>>   include/hw/pci/pci_device.h               |   1 +
>>>>   include/hw/pci/pcie.h                     |   9 +-
>>>>   include/hw/pci/pcie_regs.h                |   3 +
>>>>   include/standard-headers/linux/pci_regs.h |   1 +
>>>>   system/memory.c                           |  20 +
>>>>   tests/unit/meson.build                    |   1 +
>>>>   tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>>>>   util/atc.c                                | 211 +++++++++
>>>>   util/atc.h                                | 117 +++++
>>>>   util/meson.build                          |   1 +
>>>>   17 files changed, 1330 insertions(+), 32 deletions(-)
>>>>   create mode 100644 tests/unit/test-atc.c
>>>>   create mode 100644 util/atc.c
>>>>   create mode 100644 util/atc.h
>>>>
>>>> --
>>>> 2.45.1
>>>
>>
>
> --
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02 15:27         ` CLEMENT MATHIEU--DRIF
@ 2024-07-02 15:28           ` Michael S. Tsirkin
  0 siblings, 0 replies; 61+ messages in thread
From: Michael S. Tsirkin @ 2024-07-02 15:28 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: Yi Liu, qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	joao.m.martins@oracle.com, peterx@redhat.com

On Tue, Jul 02, 2024 at 03:27:13PM +0000, CLEMENT MATHIEU--DRIF wrote:
> 
> On 02/07/2024 15:42, Yi Liu wrote:
> >
> >
> > On 2024/7/2 20:15, Michael S. Tsirkin wrote:
> >> On Tue, Jul 02, 2024 at 05:57:57AM +0000, CLEMENT MATHIEU--DRIF wrote:
> >>>
> >>>
> >>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> >>>
> >>> From: Michael S. Tsirkin <mst@redhat.com>
> >>> Sent: 01 July 2024 22:02
> >>> To: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
> >>> Cc: qemu-devel@nongnu.org <qemu-devel@nongnu.org>; jasowang@redhat.com
> >>> <jasowang@redhat.com>; zhenzhong.duan@intel.com
> >>> <zhenzhong.duan@intel.com>;
> >>> kevin.tian@intel.com <kevin.tian@intel.com>; yi.l.liu@intel.com
> >>> <yi.l.liu@intel.com>; joao.m.martins@oracle.com
> >>> <joao.m.martins@oracle.com>;
> >>> peterx@redhat.com <peterx@redhat.com>
> >>> Subject: Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
> >>>
> >>>
> >>>
> >>> On Mon, Jun 03, 2024 at 05:59:38AM +0000, CLEMENT MATHIEU--DRIF wrote:
> >>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >>>>
> >>>> This series belongs to a list of series that add SVM support for VT-d.
> >>>>
> >>>> As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
> >>>>
> >>>> Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
> >>>> API for ATS to be used by virtual devices.
> >>>>
> >>>> This work is based on the VT-d specification version 4.1 (March 2023).
> >>>> Here is a link to a GitHub repository where you can find the following elements :
> >>>>      - Qemu with all the patches for SVM
> >>>>          - ATS
> >>>>          - PRI
> >>>>          - Device IOTLB invalidations
> >>>>          - Requests with already translated addresses
> >>>>      - A demo device
> >>>>      - A simple driver for the demo device
> >>>>      - A userspace program (for testing and demonstration purposes)
> >>>>
> >>>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
> >>>
> >>>
> >>> I will merge, but could you please resend this using git format-patch
> >>> for formatting?  The patches have trailing CRs and don't show which
> >>> sha1
> >>> they are for, which makes re-applying them after each change painful.
> >>>
> >>>
> >>>
> >>> Hi Michael,
> >>> I sent the series again without the trailing new line.
> >>> Tell me if it's better.
> >>>
> >>> Is Zhenzhong's FLTS series merged? If not, it might be the cause of the
> >>> sha1 problem you are facing.
> >>
> >> I don't think I have FLTS in any queue.
> >>
> >> If your series has a dependency please specify this in
> >> the cover letter.
> >>
> >> Alternatively just include the dependency in the posting.
> >
> > seems this is the dependency.
> >
> > https://lore.kernel.org/qemu-devel/20240522062313.453317-1-zhenzhong.duan@intel.com/#t
> >
> >
> Sorry if I didn't make it clear.
> 
> As mentioned in the cover letter, this series is based on Zhenzhong's
> and Yi's FLTS implementation which (AFAIK) has only been posted as an RFC
> so far (keep me up to date please).
> 
> v5 is based on that branch :
> https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_rfcv2

Ah, OK, so this is not ready to merge until that patchset is ready.


> >>
> >>
> >>
> >>
> >>> Thanks
> >>>> cmd
> >>>
> >>>
> >>>> v2
> >>>>      - handle huge pages better by detecting the page table level at which the translation errors occur
> >>>>      - Changes after review by ZhenZhong Duan :
> >>>>        - Set the access bit after checking permissions
> >>>>        - helper for PASID and ATS : make the commit message more accurate ('present' replaced with 'enabled')
> >>>>        - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the capability register
> >>>>        - pci: do not check pci_bus_bypass_iommu after calling pci_device_get_iommu_bus_devfn
> >>>>        - do not alter formatting of IOMMUTLBEntry declaration
> >>>>        - vtd_iova_fl_check_canonical : directly use s->aw_bits instead of aw for the sake of clarity
> >>>>
> >>>> v3
> >>>>      - rebase on new version of Zhenzhong's flts implementation
> >>>>      - fix the atc lookup operation (check the mask before returning an entry)
> >>>>      - add a unit test for the ATC
> >>>>      - store a user pointer in the iommu notifiers to simplify the implementation of svm devices
> >>>>      Changes after review by Zhenzhong :
> >>>>        - store the input pasid instead of rid2pasid when returning an entry after a translation
> >>>>        - split the ATC implementation and its unit tests
> >>>>
> >>>> v4
> >>>>      Changes after internal review
> >>>>        - Fix the nowrite optimization, an ATS translation without the nowrite flag should not fail when the write permission is not set
> >>>>
> >>>> v5
> >>>>      Changes after review by Philippe :
> >>>>        - change the type of 'level' to unsigned in vtd_lookup_iotlb
> >>>>
> >>>>
> >>>>
> >>>> Clément Mathieu--Drif (22):
> >>>>    intel_iommu: fix FRCD construction macro.
> >>>>    intel_iommu: make types match
> >>>>    intel_iommu: return page walk level even when the translation fails
> >>>>    intel_iommu: do not consider wait_desc as an invalid descriptor
> >>>>    memory: add permissions in IOMMUAccessFlags
> >>>>    pcie: add helper to declare PASID capability for a pcie device
> >>>>    pcie: helper functions to check if PASID and ATS are enabled
> >>>>    intel_iommu: declare supported PASID size
> >>>>    pci: cache the bus mastering status in the device
> >>>>    pci: add IOMMU operations to get address spaces and memory regions
> >>>>      with PASID
> >>>>    memory: store user data pointer in the IOMMU notifiers
> >>>>    pci: add a pci-level initialization function for iommu notifiers
> >>>>    intel_iommu: implement the get_address_space_pasid iommu operation
> >>>>    intel_iommu: implement the get_memory_region_pasid iommu operation
> >>>>    memory: Allow to store the PASID in IOMMUTLBEntry
> >>>>    intel_iommu: fill the PASID field when creating an instance of
> >>>>      IOMMUTLBEntry
> >>>>    atc: generic ATC that can be used by PCIe devices that support SVM
> >>>>    atc: add unit tests
> >>>>    memory: add an API for ATS support
> >>>>    pci: add a pci-level API for ATS
> >>>>    intel_iommu: set the address mask even when a translation fails
> >>>>    intel_iommu: add support for ATS
> >>>>
> >>>>   hw/i386/intel_iommu.c                     | 142 +++++-
> >>>>   hw/i386/intel_iommu_internal.h            |   6 +-
> >>>>   hw/pci/pci.c                              | 127 +++++-
> >>>>   hw/pci/pcie.c                             |  42 ++
> >>>>   include/exec/memory.h                     |  51 ++-
> >>>>   include/hw/i386/intel_iommu.h             |   2 +-
> >>>>   include/hw/pci/pci.h                      | 101 +++++
> >>>>   include/hw/pci/pci_device.h               |   1 +
> >>>>   include/hw/pci/pcie.h                     |   9 +-
> >>>>   include/hw/pci/pcie_regs.h                |   3 +
> >>>>   include/standard-headers/linux/pci_regs.h |   1 +
> >>>>   system/memory.c                           |  20 +
> >>>>   tests/unit/meson.build                    |   1 +
> >>>>   tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
> >>>>   util/atc.c                                | 211 +++++++++
> >>>>   util/atc.h                                | 117 +++++
> >>>>   util/meson.build                          |   1 +
> >>>>   17 files changed, 1330 insertions(+), 32 deletions(-)
> >>>>   create mode 100644 tests/unit/test-atc.c
> >>>>   create mode 100644 util/atc.c
> >>>>   create mode 100644 util/atc.h
> >>>>
> >>>> --
> >>>> 2.45.1
> >>>
> >>
> >
> > --
> > Regards,
> > Yi Liu



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor
  2024-07-02 13:33   ` Yi Liu
@ 2024-07-02 15:29     ` CLEMENT MATHIEU--DRIF
  2024-07-02 15:40       ` cmd
  2024-07-03  7:29       ` Yi Liu
  0 siblings, 2 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-02 15:29 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com


On 02/07/2024 15:33, Yi Liu wrote:
>
>
> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>
>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/i386/intel_iommu.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 98996ededc..71cebe2fd3 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -3500,6 +3500,11 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>>       } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
>>           /* Interrupt flag */
>>           vtd_generate_completion_event(s);
>> +    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
>> +        /*
>> +         * SW = 0, IF = 0, FN = 1
>> +         * Nothing to do as we process the events sequentially
>> +         */
>
> This code looks a bit weird. SW field does not co-exist with IF. But 
> either
> SW or IF can co-exist with FN flag. Is it? Have you already seen a wait
> descriptor that only has FN flag set but no SW nor IF flag?
Yes, my test suite triggers that condition
>
>>       } else {
>>           error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
>>                             " (unknown type)", __func__, inv_desc->hi,
>
> -- 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor
  2024-07-02 15:29     ` CLEMENT MATHIEU--DRIF
@ 2024-07-02 15:40       ` cmd
  2024-07-03  7:29       ` Yi Liu
  1 sibling, 0 replies; 61+ messages in thread
From: cmd @ 2024-07-02 15:40 UTC (permalink / raw)
  To: qemu-devel


On 02/07/2024 17:29, CLEMENT MATHIEU--DRIF wrote:
> On 02/07/2024 15:33, Yi Liu wrote:
>>
>>
>> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>
>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>    hw/i386/intel_iommu.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index 98996ededc..71cebe2fd3 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -3500,6 +3500,11 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>>>        } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
>>>            /* Interrupt flag */
>>>            vtd_generate_completion_event(s);
>>> +    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
>>> +        /*
>>> +         * SW = 0, IF = 0, FN = 1
>>> +         * Nothing to do as we process the events sequentially
>>> +         */
>> This code looks a bit weird. SW field does not co-exist with IF. But
>> either
>> SW or IF can co-exist with FN flag. Is it? Have you already seen a wait
>> descriptor that only has FN flag set but no SW nor IF flag?
> Yes, my test suite triggers that condition
I think it comes from the kernel function intel_drain_pasid_prq 
(https://elixir.bootlin.com/linux/latest/source/drivers/iommu/intel/svm.c#L467)
>>>        } else {
>>>            error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
>>>                              " (unknown type)", __func__, inv_desc->hi,
>> -- 
>> Regards,
>> Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor
  2024-07-02 15:29     ` CLEMENT MATHIEU--DRIF
  2024-07-02 15:40       ` cmd
@ 2024-07-03  7:29       ` Yi Liu
  2024-07-03  8:28         ` cmd
  2024-07-04  4:23         ` CLEMENT MATHIEU--DRIF
  1 sibling, 2 replies; 61+ messages in thread
From: Yi Liu @ 2024-07-03  7:29 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/2 23:29, CLEMENT MATHIEU--DRIF wrote:
> 
> On 02/07/2024 15:33, Yi Liu wrote:
>>
>>
>> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>
>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>    hw/i386/intel_iommu.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index 98996ededc..71cebe2fd3 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -3500,6 +3500,11 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>>>        } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
>>>            /* Interrupt flag */
>>>            vtd_generate_completion_event(s);
>>> +    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
>>> +        /*
>>> +         * SW = 0, IF = 0, FN = 1
>>> +         * Nothing to do as we process the events sequentially
>>> +         */
>>
>> This code looks a bit weird. SW field does not co-exist with IF. But
>> either
>> SW or IF can co-exist with FN flag. Is it? Have you already seen a wait
>> descriptor that only has FN flag set but no SW nor IF flag?
> Yes, my test suite triggers that condition

I see. The spec indeed describes such usage. Please add a comment for it.
Since it does not need a response, QEMU can just bypass it. Also, please
adjust the subject a bit, as it is misleading. Perhaps:

"intel_iommu: Bypass barrier wait descriptor"

Spec CH 7.10
a. Submit Invalidation Wait Descriptor (inv_wait_dsc) with Fence flag (FN=1)
Set to Invalidation Queue. This ensures that all requests submitted to the
Invalidation Queue ahead of this wait descriptor are processed and completed
by remapping hardware before processing requests after the Invalidation Wait
Descriptor. It is not required to specify SW flag (or IF flag) in this
descriptor or for software to wait on its completion, as its function is to
only act as a barrier.
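
For illustration only, the flag combinations discussed in this thread can be sketched as a small standalone C classifier. This is not the QEMU code: the WAIT_* constants, the WaitAction enum and classify_wait_desc() are hypothetical names, and the bit positions (IF = bit 4, SW = bit 5, FN = bit 6) are an assumption modelled on the wait descriptor layout. Treating SW together with IF as invalid is also an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical flag bits in the low 64 bits of a wait descriptor (assumed). */
#define WAIT_IF (1ULL << 4) /* Interrupt flag */
#define WAIT_SW (1ULL << 5) /* Status write flag */
#define WAIT_FN (1ULL << 6) /* Fence flag */

typedef enum {
    ACT_STATUS_WRITE, /* SW set: write the status data */
    ACT_INTERRUPT,    /* IF set: raise a completion event */
    ACT_FENCE_ONLY,   /* only FN set: pure barrier, nothing to do */
    ACT_INVALID       /* no usable flag, or SW and IF together */
} WaitAction;

/* Decide what a model that processes descriptors in order has to do. */
static WaitAction classify_wait_desc(uint64_t lo)
{
    if (lo & WAIT_SW) {
        /* SW and IF are treated as mutually exclusive in this sketch. */
        return (lo & WAIT_IF) ? ACT_INVALID : ACT_STATUS_WRITE;
    }
    if (lo & WAIT_IF) {
        return ACT_INTERRUPT;
    }
    if (lo & WAIT_FN) {
        /* Sequential processing already gives the ordering guarantee. */
        return ACT_FENCE_ONLY;
    }
    return ACT_INVALID;
}
```

With this shape, a descriptor carrying only FN falls into ACT_FENCE_ONLY instead of the error path, while FN combined with SW or IF is still handled by the earlier branches.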

>>
>>>        } else {
>>>            error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
>>>                              " (unknown type)", __func__, inv_desc->hi,
>>
>> -- 
>> Regards,
>> Yi Liu

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor
  2024-07-03  7:29       ` Yi Liu
@ 2024-07-03  8:28         ` cmd
  2024-07-04  4:23         ` CLEMENT MATHIEU--DRIF
  1 sibling, 0 replies; 61+ messages in thread
From: cmd @ 2024-07-03  8:28 UTC (permalink / raw)
  To: Yi Liu, CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com


On 03/07/2024 09:29, Yi Liu wrote:
> On 2024/7/2 23:29, CLEMENT MATHIEU--DRIF wrote:
>>
>> On 02/07/2024 15:33, Yi Liu wrote:
>>>
>>>
>>> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>>
>>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>    hw/i386/intel_iommu.c | 5 +++++
>>>>    1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>> index 98996ededc..71cebe2fd3 100644
>>>> --- a/hw/i386/intel_iommu.c
>>>> +++ b/hw/i386/intel_iommu.c
>>>> @@ -3500,6 +3500,11 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>>>>        } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
>>>>            /* Interrupt flag */
>>>>            vtd_generate_completion_event(s);
>>>> +    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
>>>> +        /*
>>>> +         * SW = 0, IF = 0, FN = 1
>>>> +         * Nothing to do as we process the events sequentially
>>>> +         */
>>>
>>> This code looks a bit weird. SW field does not co-exist with IF. But
>>> either
>>> SW or IF can co-exist with FN flag. Is it? Have you already seen a wait
>>> descriptor that only has FN flag set but no SW nor IF flag?
>> Yes, my test suite triggers that condition
>
> I see. The spec indeed describes such usage. Please add a comment for it.
> Since it does not need a response, QEMU can just bypass it. Also, please
> adjust the subject a bit, as it is misleading. Perhaps:
>
> "intel_iommu: Bypass barrier wait descriptor"
Fine, will do
>
> Spec CH 7.10
> a. Submit Invalidation Wait Descriptor (inv_wait_dsc) with Fence flag (FN=1)
> Set to Invalidation Queue. This ensures that all requests submitted to the
> Invalidation Queue ahead of this wait descriptor are processed and completed
> by remapping hardware before processing requests after the Invalidation Wait
> Descriptor. It is not required to specify SW flag (or IF flag) in this
> descriptor or for software to wait on its completion, as its function is to
> only act as a barrier.
>
>>>
>>>>        } else {
>>>>            error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
>>>>                              " (unknown type)", __func__, inv_desc->hi,
>>>
>>> -- 
>>> Regards,
>>> Yi Liu
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 03/22] intel_iommu: return page walk level even when the translation fails
  2024-07-02  5:52 ` [PATCH ats_vtd v5 03/22] intel_iommu: return page walk level even when the translation fails CLEMENT MATHIEU--DRIF
@ 2024-07-03 11:59   ` Yi Liu
  2024-07-04  4:23     ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-03 11:59 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> We use this information in vtd_do_iommu_translate to populate the
> IOMMUTLBEntry and indicate the correct page mask. This prevents ATS
> devices from sending many useless translation requests when a megapage
> or gigapage iova is not mapped to a physical address.

You may move this patch to just before "[PATCH ats_vtd v5 22/22] intel_iommu:
add support for ATS", or simply merge it into that patch, since it is the
"user" of this commit.
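
As a rough illustration of why the walk level matters for the page mask, the sketch below derives an address mask from the level at which a page walk stopped. It assumes 4 KiB base pages and 9 index bits per level; PAGE_SHIFT_4K, LEVEL_BITS and level_addr_mask() are hypothetical names for this example, not helpers from the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* assumed base page size: 4 KiB */
#define LEVEL_BITS    9  /* assumed index bits per page table level */

/*
 * Return the mask of address bits covered by one entry at the given
 * level: level 1 covers 4 KiB, level 2 covers 2 MiB, level 3 covers 1 GiB.
 * Levels outside 1..5 are rejected by returning 0.
 */
static uint64_t level_addr_mask(unsigned level)
{
    if (level < 1 || level > 5) {
        return 0;
    }
    return (1ULL << (PAGE_SHIFT_4K + (level - 1) * LEVEL_BITS)) - 1;
}
```

If a failed walk reports level 2, a device caching the negative result with the 2 MiB mask can avoid sending one translation request per 4 KiB page in that range.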

> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> ---
>   hw/i386/intel_iommu.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index c6474ae735..98996ededc 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2096,9 +2096,9 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
>                                uint32_t pasid)
>   {
>       dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
> -    uint32_t level = vtd_get_iova_level(s, ce, pasid);
>       uint32_t offset;
>       uint64_t flpte;
> +    *flpte_level = vtd_get_iova_level(s, ce, pasid);
>   
>       if (!vtd_iova_fl_check_canonical(s, iova, ce, pasid)) {
>           error_report_once("%s: detected non canonical IOVA (iova=0x%" PRIx64 ","
> @@ -2107,11 +2107,11 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
>       }
>   
>       while (true) {
> -        offset = vtd_iova_level_offset(iova, level);
> +        offset = vtd_iova_level_offset(iova, *flpte_level);
>           flpte = vtd_get_pte(addr, offset);
>   
>           if (flpte == (uint64_t)-1) {
> -            if (level == vtd_get_iova_level(s, ce, pasid)) {
> +            if (*flpte_level == vtd_get_iova_level(s, ce, pasid)) {
>                   /* Invalid programming of context-entry */
>                   return -VTD_FR_CONTEXT_ENTRY_INV;
>               } else {
> @@ -2128,11 +2128,11 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
>           if (is_write && !(flpte & VTD_FL_RW_MASK)) {
>               return -VTD_FR_WRITE;
>           }
> -        if (vtd_flpte_nonzero_rsvd(flpte, level)) {
> +        if (vtd_flpte_nonzero_rsvd(flpte, *flpte_level)) {
>               error_report_once("%s: detected flpte reserved non-zero "
>                                 "iova=0x%" PRIx64 ", level=0x%" PRIx32
>                                 "flpte=0x%" PRIx64 ", pasid=0x%" PRIX32 ")",
> -                              __func__, iova, level, flpte, pasid);
> +                              __func__, iova, *flpte_level, flpte, pasid);
>               return -VTD_FR_PAGING_ENTRY_RSVD;
>           }
>   
> @@ -2140,19 +2140,18 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
>               return -VTD_FR_FS_BIT_UPDATE_FAILED;
>           }
>   
> -        if (vtd_is_last_pte(flpte, level)) {
> +        if (vtd_is_last_pte(flpte, *flpte_level)) {
>               if (is_write &&
>                   (vtd_set_flag_in_pte(addr, offset, flpte, VTD_FL_D) !=
>                                                                       MEMTX_OK)) {
>                       return -VTD_FR_FS_BIT_UPDATE_FAILED;
>               }
>               *flptep = flpte;
> -            *flpte_level = level;
>               return 0;
>           }
>   
>           addr = vtd_get_pte_addr(flpte, aw_bits);
> -        level--;
> +        (*flpte_level)--;
>       }
>   }
>   

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 06/22] pcie: add helper to declare PASID capability for a pcie device
  2024-07-02  5:52 ` [PATCH ats_vtd v5 06/22] pcie: add helper to declare PASID capability for a pcie device CLEMENT MATHIEU--DRIF
@ 2024-07-03 12:04   ` Yi Liu
  2024-07-04  4:25     ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-03 12:04 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> ---
>   hw/pci/pcie.c                             | 24 +++++++++++++++++++++++
>   include/hw/pci/pcie.h                     |  6 +++++-
>   include/hw/pci/pcie_regs.h                |  3 +++
>   include/standard-headers/linux/pci_regs.h |  1 +
>   4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 4b2f0805c6..d6a052b616 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -1177,3 +1177,27 @@ void pcie_acs_reset(PCIDevice *dev)
>           pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 0);
>       }
>   }
> +
> +/* PASID */
> +void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
> +                     bool exec_perm, bool priv_mod)
> +{
> +    assert(pasid_width <= PCI_EXT_CAP_PASID_MAX_WIDTH);
> +    static const uint16_t control_reg_rw_mask = 0x07;
> +    uint16_t capability_reg = pasid_width;
> +
> +    pcie_add_capability(dev, PCI_EXT_CAP_ID_PASID, PCI_PASID_VER, offset,
> +                        PCI_EXT_CAP_PASID_SIZEOF);
> +
> +    capability_reg <<= PCI_PASID_CAP_WIDTH_SHIFT;
> +    capability_reg |= exec_perm ? PCI_PASID_CAP_EXEC : 0;
> +    capability_reg |= priv_mod  ? PCI_PASID_CAP_PRIV : 0;
> +    pci_set_word(dev->config + offset + PCI_PASID_CAP, capability_reg);
> +
> +    /* Everything is disabled by default */
> +    pci_set_word(dev->config + offset + PCI_PASID_CTRL, 0);
> +
> +    pci_set_word(dev->wmask + offset + PCI_PASID_CTRL, control_reg_rw_mask);
> +
> +    dev->exp.pasid_cap = offset;
> +}

There seems to be no user of this helper in this series. If so, you may
drop this patch and include it once there is a caller.

> diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> index 5eddb90976..b870958c99 100644
> --- a/include/hw/pci/pcie.h
> +++ b/include/hw/pci/pcie.h
> @@ -72,8 +72,9 @@ struct PCIExpressDevice {
>       uint16_t aer_cap;
>       PCIEAERLog aer_log;
>   
> -    /* Offset of ATS capability in config space */
> +    /* Offset of ATS and PASID capabilities in config space */
>       uint16_t ats_cap;
> +    uint16_t pasid_cap;
>   
>       /* ACS */
>       uint16_t acs_cap;
> @@ -150,4 +151,7 @@ void pcie_cap_slot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                Error **errp);
>   void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
>                                        DeviceState *dev, Error **errp);
> +
> +void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t pasid_width,
> +                     bool exec_perm, bool priv_mod);
>   #endif /* QEMU_PCIE_H */
> diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
> index 9d3b6868dc..0a86598f80 100644
> --- a/include/hw/pci/pcie_regs.h
> +++ b/include/hw/pci/pcie_regs.h
> @@ -86,6 +86,9 @@ typedef enum PCIExpLinkWidth {
>   #define PCI_ARI_VER                     1
>   #define PCI_ARI_SIZEOF                  8
>   
> +/* PASID */
> +#define PCI_PASID_VER                   1
> +#define PCI_EXT_CAP_PASID_MAX_WIDTH     20
>   /* AER */
>   #define PCI_ERR_VER                     2
>   #define PCI_ERR_SIZEOF                  0x48
> diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
> index a39193213f..406dce8e82 100644
> --- a/include/standard-headers/linux/pci_regs.h
> +++ b/include/standard-headers/linux/pci_regs.h
> @@ -935,6 +935,7 @@
>   #define  PCI_PASID_CAP_EXEC	0x0002	/* Exec permissions Supported */
>   #define  PCI_PASID_CAP_PRIV	0x0004	/* Privilege Mode Supported */
>   #define  PCI_PASID_CAP_WIDTH	0x1f00
> +#define  PCI_PASID_CAP_WIDTH_SHIFT  8
>   #define PCI_PASID_CTRL		0x06    /* PASID control register */
>   #define  PCI_PASID_CTRL_ENABLE	0x0001	/* Enable bit */
>   #define  PCI_PASID_CTRL_EXEC	0x0002	/* Exec permissions Enable */

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 19/22] memory: add an API for ATS support
  2024-07-02  5:52 ` [PATCH ats_vtd v5 19/22] memory: add an API for ATS support CLEMENT MATHIEU--DRIF
@ 2024-07-03 12:14   ` Yi Liu
  2024-07-04  4:30     ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-03 12:14 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> IOMMU have to implement iommu_ats_request_translation to support ATS.
> 
> Devices can use IOMMU_TLB_ENTRY_TRANSLATION_ERROR to check the tlb
> entries returned by a translation request.
> 
> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> ---
>   include/exec/memory.h | 26 ++++++++++++++++++++++++++
>   system/memory.c       | 20 ++++++++++++++++++++
>   2 files changed, 46 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 003ee06610..48555c87c6 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -148,6 +148,10 @@ struct IOMMUTLBEntry {
>       uint32_t         pasid;
>   };
>   
> +/* Check if an IOMMU TLB entry indicates a translation error */
> +#define IOMMU_TLB_ENTRY_TRANSLATION_ERROR(entry) ((((entry)->perm) & IOMMU_RW) \
> +                                                    == IOMMU_NONE)
> +
>   /*
>    * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>    * register with one or multiple IOMMU Notifier capability bit(s).
> @@ -571,6 +575,20 @@ struct IOMMUMemoryRegionClass {
>        int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
>                                     GList *iova_ranges,
>                                     Error **errp);
> +
> +    /**
> +     * @iommu_ats_request_translation:
> +     * This method must be implemented if the IOMMU has ATS enabled
> +     *
> +     * @see pci_ats_request_translation_pasid
> +     */
> +    ssize_t (*iommu_ats_request_translation)(IOMMUMemoryRegion *iommu,
> +                                             bool priv_req, bool exec_req,
> +                                             hwaddr addr, size_t length,
> +                                             bool no_write,
> +                                             IOMMUTLBEntry *result,
> +                                             size_t result_length,
> +                                             uint32_t *err_count);
>   };
>   

I don't quite understand why the existing translate() callback does not
work. Could you elaborate?

>   typedef struct RamDiscardListener RamDiscardListener;
> @@ -1926,6 +1944,14 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
>   void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>                                                IOMMUNotifier *n);
>   
> +ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
> +                                                bool priv_req, bool exec_req,
> +                                                hwaddr addr, size_t length,
> +                                                bool no_write,
> +                                                IOMMUTLBEntry *result,
> +                                                size_t result_length,
> +                                                uint32_t *err_count);
> +
>   /**
>    * memory_region_iommu_get_attr: return an IOMMU attr if get_attr() is
>    * defined on the IOMMU.
> diff --git a/system/memory.c b/system/memory.c
> index 74cd73ebc7..8268df7bf5 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2005,6 +2005,26 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>       memory_region_update_iommu_notify_flags(iommu_mr, NULL);
>   }
>   
> +ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
> +                                                    bool priv_req,
> +                                                    bool exec_req,
> +                                                    hwaddr addr, size_t length,
> +                                                    bool no_write,
> +                                                    IOMMUTLBEntry *result,
> +                                                    size_t result_length,
> +                                                    uint32_t *err_count)
> +{
> +    IOMMUMemoryRegionClass *imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
> +
> +    if (!imrc->iommu_ats_request_translation) {
> +        return -ENODEV;
> +    }
> +
> +    return imrc->iommu_ats_request_translation(iommu_mr, priv_req, exec_req,
> +                                               addr, length, no_write, result,
> +                                               result_length, err_count);
> +}
> +
>   void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
>                                       IOMMUTLBEvent *event)
>   {

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
                   ` (23 preceding siblings ...)
  2024-07-02 13:44 ` Yi Liu
@ 2024-07-03 12:32 ` Yi Liu
  2024-07-04  4:36   ` CLEMENT MATHIEU--DRIF
  24 siblings, 1 reply; 61+ messages in thread
From: Yi Liu @ 2024-07-03 12:32 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com, Clement Mathieu--Drif

Hi CMD,

I've gone through the series. Here are some general suggestions.

1) Patches 01, 02 and 04 can be sent separately as they are fixes.
2) This series mixes the ATS and PASID capabilities a bit, although they
    do not depend on each other. I'd suggest you split the series into
       - support ATS for requests without PASID
       - support ATS for requests with PASID
    The second part should be an incremental change based on the first
    part. If you can make use of the existing translate() callback, then
    it is possible to remove the dependency on Zhenzhong's stage-1 series.
3) Some commits have no commit message. It would be good to add one.
4) Some helpers appear to be intended for device models. If possible, it
    would be better to submit them together with a demo device.
5) A design description in the cover letter would be helpful.

On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
> From: Clement Mathieu--Drif <cmdetu@gmail.com>
> 
> This series belongs to a list of series that add SVM support for VT-d.
> 
> As a starting point, we use the series called 'intel_iommu: Enable stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
> 
> Here we focus on the implementation of ATS support in the IOMMU and on a PCI-level
> API for ATS to be used by virtual devices.
> 
> This work is based on the VT-d specification version 4.1 (March 2023).
> Here is a link to a GitHub repository where you can find the following elements :
>      - Qemu with all the patches for SVM
>          - ATS
>          - PRI
>          - Device IOTLB invalidations
>          - Requests with already translated addresses
>      - A demo device
>      - A simple driver for the demo device
>      - A userspace program (for testing and demonstration purposes)
> 
> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
> 
> v2
>      - handle huge pages better by detecting the page table level at which the translation errors occur
>      - Changes after review by ZhenZhong Duan :
>      	- Set the access bit after checking permissions
>      	- helper for PASID and ATS : make the commit message more accurate ('present' replaced with 'enabled')
>      	- pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when preparing the capability register
>      	- pci: do not check pci_bus_bypass_iommu after calling pci_device_get_iommu_bus_devfn
>      	- do not alter formatting of IOMMUTLBEntry declaration
>      	- vtd_iova_fl_check_canonical : directly use s->aw_bits instead of aw for the sake of clarity
> 
> v3
>      - rebase on new version of Zhenzhong's flts implementation
>      - fix the atc lookup operation (check the mask before returning an entry)
>      - add a unit test for the ATC
>      - store a user pointer in the iommu notifiers to simplify the implementation of svm devices
>      Changes after review by Zhenzhong :
>      	- store the input pasid instead of rid2pasid when returning an entry after a translation
>      	- split the ATC implementation and its unit tests
> 
> v4
>      Changes after internal review
>      	- Fix the nowrite optimization, an ATS translation without the nowrite flag should not fail when the write permission is not set

It's strange to list internal review here.

> v5
>      Changes after review by Philippe :
>      	- change the type of 'level' to unsigned in vtd_lookup_iotlb

Listing the change log from the latest version to the earliest would be
nice too. Looking forward to your next version. :)

Regards,
Yi Liu

> Clément Mathieu--Drif (22):
>    intel_iommu: fix FRCD construction macro.
>    intel_iommu: make types match
>    intel_iommu: return page walk level even when the translation fails
>    intel_iommu: do not consider wait_desc as an invalid descriptor
>    memory: add permissions in IOMMUAccessFlags
>    pcie: add helper to declare PASID capability for a pcie device
>    pcie: helper functions to check if PASID and ATS are enabled
>    intel_iommu: declare supported PASID size
>    pci: cache the bus mastering status in the device
>    pci: add IOMMU operations to get address spaces and memory regions
>      with PASID
>    memory: store user data pointer in the IOMMU notifiers
>    pci: add a pci-level initialization function for iommu notifiers
>    intel_iommu: implement the get_address_space_pasid iommu operation
>    intel_iommu: implement the get_memory_region_pasid iommu operation
>    memory: Allow to store the PASID in IOMMUTLBEntry
>    intel_iommu: fill the PASID field when creating an instance of
>      IOMMUTLBEntry
>    atc: generic ATC that can be used by PCIe devices that support SVM
>    atc: add unit tests
>    memory: add an API for ATS support
>    pci: add a pci-level API for ATS
>    intel_iommu: set the address mask even when a translation fails
>    intel_iommu: add support for ATS
> 
>   hw/i386/intel_iommu.c                     | 146 +++++-
>   hw/i386/intel_iommu_internal.h            |   6 +-
>   hw/pci/pci.c                              | 127 +++++-
>   hw/pci/pcie.c                             |  42 ++
>   include/exec/memory.h                     |  51 ++-
>   include/hw/i386/intel_iommu.h             |   2 +-
>   include/hw/pci/pci.h                      | 101 +++++
>   include/hw/pci/pci_device.h               |   1 +
>   include/hw/pci/pcie.h                     |   9 +-
>   include/hw/pci/pcie_regs.h                |   3 +
>   include/standard-headers/linux/pci_regs.h |   1 +
>   system/memory.c                           |  20 +
>   tests/unit/meson.build                    |   1 +
>   tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>   util/atc.c                                | 211 +++++++++
>   util/atc.h                                | 117 +++++
>   util/meson.build                          |   1 +
>   17 files changed, 1332 insertions(+), 34 deletions(-)
>   create mode 100644 tests/unit/test-atc.c
>   create mode 100644 util/atc.c
>   create mode 100644 util/atc.h
> 




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 03/22] intel_iommu: return page walk level even when the translation fails
  2024-07-03 11:59   ` Yi Liu
@ 2024-07-04  4:23     ` CLEMENT MATHIEU--DRIF
  0 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-04  4:23 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com


On 03/07/2024 13:59, Yi Liu wrote:
> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>
>> We use this information in vtd_do_iommu_translate to populate the
>> IOMMUTLBEntry and indicate the correct page mask. This prevents ATS
>> devices from sending many useless translation requests when a megapage
>> or gigapage iova is not mapped to a physical address.
>
> You may move this patch to just before "[PATCH ats_vtd v5 22/22]
> intel_iommu: add support for ATS", or merge it into that patch, since it
> is the "user" of this commit.
will do
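
The benefit described in the commit message can be illustrated with a small
calculation. This is only a minimal sketch, assuming the usual 4 KiB base
page and 9 address bits per paging level; the helper names
sketch_level_addr_mask and sketch_requests_to_cover are hypothetical and do
not appear in the patch. It shows how the level at which the walk stopped
could be turned into the address mask of the returned entry, and how many
translation requests a device would need to cover an unmapped hole.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT_4K 12 /* assumed 4 KiB base page */
#define SKETCH_LEVEL_BITS     9 /* assumed 512 entries per paging level */

/*
 * Low address bits covered by a paging entry at the given level
 * (1 = 4 KiB, 2 = 2 MiB, 3 = 1 GiB).
 */
static inline uint64_t sketch_level_addr_mask(uint32_t level)
{
    assert(level >= 1 && level <= 5);
    uint32_t shift = SKETCH_PAGE_SHIFT_4K + (level - 1) * SKETCH_LEVEL_BITS;
    return (UINT64_C(1) << shift) - 1;
}

/*
 * Number of translation requests needed to cover hole_len bytes when each
 * failed request reports a mask for the given level.
 */
static inline uint64_t sketch_requests_to_cover(uint64_t hole_len,
                                                uint32_t level)
{
    uint64_t mask = sketch_level_addr_mask(level);
    return (hole_len + mask) / (mask + 1);
}
```

Under these assumptions, an unmapped 2 MiB region costs 512 requests if every
failure is reported with a 4 KiB mask, but a single request if the failure is
reported at level 2.
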
>
>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>> ---
>>   hw/i386/intel_iommu.c | 15 +++++++--------
>>   1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index c6474ae735..98996ededc 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -2096,9 +2096,9 @@ static int vtd_iova_to_flpte(IntelIOMMUState 
>> *s, VTDContextEntry *ce,
>>                                uint32_t pasid)
>>   {
>>       dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
>> -    uint32_t level = vtd_get_iova_level(s, ce, pasid);
>>       uint32_t offset;
>>       uint64_t flpte;
>> +    *flpte_level = vtd_get_iova_level(s, ce, pasid);
>>
>>       if (!vtd_iova_fl_check_canonical(s, iova, ce, pasid)) {
>>           error_report_once("%s: detected non canonical IOVA 
>> (iova=0x%" PRIx64 ","
>> @@ -2107,11 +2107,11 @@ static int vtd_iova_to_flpte(IntelIOMMUState 
>> *s, VTDContextEntry *ce,
>>       }
>>
>>       while (true) {
>> -        offset = vtd_iova_level_offset(iova, level);
>> +        offset = vtd_iova_level_offset(iova, *flpte_level);
>>           flpte = vtd_get_pte(addr, offset);
>>
>>           if (flpte == (uint64_t)-1) {
>> -            if (level == vtd_get_iova_level(s, ce, pasid)) {
>> +            if (*flpte_level == vtd_get_iova_level(s, ce, pasid)) {
>>                   /* Invalid programming of context-entry */
>>                   return -VTD_FR_CONTEXT_ENTRY_INV;
>>               } else {
>> @@ -2128,11 +2128,11 @@ static int vtd_iova_to_flpte(IntelIOMMUState 
>> *s, VTDContextEntry *ce,
>>           if (is_write && !(flpte & VTD_FL_RW_MASK)) {
>>               return -VTD_FR_WRITE;
>>           }
>> -        if (vtd_flpte_nonzero_rsvd(flpte, level)) {
>> +        if (vtd_flpte_nonzero_rsvd(flpte, *flpte_level)) {
>>               error_report_once("%s: detected flpte reserved non-zero "
>>                                 "iova=0x%" PRIx64 ", level=0x%" PRIx32
>>                                 "flpte=0x%" PRIx64 ", pasid=0x%" 
>> PRIX32 ")",
>> -                              __func__, iova, level, flpte, pasid);
>> +                              __func__, iova, *flpte_level, flpte, 
>> pasid);
>>               return -VTD_FR_PAGING_ENTRY_RSVD;
>>           }
>>
>> @@ -2140,19 +2140,18 @@ static int vtd_iova_to_flpte(IntelIOMMUState 
>> *s, VTDContextEntry *ce,
>>               return -VTD_FR_FS_BIT_UPDATE_FAILED;
>>           }
>>
>> -        if (vtd_is_last_pte(flpte, level)) {
>> +        if (vtd_is_last_pte(flpte, *flpte_level)) {
>>               if (is_write &&
>>                   (vtd_set_flag_in_pte(addr, offset, flpte, VTD_FL_D) !=
>> MEMTX_OK)) {
>>                       return -VTD_FR_FS_BIT_UPDATE_FAILED;
>>               }
>>               *flptep = flpte;
>> -            *flpte_level = level;
>>               return 0;
>>           }
>>
>>           addr = vtd_get_pte_addr(flpte, aw_bits);
>> -        level--;
>> +        (*flpte_level)--;
>>       }
>>   }
>>
>
> -- 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor
  2024-07-03  7:29       ` Yi Liu
  2024-07-03  8:28         ` cmd
@ 2024-07-04  4:23         ` CLEMENT MATHIEU--DRIF
  1 sibling, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-04  4:23 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com


On 03/07/2024 09:29, Yi Liu wrote:
> On 2024/7/2 23:29, CLEMENT MATHIEU--DRIF wrote:
>>
>> On 02/07/2024 15:33, Yi Liu wrote:
>>> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>>
>>>> Signed-off-by: Clément Mathieu--Drif 
>>>> <clement.mathieu--drif@eviden.com>
>>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>    hw/i386/intel_iommu.c | 5 +++++
>>>>    1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>> index 98996ededc..71cebe2fd3 100644
>>>> --- a/hw/i386/intel_iommu.c
>>>> +++ b/hw/i386/intel_iommu.c
>>>> @@ -3500,6 +3500,11 @@ static bool
>>>> vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>>>>        } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
>>>>            /* Interrupt flag */
>>>>            vtd_generate_completion_event(s);
>>>> +    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
>>>> +        /*
>>>> +         * SW = 0, IF = 0, FN = 1
>>>> +         * Nothing to do as we process the events sequentially
>>>> +         */
>>>
>>> This code looks a bit weird. The SW field does not co-exist with IF, but
>>> either SW or IF can co-exist with the FN flag, right? Have you already
>>> seen a wait descriptor that has only the FN flag set, with neither SW
>>> nor IF?
>> Yes, my test suite triggers that condition
>
> I see. The spec indeed describes such usage. Please add a comment for it.
> Since it does not need a response, QEMU can just bypass it. Also, please
> adjust the subject a bit, as it is misleading. Perhaps
>
> "intel_iommu: Bypass barrier wait descriptor"
good idea, will do
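
The flag combinations discussed in this sub-thread can be summarized as a
small classifier. This is a minimal sketch, not the QEMU implementation: the
bit positions of IF, SW and FN in the low qword are assumptions that should
be checked against the VT-d specification, and all SKETCH_ and sketch_ names
are hypothetical. It follows the review comments above in treating SW and IF
as mutually exclusive and a descriptor with only FN set as a pure barrier.

```c
#include <stdint.h>

/* Assumed bit positions in the low qword of a wait descriptor. */
#define SKETCH_WAIT_IF (UINT64_C(1) << 4) /* Interrupt Flag */
#define SKETCH_WAIT_SW (UINT64_C(1) << 5) /* Status Write */
#define SKETCH_WAIT_FN (UINT64_C(1) << 6) /* Fence */

typedef enum {
    SKETCH_WAIT_STATUS_WRITE, /* write the status data to memory */
    SKETCH_WAIT_INTERRUPT,    /* raise a completion event */
    SKETCH_WAIT_FENCE_ONLY,   /* barrier only, no response needed */
    SKETCH_WAIT_INVALID,      /* no known flag, or SW together with IF */
} SketchWaitAction;

static inline SketchWaitAction sketch_classify_wait_desc(uint64_t lo)
{
    if (lo & SKETCH_WAIT_SW) {
        /* SW and IF are treated as mutually exclusive in this sketch. */
        return (lo & SKETCH_WAIT_IF) ? SKETCH_WAIT_INVALID
                                     : SKETCH_WAIT_STATUS_WRITE;
    }
    if (lo & SKETCH_WAIT_IF) {
        return SKETCH_WAIT_INTERRUPT;
    }
    if (lo & SKETCH_WAIT_FN) {
        /* SW = 0, IF = 0, FN = 1: nothing to do when the queue is
         * processed sequentially. */
        return SKETCH_WAIT_FENCE_ONLY;
    }
    return SKETCH_WAIT_INVALID;
}
```

An emulation that processes the invalidation queue strictly in order already
satisfies the ordering guarantee of the fence, which is why the fence-only
case needs no action.
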
>
> Spec CH 7.10
> a. Submit Invalidation Wait Descriptor (inv_wait_dsc) with Fence flag
> (FN=1) Set to Invalidation Queue. This ensures that all requests submitted
> to the Invalidation Queue ahead of this wait descriptor are processed and
> completed by remapping hardware before processing requests after the
> Invalidation Wait Descriptor. It is not required to specify SW flag (or IF
> flag) in this descriptor or for software to wait on its completion, as its
> function is to only act as a barrier.
>
>>>
>>>>        } else {
>>>>            error_report_once("%s: invalid wait desc: hi=%"PRIx64",
>>>> lo=%"PRIx64
>>>>                              " (unknown type)", __func__, 
>>>> inv_desc->hi,
>>>
>>> -- 
>>> Regards,
>>> Yi Liu
>
> -- 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 06/22] pcie: add helper to declare PASID capability for a pcie device
  2024-07-03 12:04   ` Yi Liu
@ 2024-07-04  4:25     ` CLEMENT MATHIEU--DRIF
  0 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-04  4:25 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com


On 03/07/2024 14:04, Yi Liu wrote:
> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>
>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>> ---
>>   hw/pci/pcie.c                             | 24 +++++++++++++++++++++++
>>   include/hw/pci/pcie.h                     |  6 +++++-
>>   include/hw/pci/pcie_regs.h                |  3 +++
>>   include/standard-headers/linux/pci_regs.h |  1 +
>>   4 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>> index 4b2f0805c6..d6a052b616 100644
>> --- a/hw/pci/pcie.c
>> +++ b/hw/pci/pcie.c
>> @@ -1177,3 +1177,27 @@ void pcie_acs_reset(PCIDevice *dev)
>>           pci_set_word(dev->config + dev->exp.acs_cap + PCI_ACS_CTRL, 
>> 0);
>>       }
>>   }
>> +
>> +/* PASID */
>> +void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t 
>> pasid_width,
>> +                     bool exec_perm, bool priv_mod)
>> +{
>> +    assert(pasid_width <= PCI_EXT_CAP_PASID_MAX_WIDTH);
>> +    static const uint16_t control_reg_rw_mask = 0x07;
>> +    uint16_t capability_reg = pasid_width;
>> +
>> +    pcie_add_capability(dev, PCI_EXT_CAP_ID_PASID, PCI_PASID_VER, 
>> offset,
>> +                        PCI_EXT_CAP_PASID_SIZEOF);
>> +
>> +    capability_reg <<= PCI_PASID_CAP_WIDTH_SHIFT;
>> +    capability_reg |= exec_perm ? PCI_PASID_CAP_EXEC : 0;
>> +    capability_reg |= priv_mod  ? PCI_PASID_CAP_PRIV : 0;
>> +    pci_set_word(dev->config + offset + PCI_PASID_CAP, capability_reg);
>> +
>> +    /* Everything is disabled by default */
>> +    pci_set_word(dev->config + offset + PCI_PASID_CTRL, 0);
>> +
>> +    pci_set_word(dev->wmask + offset + PCI_PASID_CTRL, 
>> control_reg_rw_mask);
>> +
>> +    dev->exp.pasid_cap = offset;
>> +}
>
> There seems to be no user of this helper in this series. If so, you may
> drop this patch and include it once there is a caller.
You are right, I will move it to the series that implements the SVM demo 
device
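
For readers without the demo device at hand, the register layout that the
helper programs can be sketched in isolation. This is a minimal, standalone
illustration, assuming only the constants visible in the quoted hunks (width
field mask 0x1f00, shift 8, EXEC 0x0002, PRIV 0x0004, maximum width 20); the
sketch_ names are hypothetical and nothing here touches PCI config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the quoted patch; the SKETCH_ names are hypothetical. */
#define SKETCH_PASID_CAP_EXEC        0x0002 /* Exec permissions supported */
#define SKETCH_PASID_CAP_PRIV        0x0004 /* Privileged mode supported */
#define SKETCH_PASID_CAP_WIDTH       0x1f00 /* Max PASID width field */
#define SKETCH_PASID_CAP_WIDTH_SHIFT 8
#define SKETCH_PASID_MAX_WIDTH       20

/* Build the 16-bit PASID capability register value. */
static inline uint16_t sketch_pasid_cap_reg(uint8_t pasid_width,
                                            bool exec_perm, bool priv_mod)
{
    assert(pasid_width <= SKETCH_PASID_MAX_WIDTH);
    uint16_t reg = (uint16_t)(pasid_width << SKETCH_PASID_CAP_WIDTH_SHIFT);
    reg |= exec_perm ? SKETCH_PASID_CAP_EXEC : 0;
    reg |= priv_mod ? SKETCH_PASID_CAP_PRIV : 0;
    return reg;
}

/* Extract the PASID width back out of a capability register value. */
static inline uint8_t sketch_pasid_cap_width(uint16_t reg)
{
    return (uint8_t)((reg & SKETCH_PASID_CAP_WIDTH) >>
                     SKETCH_PASID_CAP_WIDTH_SHIFT);
}
```

A device advertising 20-bit PASIDs with execute permission but without
privileged mode would end up with 0x1402 in the capability register.
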
>
>> diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
>> index 5eddb90976..b870958c99 100644
>> --- a/include/hw/pci/pcie.h
>> +++ b/include/hw/pci/pcie.h
>> @@ -72,8 +72,9 @@ struct PCIExpressDevice {
>>       uint16_t aer_cap;
>>       PCIEAERLog aer_log;
>>
>> -    /* Offset of ATS capability in config space */
>> +    /* Offset of ATS and PASID capabilities in config space */
>>       uint16_t ats_cap;
>> +    uint16_t pasid_cap;
>>
>>       /* ACS */
>>       uint16_t acs_cap;
>> @@ -150,4 +151,7 @@ void pcie_cap_slot_unplug_cb(HotplugHandler 
>> *hotplug_dev, DeviceState *dev,
>>                                Error **errp);
>>   void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
>>                                        DeviceState *dev, Error **errp);
>> +
>> +void pcie_pasid_init(PCIDevice *dev, uint16_t offset, uint8_t 
>> pasid_width,
>> +                     bool exec_perm, bool priv_mod);
>>   #endif /* QEMU_PCIE_H */
>> diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
>> index 9d3b6868dc..0a86598f80 100644
>> --- a/include/hw/pci/pcie_regs.h
>> +++ b/include/hw/pci/pcie_regs.h
>> @@ -86,6 +86,9 @@ typedef enum PCIExpLinkWidth {
>>   #define PCI_ARI_VER                     1
>>   #define PCI_ARI_SIZEOF                  8
>>
>> +/* PASID */
>> +#define PCI_PASID_VER                   1
>> +#define PCI_EXT_CAP_PASID_MAX_WIDTH     20
>>   /* AER */
>>   #define PCI_ERR_VER                     2
>>   #define PCI_ERR_SIZEOF                  0x48
>> diff --git a/include/standard-headers/linux/pci_regs.h 
>> b/include/standard-headers/linux/pci_regs.h
>> index a39193213f..406dce8e82 100644
>> --- a/include/standard-headers/linux/pci_regs.h
>> +++ b/include/standard-headers/linux/pci_regs.h
>> @@ -935,6 +935,7 @@
>>   #define  PCI_PASID_CAP_EXEC 0x0002  /* Exec permissions Supported */
>>   #define  PCI_PASID_CAP_PRIV 0x0004  /* Privilege Mode Supported */
>>   #define  PCI_PASID_CAP_WIDTH        0x1f00
>> +#define  PCI_PASID_CAP_WIDTH_SHIFT  8
>>   #define PCI_PASID_CTRL              0x06    /* PASID control 
>> register */
>>   #define  PCI_PASID_CTRL_ENABLE      0x0001  /* Enable bit */
>>   #define  PCI_PASID_CTRL_EXEC        0x0002  /* Exec permissions 
>> Enable */
>
> -- 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 19/22] memory: add an API for ATS support
  2024-07-03 12:14   ` Yi Liu
@ 2024-07-04  4:30     ` CLEMENT MATHIEU--DRIF
  2024-07-04 12:52       ` Yi Liu
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-04  4:30 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com


On 03/07/2024 14:14, Yi Liu wrote:
> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>
>> IOMMU have to implement iommu_ats_request_translation to support ATS.
>>
>> Devices can use IOMMU_TLB_ENTRY_TRANSLATION_ERROR to check the tlb
>> entries returned by a translation request.
>>
>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>> ---
>>   include/exec/memory.h | 26 ++++++++++++++++++++++++++
>>   system/memory.c       | 20 ++++++++++++++++++++
>>   2 files changed, 46 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 003ee06610..48555c87c6 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -148,6 +148,10 @@ struct IOMMUTLBEntry {
>>       uint32_t         pasid;
>>   };
>>
>> +/* Check if an IOMMU TLB entry indicates a translation error */
>> +#define IOMMU_TLB_ENTRY_TRANSLATION_ERROR(entry) ((((entry)->perm) & 
>> IOMMU_RW) \
>> +                                                    == IOMMU_NONE)
>> +
>>   /*
>>    * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>>    * register with one or multiple IOMMU Notifier capability bit(s).
>> @@ -571,6 +575,20 @@ struct IOMMUMemoryRegionClass {
>>        int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
>>                                     GList *iova_ranges,
>>                                     Error **errp);
>> +
>> +    /**
>> +     * @iommu_ats_request_translation:
>> +     * This method must be implemented if the IOMMU has ATS enabled
>> +     *
>> +     * @see pci_ats_request_translation_pasid
>> +     */
>> +    ssize_t (*iommu_ats_request_translation)(IOMMUMemoryRegion *iommu,
>> +                                             bool priv_req, bool 
>> exec_req,
>> +                                             hwaddr addr, size_t 
>> length,
>> +                                             bool no_write,
>> +                                             IOMMUTLBEntry *result,
>> +                                             size_t result_length,
>> +                                             uint32_t *err_count);
>>   };
>>
>
> I'm not quite understanding why the existing translate() does not work.
> Could you elaborate?
We need more parameters than the existing translate() callback accepts.
This one is designed to return translations for a range rather than for a
single address.
The main purpose is to expose an API that takes the same parameters as a
PCIe translation request message and that gives the IOMMU all the
information it needs to process the request.
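
To make the difference concrete, here is a minimal standalone sketch of a
range-based request. It is not the QEMU code: the Demo* types, the DEMO_*
constants, the toy mapping (one unmapped page at 0x2000, every other page
shifted by 0x100000) and the return convention (number of entries on
success, -ENOMEM when the result buffer is too small) are assumptions made
for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h> /* ssize_t */

/* Simplified stand-ins for illustration; not QEMU's real definitions. */
typedef uint64_t hwaddr;

enum { DEMO_NONE = 0, DEMO_RO = 1, DEMO_WO = 2, DEMO_RW = 3 };

typedef struct DemoTLBEntry {
    hwaddr iova;
    hwaddr translated_addr;
    hwaddr addr_mask;
    int perm;
} DemoTLBEntry;

/* Same shape as the macro in the patch: no R/W permission means error. */
#define DEMO_TLB_ENTRY_TRANSLATION_ERROR(entry) \
    ((((entry)->perm) & DEMO_RW) == DEMO_NONE)

#define DEMO_PAGE_SIZE     ((hwaddr)0x1000)
#define DEMO_UNMAPPED_PAGE ((hwaddr)0x2000)   /* assumed hole in the mapping */
#define DEMO_PHYS_OFFSET   ((hwaddr)0x100000) /* assumed toy translation */

/*
 * Produce one entry per 4 KiB page touched by [addr, addr + length).
 * Pages that cannot be translated still get an entry, with perm set to
 * DEMO_NONE, and are counted in *err_count.
 */
static ssize_t demo_ats_request_translation(hwaddr addr, size_t length,
                                            bool no_write,
                                            DemoTLBEntry *result,
                                            size_t result_length,
                                            uint32_t *err_count)
{
    assert(length > 0 && result_length > 0);

    hwaddr first = addr & ~(DEMO_PAGE_SIZE - 1);
    hwaddr last = (addr + length - 1) & ~(DEMO_PAGE_SIZE - 1);
    size_t n = 0;

    *err_count = 0;
    for (hwaddr page = first; page <= last; page += DEMO_PAGE_SIZE) {
        if (n == result_length) {
            return -ENOMEM; /* assumed convention: buffer too small */
        }
        DemoTLBEntry *e = &result[n++];
        e->iova = page;
        e->addr_mask = DEMO_PAGE_SIZE - 1;
        if (page == DEMO_UNMAPPED_PAGE) {
            e->translated_addr = 0;
            e->perm = DEMO_NONE;
            (*err_count)++;
        } else {
            e->translated_addr = page + DEMO_PHYS_OFFSET;
            /* Toy policy: a no_write request only gets read permission. */
            e->perm = no_write ? DEMO_RO : DEMO_RW;
        }
    }
    return (ssize_t)n;
}

/* Count the entries a device could actually use. */
static size_t demo_count_usable(const DemoTLBEntry *entries, size_t n)
{
    size_t usable = 0;
    for (size_t i = 0; i < n; i++) {
        if (!DEMO_TLB_ENTRY_TRANSLATION_ERROR(&entries[i])) {
            usable++;
        }
    }
    return usable;
}
```

A single call covering 0x1800..0x37ff fills three entries here, one of
which is flagged as an error. A single-address translate() call has no
natural way to return that kind of per-page result for a whole range.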
>
>>   typedef struct RamDiscardListener RamDiscardListener;
>> @@ -1926,6 +1944,14 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
>>   void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>>                                                IOMMUNotifier *n);
>>
>> +ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
>> +                                                bool priv_req, bool exec_req,
>> +                                                hwaddr addr, size_t length,
>> +                                                bool no_write,
>> +                                                IOMMUTLBEntry *result,
>> +                                                size_t result_length,
>> +                                                uint32_t *err_count);
>> +
>>   /**
>>    * memory_region_iommu_get_attr: return an IOMMU attr if get_attr() is
>>    * defined on the IOMMU.
>> diff --git a/system/memory.c b/system/memory.c
>> index 74cd73ebc7..8268df7bf5 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2005,6 +2005,26 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>>       memory_region_update_iommu_notify_flags(iommu_mr, NULL);
>>   }
>>
>> +ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
>> +                                                    bool priv_req,
>> +                                                    bool exec_req,
>> +                                                    hwaddr addr, size_t length,
>> +                                                    bool no_write,
>> +                                                    IOMMUTLBEntry *result,
>> +                                                    size_t result_length,
>> +                                                    uint32_t *err_count)
>> +{
>> +    IOMMUMemoryRegionClass *imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
>> +
>> +    if (!imrc->iommu_ats_request_translation) {
>> +        return -ENODEV;
>> +    }
>> +
>> +    return imrc->iommu_ats_request_translation(iommu_mr, priv_req, exec_req,
>> +                                               addr, length, no_write, result,
>> +                                               result_length, err_count);
>> +}
>> +
>>   void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
>>                                       IOMMUTLBEvent *event)
>>   {
>
> -- 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-03 12:32 ` Yi Liu
@ 2024-07-04  4:36   ` CLEMENT MATHIEU--DRIF
  2024-07-04  8:14     ` Yi Liu
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-04  4:36 UTC (permalink / raw)
  To: Yi Liu, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com, Clement Mathieu--Drif


On 03/07/2024 14:32, Yi Liu wrote:
>
Hi, thanks for your review! Very efficient!
>
> Hi CMD,
>
> I've gone through the series. Here are some general suggestions.
>
> 1) Patches 01, 02 and 04 can be sent separately as they are fixes.
Will do
> 2) This series mixes the ATS and PASID capabilities a bit. Actually,
>    they do not depend on each other. I'd suggest you split the series into
>       - support ATS for requests without PASID
>       - support ATS for requests with PASID
>    The second part should be an incremental change based on the first
>    part. If you can make use of the existing translate() callback, then
>    it is possible to remove the dependency on Zhenzhong's stage-1 series.
The end goal is to support SVM; consequently, we only add support
for ATS with PASID here.
> 3) Some commits do not have a commit message. It would be good to add
>    one.
Ok, I will be more verbose ;)
> 4) Some helpers look like they are meant for device models. If possible,
>    it's better to submit them with a demo device.
The demo device is already in my GitHub repo
(https://github.com/BullSequana/qemu/tree/master).
It will be sent in a future series that adds the last features required
for SVM (I am splitting the work to make reviews less painful).
> 5) A design description in the cover-letter would be helpful.
Ok, I will elaborate
>
> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>> From: Clement Mathieu--Drif <cmdetu@gmail.com>
>>
>> This series belongs to a list of series that add SVM support for VT-d.
>>
>> As a starting point, we use the series called 'intel_iommu: Enable
>> stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
>>
>> Here we focus on the implementation of ATS support in the IOMMU and
>> on a PCI-level
>> API for ATS to be used by virtual devices.
>>
>> This work is based on the VT-d specification version 4.1 (March 2023).
>> Here is a link to a GitHub repository where you can find the
>> following elements :
>>      - Qemu with all the patches for SVM
>>          - ATS
>>          - PRI
>>          - Device IOTLB invalidations
>>          - Requests with already translated addresses
>>      - A demo device
>>      - A simple driver for the demo device
>>      - A userspace program (for testing and demonstration purposes)
>>
>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>>
>>
>> v2
>>      - handle huge pages better by detecting the page table level at
>> which the translation errors occur
>>      - Changes after review by ZhenZhong Duan :
>>       - Set the access bit after checking permissions
>>       - helper for PASID and ATS : make the commit message more
>> accurate ('present' replaced with 'enabled')
>>       - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it
>> instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when
>> preparing the capability register
>>       - pci: do not check pci_bus_bypass_iommu after calling
>> pci_device_get_iommu_bus_devfn
>>       - do not alter formatting of IOMMUTLBEntry declaration
>>       - vtd_iova_fl_check_canonical : directly use s->aw_bits instead
>> of aw for the sake of clarity
>>
>> v3
>>      - rebase on new version of Zhenzhong's flts implementation
>>      - fix the atc lookup operation (check the mask before returning
>> an entry)
>>      - add a unit test for the ATC
>>      - store a user pointer in the iommu notifiers to simplify the
>> implementation of svm devices
>>      Changes after review by Zhenzhong :
>>       - store the input pasid instead of rid2pasid when returning an
>> entry after a translation
>>       - split the ATC implementation and its unit tests
>>
>> v4
>>      Changes after internal review
>>       - Fix the nowrite optimization, an ATS translation without the
>> nowrite flag should not fail when the write permission is not set
>
> It's strange to list internal review here.
>
>> v5
>>      Changes after review by Philippe :
>>       - change the type of 'level' to unsigned in vtd_lookup_iotlb
>
> Listing the change log from the latest version to the earliest would be
> nice too. Looking forward to your next version. :)
>
> Regards,
> Yi Liu
>
>> Clément Mathieu--Drif (22):
>>    intel_iommu: fix FRCD construction macro.
>>    intel_iommu: make types match
>>    intel_iommu: return page walk level even when the translation fails
>>    intel_iommu: do not consider wait_desc as an invalid descriptor
>>    memory: add permissions in IOMMUAccessFlags
>>    pcie: add helper to declare PASID capability for a pcie device
>>    pcie: helper functions to check if PASID and ATS are enabled
>>    intel_iommu: declare supported PASID size
>>    pci: cache the bus mastering status in the device
>>    pci: add IOMMU operations to get address spaces and memory regions
>>      with PASID
>>    memory: store user data pointer in the IOMMU notifiers
>>    pci: add a pci-level initialization function for iommu notifiers
>>    intel_iommu: implement the get_address_space_pasid iommu operation
>>    intel_iommu: implement the get_memory_region_pasid iommu operation
>>    memory: Allow to store the PASID in IOMMUTLBEntry
>>    intel_iommu: fill the PASID field when creating an instance of
>>      IOMMUTLBEntry
>>    atc: generic ATC that can be used by PCIe devices that support SVM
>>    atc: add unit tests
>>    memory: add an API for ATS support
>>    pci: add a pci-level API for ATS
>>    intel_iommu: set the address mask even when a translation fails
>>    intel_iommu: add support for ATS
>>
>>   hw/i386/intel_iommu.c                     | 146 +++++-
>>   hw/i386/intel_iommu_internal.h            |   6 +-
>>   hw/pci/pci.c                              | 127 +++++-
>>   hw/pci/pcie.c                             |  42 ++
>>   include/exec/memory.h                     |  51 ++-
>>   include/hw/i386/intel_iommu.h             |   2 +-
>>   include/hw/pci/pci.h                      | 101 +++++
>>   include/hw/pci/pci_device.h               |   1 +
>>   include/hw/pci/pcie.h                     |   9 +-
>>   include/hw/pci/pcie_regs.h                |   3 +
>>   include/standard-headers/linux/pci_regs.h |   1 +
>>   system/memory.c                           |  20 +
>>   tests/unit/meson.build                    |   1 +
>>   tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>>   util/atc.c                                | 211 +++++++++
>>   util/atc.h                                | 117 +++++
>>   util/meson.build                          |   1 +
>>   17 files changed, 1332 insertions(+), 34 deletions(-)
>>   create mode 100644 tests/unit/test-atc.c
>>   create mode 100644 util/atc.c
>>   create mode 100644 util/atc.h
>>
>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 00/22] ATS support for VT-d
  2024-07-04  4:36   ` CLEMENT MATHIEU--DRIF
@ 2024-07-04  8:14     ` Yi Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Yi Liu @ 2024-07-04  8:14 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com, Clement Mathieu--Drif

On 2024/7/4 12:36, CLEMENT MATHIEU--DRIF wrote:
> 
> On 03/07/2024 14:32, Yi Liu wrote:
>>
> Hi, thanks for your review! Very efficient!
>>
>> Hi CMD,
>>
>> I've gone through the series. Here are some general suggestions.
>>
>> 1) Patches 01, 02 and 04 can be sent separately as they are fixes.
> Will do
>> 2) This series mixes the ATS and PASID capabilities a bit. Actually,
>>     they do not depend on each other. I'd suggest you split the series into
>>        - support ATS for requests without PASID
>>        - support ATS for requests with PASID
>>     The second part should be an incremental change based on the first
>>     part. If you can make use of the existing translate() callback, then
>>     it is possible to remove the dependency on Zhenzhong's stage-1 series.
> The end goal is to support SVM; consequently, we only add support
> for ATS with PASID here.

Yes, but there is no need to put all of them in one series, just as you
sent the PRI series separately.

>> 3) Some commits do not have a commit message. It would be good to add
>>     one.
> Ok, I will be more verbose ;)
>> 4) Some helpers look like they are meant for device models. If possible,
>>     it's better to submit them with a demo device.
> The demo device is already in my GitHub repo
> (https://github.com/BullSequana/qemu/tree/master).
> It will be sent in a future series that adds the last features required
> for SVM (I am splitting the work to make reviews less painful).
>> 5) A design description in the cover-letter would be helpful.
> Ok, I will elaborate
>>
>> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>>> From: Clement Mathieu--Drif <cmdetu@gmail.com>
>>>
>>> This series belongs to a list of series that add SVM support for VT-d.
>>>
>>> As a starting point, we use the series called 'intel_iommu: Enable
>>> stage-1 translation' (rfc2) by Zhenzhong Duan and Yi Liu.
>>>
>>> Here we focus on the implementation of ATS support in the IOMMU and
>>> on a PCI-level
>>> API for ATS to be used by virtual devices.
>>>
>>> This work is based on the VT-d specification version 4.1 (March 2023).
>>> Here is a link to a GitHub repository where you can find the
>>> following elements :
>>>       - Qemu with all the patches for SVM
>>>           - ATS
>>>           - PRI
>>>           - Device IOTLB invalidations
>>>           - Requests with already translated addresses
>>>       - A demo device
>>>       - A simple driver for the demo device
>>>       - A userspace program (for testing and demonstration purposes)
>>>
>>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>>>
>>>
>>> v2
>>>       - handle huge pages better by detecting the page table level at
>>> which the translation errors occur
>>>       - Changes after review by ZhenZhong Duan :
>>>        - Set the access bit after checking permissions
>>>        - helper for PASID and ATS : make the commit message more
>>> accurate ('present' replaced with 'enabled')
>>>        - pcie_pasid_init: add PCI_PASID_CAP_WIDTH_SHIFT and use it
>>> instead of PCI_EXT_CAP_PASID_SIZEOF for shifting the pasid width when
>>> preparing the capability register
>>>        - pci: do not check pci_bus_bypass_iommu after calling
>>> pci_device_get_iommu_bus_devfn
>>>        - do not alter formatting of IOMMUTLBEntry declaration
>>>        - vtd_iova_fl_check_canonical : directly use s->aw_bits instead
>>> of aw for the sake of clarity
>>>
>>> v3
>>>       - rebase on new version of Zhenzhong's flts implementation
>>>       - fix the atc lookup operation (check the mask before returning
>>> an entry)
>>>       - add a unit test for the ATC
>>>       - store a user pointer in the iommu notifiers to simplify the
>>> implementation of svm devices
>>>       Changes after review by Zhenzhong :
>>>        - store the input pasid instead of rid2pasid when returning an
>>> entry after a translation
>>>        - split the ATC implementation and its unit tests
>>>
>>> v4
>>>       Changes after internal review
>>>        - Fix the nowrite optimization, an ATS translation without the
>>> nowrite flag should not fail when the write permission is not set
>>
>> It's strange to list internal review here.
>>
>>> v5
>>>       Changes after review by Philippe :
>>>        - change the type of 'level' to unsigned in vtd_lookup_iotlb
>>
>> Listing the change log from the latest version to the earliest would be
>> nice too. Looking forward to your next version. :)
>>
>> Regards,
>> Yi Liu
>>
>>> Clément Mathieu--Drif (22):
>>>     intel_iommu: fix FRCD construction macro.
>>>     intel_iommu: make types match
>>>     intel_iommu: return page walk level even when the translation fails
>>>     intel_iommu: do not consider wait_desc as an invalid descriptor
>>>     memory: add permissions in IOMMUAccessFlags
>>>     pcie: add helper to declare PASID capability for a pcie device
>>>     pcie: helper functions to check if PASID and ATS are enabled
>>>     intel_iommu: declare supported PASID size
>>>     pci: cache the bus mastering status in the device
>>>     pci: add IOMMU operations to get address spaces and memory regions
>>>       with PASID
>>>     memory: store user data pointer in the IOMMU notifiers
>>>     pci: add a pci-level initialization function for iommu notifiers
>>>     intel_iommu: implement the get_address_space_pasid iommu operation
>>>     intel_iommu: implement the get_memory_region_pasid iommu operation
>>>     memory: Allow to store the PASID in IOMMUTLBEntry
>>>     intel_iommu: fill the PASID field when creating an instance of
>>>       IOMMUTLBEntry
>>>     atc: generic ATC that can be used by PCIe devices that support SVM
>>>     atc: add unit tests
>>>     memory: add an API for ATS support
>>>     pci: add a pci-level API for ATS
>>>     intel_iommu: set the address mask even when a translation fails
>>>     intel_iommu: add support for ATS
>>>
>>>    hw/i386/intel_iommu.c                     | 146 +++++-
>>>    hw/i386/intel_iommu_internal.h            |   6 +-
>>>    hw/pci/pci.c                              | 127 +++++-
>>>    hw/pci/pcie.c                             |  42 ++
>>>    include/exec/memory.h                     |  51 ++-
>>>    include/hw/i386/intel_iommu.h             |   2 +-
>>>    include/hw/pci/pci.h                      | 101 +++++
>>>    include/hw/pci/pci_device.h               |   1 +
>>>    include/hw/pci/pcie.h                     |   9 +-
>>>    include/hw/pci/pcie_regs.h                |   3 +
>>>    include/standard-headers/linux/pci_regs.h |   1 +
>>>    system/memory.c                           |  20 +
>>>    tests/unit/meson.build                    |   1 +
>>>    tests/unit/test-atc.c                     | 527 ++++++++++++++++++++++
>>>    util/atc.c                                | 211 +++++++++
>>>    util/atc.h                                | 117 +++++
>>>    util/meson.build                          |   1 +
>>>    17 files changed, 1332 insertions(+), 34 deletions(-)
>>>    create mode 100644 tests/unit/test-atc.c
>>>    create mode 100644 util/atc.c
>>>    create mode 100644 util/atc.h
>>>
>>
>>

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 19/22] memory: add an API for ATS support
  2024-07-04  4:30     ` CLEMENT MATHIEU--DRIF
@ 2024-07-04 12:52       ` Yi Liu
  0 siblings, 0 replies; 61+ messages in thread
From: Yi Liu @ 2024-07-04 12:52 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
  Cc: jasowang@redhat.com, zhenzhong.duan@intel.com,
	kevin.tian@intel.com, joao.m.martins@oracle.com,
	peterx@redhat.com, mst@redhat.com

On 2024/7/4 12:30, CLEMENT MATHIEU--DRIF wrote:
> 
> On 03/07/2024 14:14, Yi Liu wrote:
>>
>> On 2024/7/2 13:52, CLEMENT MATHIEU--DRIF wrote:
>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>
>>> IOMMU have to implement iommu_ats_request_translation to support ATS.
>>>
>>> Devices can use IOMMU_TLB_ENTRY_TRANSLATION_ERROR to check the tlb
>>> entries returned by a translation request.
>>>
>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>> ---
>>>    include/exec/memory.h | 26 ++++++++++++++++++++++++++
>>>    system/memory.c       | 20 ++++++++++++++++++++
>>>    2 files changed, 46 insertions(+)
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 003ee06610..48555c87c6 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -148,6 +148,10 @@ struct IOMMUTLBEntry {
>>>        uint32_t         pasid;
>>>    };
>>>
>>> +/* Check if an IOMMU TLB entry indicates a translation error */
>>> +#define IOMMU_TLB_ENTRY_TRANSLATION_ERROR(entry) ((((entry)->perm) & IOMMU_RW) \
>>> +                                                    == IOMMU_NONE)
>>> +
>>>    /*
>>>     * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>>>     * register with one or multiple IOMMU Notifier capability bit(s).
>>> @@ -571,6 +575,20 @@ struct IOMMUMemoryRegionClass {
>>>         int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
>>>                                      GList *iova_ranges,
>>>                                      Error **errp);
>>> +
>>> +    /**
>>> +     * @iommu_ats_request_translation:
>>> +     * This method must be implemented if the IOMMU has ATS enabled
>>> +     *
>>> +     * @see pci_ats_request_translation_pasid
>>> +     */
>>> +    ssize_t (*iommu_ats_request_translation)(IOMMUMemoryRegion *iommu,
>>> +                                             bool priv_req, bool exec_req,
>>> +                                             hwaddr addr, size_t length,
>>> +                                             bool no_write,
>>> +                                             IOMMUTLBEntry *result,
>>> +                                             size_t result_length,
>>> +                                             uint32_t *err_count);
>>>    };
>>>
>>
>> I'm not quite understanding why the existing translate() does not work.
>> Could you elaborate?
> We need more parameters than the existing translate() callback accepts.
> This one is designed to return translations for a range rather than for a
> single address.
> The main purpose is to expose an API that takes the same parameters as a
> PCIe translation request message and that gives the IOMMU all the
> information it needs to process the request.

OK. Please make the reason clear in the commit message as well. Let's see
if there are any other opinions on it.

>>
>>>    typedef struct RamDiscardListener RamDiscardListener;
>>> @@ -1926,6 +1944,14 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
>>>    void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>>>                                                 IOMMUNotifier *n);
>>>
>>> +ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
>>> +                                                bool priv_req, bool exec_req,
>>> +                                                hwaddr addr, size_t length,
>>> +                                                bool no_write,
>>> +                                                IOMMUTLBEntry *result,
>>> +                                                size_t result_length,
>>> +                                                uint32_t *err_count);
>>> +
>>>    /**
>>>     * memory_region_iommu_get_attr: return an IOMMU attr if get_attr() is
>>>     * defined on the IOMMU.
>>> diff --git a/system/memory.c b/system/memory.c
>>> index 74cd73ebc7..8268df7bf5 100644
>>> --- a/system/memory.c
>>> +++ b/system/memory.c
>>> @@ -2005,6 +2005,26 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>>>        memory_region_update_iommu_notify_flags(iommu_mr, NULL);
>>>    }
>>>
>>> +ssize_t memory_region_iommu_ats_request_translation(IOMMUMemoryRegion *iommu_mr,
>>> +                                                    bool priv_req,
>>> +                                                    bool exec_req,
>>> +                                                    hwaddr addr, size_t length,
>>> +                                                    bool no_write,
>>> +                                                    IOMMUTLBEntry *result,
>>> +                                                    size_t result_length,
>>> +                                                    uint32_t *err_count)
>>> +{
>>> +    IOMMUMemoryRegionClass *imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
>>> +
>>> +    if (!imrc->iommu_ats_request_translation) {
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    return imrc->iommu_ats_request_translation(iommu_mr, priv_req, exec_req,
>>> +                                               addr, length, no_write, result,
>>> +                                               result_length, err_count);
>>> +}
>>> +
>>>    void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
>>>                                        IOMMUTLBEvent *event)
>>>    {
>>
>> -- 
>> Regards,
>> Yi Liu

-- 
Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-02  5:52 ` [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS CLEMENT MATHIEU--DRIF
@ 2024-07-09 10:15   ` Minwoo Im
  2024-07-09 11:58     ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Minwoo Im @ 2024-07-09 10:15 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com, minwoo.im

[-- Attachment #1: Type: text/plain, Size: 2315 bytes --]

On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> 
> Devices implementing ATS can send translation requests using
> pci_ats_request_translation_pasid.
> 
> The invalidation events are sent back to the device using the iommu
> notifier managed with pci_register_iommu_tlb_event_notifier and
> pci_unregister_iommu_tlb_event_notifier
> 
> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> ---
>  hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
>  include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 7a483dd05d..93b816aff2 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
>      }
>  }
>  
> +ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
> +                                          bool priv_req, bool exec_req,
> +                                          hwaddr addr, size_t length,
> +                                          bool no_write, IOMMUTLBEntry *result,
> +                                          size_t result_length,
> +                                          uint32_t *err_count)
> +{
> +    assert(result_length);
> +    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
> +                                                                        pasid);
> +    if (!iommu_mr || !pcie_ats_enabled(dev)) {
> +        return -EPERM;
> +    }
> +    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
> +                                                       exec_req, addr, length,
> +                                                       no_write, result,
> +                                                       result_length,
> +                                                       err_count);
> +}

Could this function be called from inside the PCI subsystem (hw/pci/pci.c)
rather than from the endpoint PCI device, so that ATS requests are abstracted
away from the endpoint device's point of view?  I think it would be better for
the PCI subsystem to issue the ATS request when pcie_ats_enabled(dev) is true,
rather than having the endpoint call it.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-09 10:15   ` Minwoo Im
@ 2024-07-09 11:58     ` CLEMENT MATHIEU--DRIF
  2024-07-09 21:17       ` Minwoo Im
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-09 11:58 UTC (permalink / raw)
  To: Minwoo Im
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com



On 09/07/2024 12:15, Minwoo Im wrote:
>
> On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:
>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>
>> Devices implementing ATS can send translation requests using
>> pci_ats_request_translation_pasid.
>>
>> The invalidation events are sent back to the device using the iommu
>> notifier managed with pci_register_iommu_tlb_event_notifier and
>> pci_unregister_iommu_tlb_event_notifier
>>
>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>> ---
>>   hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
>>   include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 96 insertions(+)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 7a483dd05d..93b816aff2 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
>>       }
>>   }
>>
>> +ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
>> +                                          bool priv_req, bool exec_req,
>> +                                          hwaddr addr, size_t length,
>> +                                          bool no_write, IOMMUTLBEntry *result,
>> +                                          size_t result_length,
>> +                                          uint32_t *err_count)
>> +{
>> +    assert(result_length);
>> +    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
>> +                                                                        pasid);
>> +    if (!iommu_mr || !pcie_ats_enabled(dev)) {
>> +        return -EPERM;
>> +    }
>> +    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
>> +                                                       exec_req, addr, length,
>> +                                                       no_write, result,
>> +                                                       result_length,
>> +                                                       err_count);
>> +}
> Could this function be called from inside the PCI subsystem (hw/pci/pci.c)
> rather than from the endpoint PCI device, so that ATS requests are abstracted
> away from the endpoint device's point of view?  I think it would be better for
> the PCI subsystem to issue the ATS request when pcie_ats_enabled(dev) is true,
> rather than having the endpoint call it.
Hi,

This series aims to bring support for SVM (we are trying to integrate
the patches bit by bit).
From a spec point of view, I am not sure it would make sense to
implement the SVM logic at the PCI level, as it is supposed to be
implemented by endpoint devices.
However, we could consider providing a reference, reusable, encapsulated
implementation of SVM with a simplified API that would call the pci_*
functions under the hood.
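
As a rough idea of what such a helper could look like, here is a minimal
standalone sketch. It is not part of this series: DemoSvmDevice,
demo_svm_translate and demo_backend are hypothetical names, the backend
hook merely stands in for a pci_* ATS request, and the one-entry cache is
a deliberately naive placeholder for a real ATC. The error codes mirror
the ones used in the patches (-EPERM when ATS is not enabled, -ENODEV when
the IOMMU provides no ATS callback).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t hwaddr;
#define DEMO_PAGE_MASK (~(hwaddr)0xfff)

typedef struct DemoSvmDevice DemoSvmDevice;

/* Hypothetical backend hook standing in for the pci_* ATS request. */
typedef int (*demo_ats_fn)(DemoSvmDevice *dev, uint32_t pasid,
                           hwaddr page, hwaddr *translated_page);

struct DemoSvmDevice {
    bool ats_enabled;        /* mirrors the pcie_ats_enabled() check */
    demo_ats_fn ats_request; /* NULL models an IOMMU without the callback */
    unsigned requests_sent;  /* number of requests that reached the backend */
    bool cache_valid;        /* one-entry translation cache */
    uint32_t cache_pasid;
    hwaddr cache_page;
    hwaddr cache_translated;
};

/* Translate one address, sending a request only on a cache miss. */
static int demo_svm_translate(DemoSvmDevice *dev, uint32_t pasid,
                              hwaddr addr, hwaddr *out)
{
    hwaddr page = addr & DEMO_PAGE_MASK;
    hwaddr translated;
    int ret;

    assert(out != NULL);
    if (!dev->ats_enabled) {
        return -EPERM;
    }
    if (!dev->ats_request) {
        return -ENODEV;
    }
    if (!(dev->cache_valid && dev->cache_pasid == pasid &&
          dev->cache_page == page)) {
        dev->requests_sent++;
        ret = dev->ats_request(dev, pasid, page, &translated);
        if (ret < 0) {
            return ret;
        }
        dev->cache_valid = true;
        dev->cache_pasid = pasid;
        dev->cache_page = page;
        dev->cache_translated = translated;
    }
    /* Combine the translated page with the offset inside the page. */
    *out = dev->cache_translated | (addr & ~DEMO_PAGE_MASK);
    return 0;
}

/* Toy backend: maps every page to page + 0x100000. */
static int demo_backend(DemoSvmDevice *dev, uint32_t pasid,
                        hwaddr page, hwaddr *translated_page)
{
    (void)dev;
    (void)pasid;
    *translated_page = page + 0x100000;
    return 0;
}
```

With this shape, two accesses to the same page and PASID reach the backend
only once. A real helper would also have to drop cached entries when the
invalidation events delivered through the IOMMU notifiers arrive, which
this sketch leaves out.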

Do you have a specific use case in mind?

 >cmd

>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-09 11:58     ` CLEMENT MATHIEU--DRIF
@ 2024-07-09 21:17       ` Minwoo Im
  2024-07-10  5:17         ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Minwoo Im @ 2024-07-09 21:17 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com, minwoo.im

[-- Attachment #1: Type: text/plain, Size: 3805 bytes --]

On 24-07-09 11:58:53, CLEMENT MATHIEU--DRIF wrote:
> 
> 
> On 09/07/2024 12:15, Minwoo Im wrote:
> >
> > On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:
> >> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >>
> >> Devices implementing ATS can send translation requests using
> >> pci_ats_request_translation_pasid.
> >>
> >> The invalidation events are sent back to the device using the iommu
> >> notifier managed with pci_register_iommu_tlb_event_notifier and
> >> pci_unregister_iommu_tlb_event_notifier
> >>
> >> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >> ---
> >>   hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
> >>   include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
> >>   2 files changed, 96 insertions(+)
> >>
> >> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >> index 7a483dd05d..93b816aff2 100644
> >> --- a/hw/pci/pci.c
> >> +++ b/hw/pci/pci.c
> >> @@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
> >>       }
> >>   }
> >>
> >> +ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
> >> +                                          bool priv_req, bool exec_req,
> >> +                                          hwaddr addr, size_t length,
> >> +                                          bool no_write, IOMMUTLBEntry *result,
> >> +                                          size_t result_length,
> >> +                                          uint32_t *err_count)
> >> +{
> >> +    assert(result_length);
> >> +    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
> >> +                                                                        pasid);
> >> +    if (!iommu_mr || !pcie_ats_enabled(dev)) {
> >> +        return -EPERM;
> >> +    }
> >> +    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
> >> +                                                       exec_req, addr, length,
> >> +                                                       no_write, result,
> >> +                                                       result_length,
> >> +                                                       err_count);
> >> +}
> > Can we use this function not from the endpoint PCI device, but inside of the pci
> > subsystem (hw/pci/pci.c) to make transparent abstraction for ATS request from
> > PCI endpoint device POV?  I guess it would be better to have PCI subsystem to
> > issue ATS request if pcie_ats_enabled(dev) rather than calling from the endpoint
> > side.
> Hi,
> 
> This series aims to bring support for SVM (we are trying to integrate 
> the patches bit by bit).
>  From a spec point of view, I don't know if it would make sense to 
> implement the SVM logic at the PCI level
> as it's supposed to be implemented by endpoint devices.

Understood that this series is targeting the SVM usage.  But the ATS feature is
general to PCI devices, not just SVM, so I guess it would be better to have the
caller of `pci_ats_request_translation_pasid()` in the PCI subsystem, like
pci_dma_rw(), to avoid duplicated implementations in other PCI endpoint
devices in the future.

> However, we could consider providing a reference/reusable/encapsulated 
> implementation of SVM with a simplified API
> that would call the pci_* functions under the hood.

I would prefer that PCI devices which want to request ATS translation have no
additional implementation for ATS, only a call to pcie_ats_init().

> 
> Do you have a specific use case in mind?

ATS/PRI is the actual use case, and it's not that different from what you are
targeting :)

> 
>  >cmd
> 
> >


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-09 21:17       ` Minwoo Im
@ 2024-07-10  5:17         ` CLEMENT MATHIEU--DRIF
  2024-07-11  8:04           ` Minwoo Im
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-10  5:17 UTC (permalink / raw)
  To: Minwoo Im
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com



On 09/07/2024 23:17, Minwoo Im wrote:
> On 24-07-09 11:58:53, CLEMENT MATHIEU--DRIF wrote:
>>
>> On 09/07/2024 12:15, Minwoo Im wrote:
>>> On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:
>>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>>
>>>> Devices implementing ATS can send translation requests using
>>>> pci_ats_request_translation_pasid.
>>>>
>>>> The invalidation events are sent back to the device using the iommu
>>>> notifier managed with pci_register_iommu_tlb_event_notifier and
>>>> pci_unregister_iommu_tlb_event_notifier
>>>>
>>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>> ---
>>>>    hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
>>>>    include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 96 insertions(+)
>>>>
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index 7a483dd05d..93b816aff2 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
>>>>        }
>>>>    }
>>>>
>>>> +ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
>>>> +                                          bool priv_req, bool exec_req,
>>>> +                                          hwaddr addr, size_t length,
>>>> +                                          bool no_write, IOMMUTLBEntry *result,
>>>> +                                          size_t result_length,
>>>> +                                          uint32_t *err_count)
>>>> +{
>>>> +    assert(result_length);
>>>> +    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
>>>> +                                                                        pasid);
>>>> +    if (!iommu_mr || !pcie_ats_enabled(dev)) {
>>>> +        return -EPERM;
>>>> +    }
>>>> +    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
>>>> +                                                       exec_req, addr, length,
>>>> +                                                       no_write, result,
>>>> +                                                       result_length,
>>>> +                                                       err_count);
>>>> +}
>>> Can we use this function not from the endpoint PCI device, but inside of the pci
>>> subsystem (hw/pci/pci.c) to make transparent abstraction for ATS request from
>>> PCI endpoint device POV?  I guess it would be better to have PCI subsystem to
>>> issue ATS request if pcie_ats_enabled(dev) rather than calling from the endpoint
>>> side.
>> Hi,
>>
>> This series aims to bring support for SVM (we are trying to integrate
>> the patches bit by bit).
>>   From a spec point of view, I don't know if it would make sense to
>> implement the SVM logic at the PCI level
>> as it's supposed to be implemented by endpoint devices.
> Understood that this series is targeting the SVM usage.  But ATS feature is
> something general to PCI devices, not only just for SVM, so I guess it would be
> better to have caller to `pci_ats_request_translation_pasid()` in pci subsystem
> like pci_dma_rw() to avoid duplicated implementation in the future for the
> other PCI enpoint devices.

Would we store the ATC directly in the PCI subsystem?
>
>> However, we could consider providing a reference/reusable/encapsulated
>> implementation of SVM with a simplified API
>> that would call the pci_* functions under the hood.
> I would prefer that PCI devices which want to request ATS translation has no
> additional implementation for ATS, but only pcie_ats_init().
Hi,

I think both strategies can coexist.
Keeping control can be useful for people who use QEMU for hardware
prototyping and who generally want to experiment.
We can keep the current PCI-level API for devices that want to
reimplement the logic themselves
and add a kind of "DMA module"/"ATS+PRI module" that works out of the box.
That module could be called "struct PciDmaModule" and expose a simple
set of functions like pci_dma_module_init, pci_dma_module_read and
pci_dma_module_write.
I think it's important to keep the existing DMA API as is to allow devices
to do both "with ATS" and "without ATS" operations.
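
To make the module idea a bit more concrete, here is a rough, self-contained
sketch. It is not code from this series: only the names PciDmaModule,
pci_dma_module_init, pci_dma_module_read and pci_dma_module_write come from
the proposal above. The struct layout, the signatures, the single-entry ATC,
and the toy_translate callback (standing in for an ATS translation request)
are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_PAGE_MASK (~(uint64_t)(TOY_PAGE_SIZE - 1))

/* Hypothetical stand-in for an ATS translation request to the IOMMU. */
typedef bool (*ToyTranslateFn)(void *opaque, uint32_t pasid,
                               uint64_t iova_page, uint64_t *phys_page);

/* Hypothetical layout; only the struct name comes from the thread. */
typedef struct PciDmaModule {
    uint8_t *mem;            /* toy "guest memory" backing store */
    size_t mem_size;
    ToyTranslateFn translate;
    void *opaque;
    uint32_t pasid;
    bool atc_valid;          /* single-entry address translation cache */
    uint64_t atc_iova_page;
    uint64_t atc_phys_page;
    unsigned ats_requests;   /* translation requests issued so far */
} PciDmaModule;

/* Toy IOMMU: maps every IOVA page to the next physical page. */
bool toy_translate(void *opaque, uint32_t pasid,
                   uint64_t iova_page, uint64_t *phys_page)
{
    (void)opaque;
    (void)pasid;
    *phys_page = iova_page + TOY_PAGE_SIZE;
    return true;
}

void pci_dma_module_init(PciDmaModule *m, uint8_t *mem, size_t mem_size,
                         ToyTranslateFn translate, void *opaque,
                         uint32_t pasid)
{
    assert(m && mem && translate);
    memset(m, 0, sizeof(*m));
    m->mem = mem;
    m->mem_size = mem_size;
    m->translate = translate;
    m->opaque = opaque;
    m->pasid = pasid;
}

/* Copy page by page, requesting a translation only on an ATC miss. */
static int pci_dma_module_rw(PciDmaModule *m, uint64_t iova, void *buf,
                             size_t len, bool is_write)
{
    uint8_t *p = buf;

    while (len > 0) {
        uint64_t page = iova & TOY_PAGE_MASK;
        uint64_t offset = iova & ~TOY_PAGE_MASK;
        size_t chunk = TOY_PAGE_SIZE - offset;
        uint64_t phys;

        if (chunk > len) {
            chunk = len;
        }
        if (!m->atc_valid || m->atc_iova_page != page) {
            uint64_t phys_page = 0;

            m->ats_requests++;
            if (!m->translate(m->opaque, m->pasid, page, &phys_page)) {
                return -1;
            }
            m->atc_valid = true;
            m->atc_iova_page = page;
            m->atc_phys_page = phys_page;
        }
        phys = m->atc_phys_page + offset;
        if (phys > m->mem_size || chunk > m->mem_size - phys) {
            return -1;   /* translated address outside toy memory */
        }
        if (is_write) {
            memcpy(m->mem + phys, p, chunk);
        } else {
            memcpy(p, m->mem + phys, chunk);
        }
        iova += chunk;
        p += chunk;
        len -= chunk;
    }
    return 0;
}

int pci_dma_module_read(PciDmaModule *m, uint64_t iova, void *buf, size_t len)
{
    return pci_dma_module_rw(m, iova, buf, len, false);
}

int pci_dma_module_write(PciDmaModule *m, uint64_t iova, const void *buf,
                         size_t len)
{
    return pci_dma_module_rw(m, iova, (void *)buf, len, true);
}
```

In this sketch a device would only call the init/read/write functions, while
a device that wants full control would skip the module and keep using the
PCI-level API directly. A real module would also need to drop cached entries
when an invalidation event arrives, which is left out here.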

Do you agree with that?
>
>> Do you have a specific use case in mind?
> ATS/PRI is the actual use case, and it's not that different what you are
> targeting for :)
>
>>   >cmd
>>
>>>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-10  5:17         ` CLEMENT MATHIEU--DRIF
@ 2024-07-11  8:04           ` Minwoo Im
  2024-07-11 19:00             ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Minwoo Im @ 2024-07-11  8:04 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com, minwoo.im

[-- Attachment #1: Type: text/plain, Size: 5721 bytes --]

On 24-07-10 05:17:42, CLEMENT MATHIEU--DRIF wrote:
> 
> 
> On 09/07/2024 23:17, Minwoo Im wrote:
> > On 24-07-09 11:58:53, CLEMENT MATHIEU--DRIF wrote:
> >>
> >> On 09/07/2024 12:15, Minwoo Im wrote:
> >>> On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:
> >>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >>>>
> >>>> Devices implementing ATS can send translation requests using
> >>>> pci_ats_request_translation_pasid.
> >>>>
> >>>> The invalidation events are sent back to the device using the iommu
> >>>> notifier managed with pci_register_iommu_tlb_event_notifier and
> >>>> pci_unregister_iommu_tlb_event_notifier
> >>>>
> >>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >>>> ---
> >>>>    hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
> >>>>    include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
> >>>>    2 files changed, 96 insertions(+)
> >>>>
> >>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>>> index 7a483dd05d..93b816aff2 100644
> >>>> --- a/hw/pci/pci.c
> >>>> +++ b/hw/pci/pci.c
> >>>> @@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
> >>>>        }
> >>>>    }
> >>>>
> >>>> +ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
> >>>> +                                          bool priv_req, bool exec_req,
> >>>> +                                          hwaddr addr, size_t length,
> >>>> +                                          bool no_write, IOMMUTLBEntry *result,
> >>>> +                                          size_t result_length,
> >>>> +                                          uint32_t *err_count)
> >>>> +{
> >>>> +    assert(result_length);
> >>>> +    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
> >>>> +                                                                        pasid);
> >>>> +    if (!iommu_mr || !pcie_ats_enabled(dev)) {
> >>>> +        return -EPERM;
> >>>> +    }
> >>>> +    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
> >>>> +                                                       exec_req, addr, length,
> >>>> +                                                       no_write, result,
> >>>> +                                                       result_length,
> >>>> +                                                       err_count);
> >>>> +}
> >>> Can we use this function not from the endpoint PCI device, but inside of the pci
> >>> subsystem (hw/pci/pci.c) to make transparent abstraction for ATS request from
> >>> PCI endpoint device POV?  I guess it would be better to have PCI subsystem to
> >>> issue ATS request if pcie_ats_enabled(dev) rather than calling from the endpoint
> >>> side.
> >> Hi,
> >>
> >> This series aims to bring support for SVM (we are trying to integrate
> >> the patches bit by bit).
> >>   From a spec point of view, I don't know if it would make sense to
> >> implement the SVM logic at the PCI level
> >> as it's supposed to be implemented by endpoint devices.
> > Understood that this series is targeting the SVM usage.  But ATS feature is
> > something general to PCI devices, not only just for SVM, so I guess it would be
> > better to have caller to `pci_ats_request_translation_pasid()` in pci subsystem
> > like pci_dma_rw() to avoid duplicated implementation in the future for the
> > other PCI enpoint devices.
> 
> Would we store the ATC directly in the PCI subsytem?

Yes, the endpoint device (e.g., svm.c) should call pci_* helpers in the PCI
subsystem with the `PCIDevice *pdev` instance that represents the endpoint
device itself.  Through that instance, we can look up the IOTLB entry from the
ATC in the PCI subsystem rather than on the caller side.
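
A minimal sketch of what an ATC stored per device instance could look like is
given below. It is only an illustration under stated assumptions:
ToyPCIDevice, AtcEntry and the atc_* helpers are hypothetical names that do
not exist in QEMU, and the direct-mapped layout keyed by (PASID, IOVA page) is
just one possible choice. The invalidation helper stands in for what a TLB
event notifier callback might do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATC_PAGE_SHIFT 12
#define ATC_PAGE_SIZE  ((uint64_t)1 << ATC_PAGE_SHIFT)
#define ATC_PAGE_MASK  (~(ATC_PAGE_SIZE - 1))
#define ATC_ENTRIES    16

/* One cached translation, tagged by PASID and IOVA page. */
typedef struct AtcEntry {
    bool valid;
    uint32_t pasid;
    uint64_t iova_page;
    uint64_t phys_page;
} AtcEntry;

/* Hypothetical device object: the ATC lives with the device instance. */
typedef struct ToyPCIDevice {
    AtcEntry atc[ATC_ENTRIES];
} ToyPCIDevice;

void atc_init(ToyPCIDevice *dev)
{
    assert(dev);
    memset(dev->atc, 0, sizeof(dev->atc));
}

/* Direct-mapped slot selection from the page number and the PASID. */
static size_t atc_index(uint32_t pasid, uint64_t iova_page)
{
    return (size_t)(((iova_page >> ATC_PAGE_SHIFT) ^ pasid) % ATC_ENTRIES);
}

/* Returns true and fills *phys on a hit; the caller requests ATS on a miss. */
bool atc_lookup(const ToyPCIDevice *dev, uint32_t pasid, uint64_t iova,
                uint64_t *phys)
{
    uint64_t page = iova & ATC_PAGE_MASK;
    const AtcEntry *e = &dev->atc[atc_index(pasid, page)];

    if (!e->valid || e->pasid != pasid || e->iova_page != page) {
        return false;
    }
    *phys = e->phys_page + (iova & ~ATC_PAGE_MASK);
    return true;
}

/* Store a translation, evicting whatever occupied the slot before. */
void atc_insert(ToyPCIDevice *dev, uint32_t pasid, uint64_t iova_page,
                uint64_t phys_page)
{
    AtcEntry *e = &dev->atc[atc_index(pasid, iova_page & ATC_PAGE_MASK)];

    e->valid = true;
    e->pasid = pasid;
    e->iova_page = iova_page & ATC_PAGE_MASK;
    e->phys_page = phys_page;
}

/*
 * Drop every entry of the given PASID that overlaps [start, start + size).
 * Address wrap-around is not handled in this sketch.
 */
void atc_invalidate_range(ToyPCIDevice *dev, uint32_t pasid, uint64_t start,
                          uint64_t size)
{
    for (size_t i = 0; i < ATC_ENTRIES; i++) {
        AtcEntry *e = &dev->atc[i];

        if (e->valid && e->pasid == pasid &&
            e->iova_page < start + size &&
            e->iova_page + ATC_PAGE_SIZE > start) {
            e->valid = false;
        }
    }
}
```

With something along these lines, a PCI-level helper could check the cache
through the device pointer before issuing a translation request, so that
endpoint devices do not each carry their own copy of this logic.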

> >
> >> However, we could consider providing a reference/reusable/encapsulated
> >> implementation of SVM with a simplified API
> >> that would call the pci_* functions under the hood.
> > I would prefer that PCI devices which want to request ATS translation has no
> > additional implementation for ATS, but only pcie_ats_init().
> Hi,
> 
> I think both strategies can coexist.
> Keeping control can be interesting for people who use Qemu for hardware 
> prototyping and who generally want to experiment.
> We can keep the current PCI-level API for devices that want to 
> reimplement the logic themselves
> and add a kind of "DMA module"/"ATS+PRI module" that works out of the box.

I think we should provide a hybrid mode for this.  One is a `generic` cache
policy mode for all PCI endpoint devices, where the ATC is controlled in the
PCI subsystem; the other is a device-specific cache policy mode which lets
each device implement its own ATC lookup behavior to optimize its own caching
impact.
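
One way the hybrid mode could be expressed is with a small table of policy
callbacks, as in the sketch below. Everything here is hypothetical and for
illustration only: AtcOps, ToyAtsDevice, the "remember the last translation"
generic policy, the "never cache" device-specific policy, and the toy IOMMU
that adds a fixed offset are all assumptions, not existing QEMU interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HY_PAGE_MASK (~(uint64_t)0xfff)

/* Cache policy: a device either supplies its own or gets the generic one. */
typedef struct AtcOps {
    bool (*lookup)(void *state, uint64_t iova_page, uint64_t *phys_page);
    void (*insert)(void *state, uint64_t iova_page, uint64_t phys_page);
} AtcOps;

/* Generic policy state: remembers only the most recent translation. */
typedef struct GenericAtc {
    bool valid;
    uint64_t iova_page;
    uint64_t phys_page;
} GenericAtc;

static bool generic_lookup(void *state, uint64_t iova_page,
                           uint64_t *phys_page)
{
    GenericAtc *atc = state;

    if (!atc->valid || atc->iova_page != iova_page) {
        return false;
    }
    *phys_page = atc->phys_page;
    return true;
}

static void generic_insert(void *state, uint64_t iova_page,
                           uint64_t phys_page)
{
    GenericAtc *atc = state;

    atc->valid = true;
    atc->iova_page = iova_page;
    atc->phys_page = phys_page;
}

const AtcOps generic_atc_ops = { generic_lookup, generic_insert };

/* Example device-specific policy: never cache, always ask the IOMMU. */
static bool nocache_lookup(void *state, uint64_t iova_page,
                           uint64_t *phys_page)
{
    (void)state;
    (void)iova_page;
    (void)phys_page;
    return false;
}

static void nocache_insert(void *state, uint64_t iova_page,
                           uint64_t phys_page)
{
    (void)state;
    (void)iova_page;
    (void)phys_page;
}

const AtcOps nocache_atc_ops = { nocache_lookup, nocache_insert };

typedef struct ToyAtsDevice {
    const AtcOps *ops;      /* policy in use */
    void *atc_state;        /* state handed to the policy callbacks */
    GenericAtc generic;     /* storage for the generic policy */
    unsigned ats_requests;  /* translation requests issued so far */
} ToyAtsDevice;

/* Passing custom_ops == NULL selects the generic policy. */
void toy_ats_device_init(ToyAtsDevice *dev, const AtcOps *custom_ops,
                         void *custom_state)
{
    dev->generic.valid = false;
    dev->ats_requests = 0;
    if (custom_ops) {
        dev->ops = custom_ops;
        dev->atc_state = custom_state;
    } else {
        dev->ops = &generic_atc_ops;
        dev->atc_state = &dev->generic;
    }
}

/* Toy IOMMU: physical page = IOVA page + fixed offset. */
static uint64_t toy_iommu_translate(uint64_t iova_page)
{
    return iova_page + 0x100000;
}

/* Translate through the selected policy, asking the IOMMU only on a miss. */
uint64_t toy_ats_translate(ToyAtsDevice *dev, uint64_t iova)
{
    uint64_t page = iova & HY_PAGE_MASK;
    uint64_t phys_page = 0;

    assert(dev->ops);
    if (!dev->ops->lookup(dev->atc_state, page, &phys_page)) {
        dev->ats_requests++;
        phys_page = toy_iommu_translate(page);
        dev->ops->insert(dev->atc_state, page, phys_page);
    }
    return phys_page + (iova & ~HY_PAGE_MASK);
}
```

The common translate path is identical for both modes; only the callbacks
differ, so a device opting into the generic mode would not need any cache code
of its own.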

> That module could be called "struct PciDmaModule" and expose a simple 
> set of functions like pci_dma_module_init, pci_dma_module_read, 
> pci_dma_module_write.
> I think it's important to keep existing DMA API as is to allow devices 
> to do both "with ATS" and "without ATS" operations.
> 
> Do you agree with that?

Indeed.  Keeping the existing APIs is a good choice, but I would like to have
endpoint device code be much simpler for the generic usages :)

> >
> >> Do you have a specific use case in mind?
> > ATS/PRI is the actual use case, and it's not that different what you are
> > targeting for :)
> >
> >>   >cmd
> >>
> >>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-11  8:04           ` Minwoo Im
@ 2024-07-11 19:00             ` CLEMENT MATHIEU--DRIF
  2024-07-17 23:44               ` Minwoo Im
  0 siblings, 1 reply; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-11 19:00 UTC (permalink / raw)
  To: Minwoo Im
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com



On 11/07/2024 10:04, Minwoo Im wrote:
> On 24-07-10 05:17:42, CLEMENT MATHIEU--DRIF wrote:
>>
>> On 09/07/2024 23:17, Minwoo Im wrote:
>>> On 24-07-09 11:58:53, CLEMENT MATHIEU--DRIF wrote:
>>>> On 09/07/2024 12:15, Minwoo Im wrote:
>>>>> On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:
>>>>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>>>>
>>>>>> Devices implementing ATS can send translation requests using
>>>>>> pci_ats_request_translation_pasid.
>>>>>>
>>>>>> The invalidation events are sent back to the device using the iommu
>>>>>> notifier managed with pci_register_iommu_tlb_event_notifier and
>>>>>> pci_unregister_iommu_tlb_event_notifier
>>>>>>
>>>>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
>>>>>> ---
>>>>>>     hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
>>>>>>     include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
>>>>>>     2 files changed, 96 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>> index 7a483dd05d..93b816aff2 100644
>>>>>> --- a/hw/pci/pci.c
>>>>>> +++ b/hw/pci/pci.c
>>>>>> @@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>> +ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
>>>>>> +                                          bool priv_req, bool exec_req,
>>>>>> +                                          hwaddr addr, size_t length,
>>>>>> +                                          bool no_write, IOMMUTLBEntry *result,
>>>>>> +                                          size_t result_length,
>>>>>> +                                          uint32_t *err_count)
>>>>>> +{
>>>>>> +    assert(result_length);
>>>>>> +    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
>>>>>> +                                                                        pasid);
>>>>>> +    if (!iommu_mr || !pcie_ats_enabled(dev)) {
>>>>>> +        return -EPERM;
>>>>>> +    }
>>>>>> +    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
>>>>>> +                                                       exec_req, addr, length,
>>>>>> +                                                       no_write, result,
>>>>>> +                                                       result_length,
>>>>>> +                                                       err_count);
>>>>>> +}
>>>>> Can we use this function not from the endpoint PCI device, but inside of the pci
>>>>> subsystem (hw/pci/pci.c) to make transparent abstraction for ATS request from
>>>>> PCI endpoint device POV?  I guess it would be better to have PCI subsystem to
>>>>> issue ATS request if pcie_ats_enabled(dev) rather than calling from the endpoint
>>>>> side.
>>>> Hi,
>>>>
>>>> This series aims to bring support for SVM (we are trying to integrate
>>>> the patches bit by bit).
>>>>    From a spec point of view, I don't know if it would make sense to
>>>> implement the SVM logic at the PCI level
>>>> as it's supposed to be implemented by endpoint devices.
>>> Understood that this series is targeting the SVM usage.  But ATS feature is
>>> something general to PCI devices, not only just for SVM, so I guess it would be
>>> better to have caller to `pci_ats_request_translation_pasid()` in pci subsystem
>>> like pci_dma_rw() to avoid duplicated implementation in the future for the
>>> other PCI enpoint devices.
>> Would we store the ATC directly in the PCI subsytem?
> Yes, endpoint device (e.g., svm.c) should call pci_* helpers in PCI subsystem
> with `PCIDevice *pdev instance` which represents the endpoint device itself.
> By the instance, we can look up the IOTLB entry from the ATC in the PCI
> subsystem, not the current caller side.
>
>>>> However, we could consider providing a reference/reusable/encapsulated
>>>> implementation of SVM with a simplified API
>>>> that would call the pci_* functions under the hood.
>>> I would prefer that PCI devices which want to request ATS translation has no
>>> additional implementation for ATS, but only pcie_ats_init().
>> Hi,
>>
>> I think both strategies can coexist.
>> Keeping control can be interesting for people who use Qemu for hardware
>> prototyping and who generally want to experiment.
>> We can keep the current PCI-level API for devices that want to
>> reimplement the logic themselves
>> and add a kind of "DMA module"/"ATS+PRI module" that works out of the box.
> I think we should proivde hybrid mode on this.  One for a `generic` cache
> policy mode for every PCI endpoint devices which can be controlled in the PCI
> subsystem for ATC, the other one is that device-specific cache policy mode
> which let each device implement their own ATC lookup behaviors to optimize
> their own caching impact.
>
>> That module could be called "struct PciDmaModule" and expose a simple
>> set of functions like pci_dma_module_init, pci_dma_module_read,
>> pci_dma_module_write.
>> I think it's important to keep existing DMA API as is to allow devices
>> to do both "with ATS" and "without ATS" operations.
>>
>> Do you agree with that?
> Indeed.  Keeping the existing APIs is a good choice, but I would like to have
> endpoint devices code much more simpler for the generic usages :)
That's a good point, we will see what we can do once the current work is
integrated.
Thanks for your comment :)
>
>>>> Do you have a specific use case in mind?
>>> ATS/PRI is the actual use case, and it's not that different what you are
>>> targeting for :)
>>>
>>>>    >cmd
>>>>
>>>>>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-11 19:00             ` CLEMENT MATHIEU--DRIF
@ 2024-07-17 23:44               ` Minwoo Im
  2024-07-18  7:46                 ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 61+ messages in thread
From: Minwoo Im @ 2024-07-17 23:44 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com, minwoo.im, jongh2.jeong

[-- Attachment #1: Type: text/plain, Size: 6581 bytes --]

On 24-07-11 19:00:58, CLEMENT MATHIEU--DRIF wrote:
> 
> 
> On 11/07/2024 10:04, Minwoo Im wrote:
> > On 24-07-10 05:17:42, CLEMENT MATHIEU--DRIF wrote:
> >>
> >> On 09/07/2024 23:17, Minwoo Im wrote:
> >>> On 24-07-09 11:58:53, CLEMENT MATHIEU--DRIF wrote:
> >>>> On 09/07/2024 12:15, Minwoo Im wrote:
> >>>>> On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:
> >>>>>> From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >>>>>>
> >>>>>> Devices implementing ATS can send translation requests using
> >>>>>> pci_ats_request_translation_pasid.
> >>>>>>
> >>>>>> The invalidation events are sent back to the device using the iommu
> >>>>>> notifier managed with pci_register_iommu_tlb_event_notifier and
> >>>>>> pci_unregister_iommu_tlb_event_notifier
> >>>>>>
> >>>>>> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
> >>>>>> ---
> >>>>>>     hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
> >>>>>>     include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>     2 files changed, 96 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>>>>> index 7a483dd05d..93b816aff2 100644
> >>>>>> --- a/hw/pci/pci.c
> >>>>>> +++ b/hw/pci/pci.c
> >>>>>> @@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
> >>>>>>         }
> >>>>>>     }
> >>>>>>
> >>>>>> +ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
> >>>>>> +                                          bool priv_req, bool exec_req,
> >>>>>> +                                          hwaddr addr, size_t length,
> >>>>>> +                                          bool no_write, IOMMUTLBEntry *result,
> >>>>>> +                                          size_t result_length,
> >>>>>> +                                          uint32_t *err_count)
> >>>>>> +{
> >>>>>> +    assert(result_length);
> >>>>>> +    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
> >>>>>> +                                                                        pasid);
> >>>>>> +    if (!iommu_mr || !pcie_ats_enabled(dev)) {
> >>>>>> +        return -EPERM;
> >>>>>> +    }
> >>>>>> +    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
> >>>>>> +                                                       exec_req, addr, length,
> >>>>>> +                                                       no_write, result,
> >>>>>> +                                                       result_length,
> >>>>>> +                                                       err_count);
> >>>>>> +}
> >>>>> Can we use this function not from the endpoint PCI device, but inside of the pci
> >>>>> subsystem (hw/pci/pci.c) to make transparent abstraction for ATS request from
> >>>>> PCI endpoint device POV?  I guess it would be better to have PCI subsystem to
> >>>>> issue ATS request if pcie_ats_enabled(dev) rather than calling from the endpoint
> >>>>> side.
> >>>> Hi,
> >>>>
> >>>> This series aims to bring support for SVM (we are trying to integrate
> >>>> the patches bit by bit).
> >>>>    From a spec point of view, I don't know if it would make sense to
> >>>> implement the SVM logic at the PCI level
> >>>> as it's supposed to be implemented by endpoint devices.
> >>> Understood that this series is targeting the SVM usage.  But ATS feature is
> >>> something general to PCI devices, not only just for SVM, so I guess it would be
> >>> better to have caller to `pci_ats_request_translation_pasid()` in pci subsystem
> >>> like pci_dma_rw() to avoid duplicated implementation in the future for the
> >>> other PCI enpoint devices.
> >> Would we store the ATC directly in the PCI subsytem?
> > Yes, endpoint device (e.g., svm.c) should call pci_* helpers in PCI subsystem
> > with `PCIDevice *pdev instance` which represents the endpoint device itself.
> > By the instance, we can look up the IOTLB entry from the ATC in the PCI
> > subsystem, not the current caller side.
> >
> >>>> However, we could consider providing a reference/reusable/encapsulated
> >>>> implementation of SVM with a simplified API
> >>>> that would call the pci_* functions under the hood.
> >>> I would prefer that PCI devices which want to request ATS translation has no
> >>> additional implementation for ATS, but only pcie_ats_init().
> >> Hi,
> >>
> >> I think both strategies can coexist.
> >> Keeping control can be interesting for people who use Qemu for hardware
> >> prototyping and who generally want to experiment.
> >> We can keep the current PCI-level API for devices that want to
> >> reimplement the logic themselves
> >> and add a kind of "DMA module"/"ATS+PRI module" that works out of the box.
> > I think we should proivde hybrid mode on this.  One for a `generic` cache
> > policy mode for every PCI endpoint devices which can be controlled in the PCI
> > subsystem for ATC, the other one is that device-specific cache policy mode
> > which let each device implement their own ATC lookup behaviors to optimize
> > their own caching impact.
> >
> >> That module could be called "struct PciDmaModule" and expose a simple
> >> set of functions like pci_dma_module_init, pci_dma_module_read,
> >> pci_dma_module_write.
> >> I think it's important to keep existing DMA API as is to allow devices
> >> to do both "with ATS" and "without ATS" operations.
> >>
> >> Do you agree with that?
> > Indeed.  Keeping the existing APIs is a good choice, but I would like to have
> > endpoint devices code much more simpler for the generic usages :)
> That's a good point, we will se what we can do once the current work is 
> integrated.
> Thanks for your comment :)

Do you have a plan to repost this series soon? I would like to apply your
patches and review/test this series.  It looks like you've been working through
some fix patches for VT-d, but I'm curious about your further plans for the
actual SVM feature :)

> >
> >>>> Do you have a specific use case in mind?
> >>> ATS/PRI is the actual use case, and it's not that different what you are
> >>> targeting for :)
> >>>
> >>>>    >cmd
> >>>>
> >>>>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS
  2024-07-17 23:44               ` Minwoo Im
@ 2024-07-18  7:46                 ` CLEMENT MATHIEU--DRIF
  0 siblings, 0 replies; 61+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-07-18  7:46 UTC (permalink / raw)
  To: Minwoo Im
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com,
	zhenzhong.duan@intel.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, joao.m.martins@oracle.com, peterx@redhat.com,
	mst@redhat.com, jongh2.jeong@samsung.com

[-- Attachment #1: Type: text/plain, Size: 6822 bytes --]



On 18/07/2024 01:44, Minwoo Im wrote:

On 24-07-11 19:00:58, CLEMENT MATHIEU--DRIF wrote:




On 11/07/2024 10:04, Minwoo Im wrote:


On 24-07-10 05:17:42, CLEMENT MATHIEU--DRIF wrote:



On 09/07/2024 23:17, Minwoo Im wrote:


On 24-07-09 11:58:53, CLEMENT MATHIEU--DRIF wrote:


On 09/07/2024 12:15, Minwoo Im wrote:


On 24-07-02 05:52:45, CLEMENT MATHIEU--DRIF wrote:


From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>

Devices implementing ATS can send translation requests using
pci_ats_request_translation_pasid.

The invalidation events are sent back to the device using the iommu
notifier managed with pci_register_iommu_tlb_event_notifier and
pci_unregister_iommu_tlb_event_notifier

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
---
    hw/pci/pci.c         | 44 +++++++++++++++++++++++++++++++++++++
    include/hw/pci/pci.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
    2 files changed, 96 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 7a483dd05d..93b816aff2 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2833,6 +2833,50 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
        }
    }

+ssize_t pci_ats_request_translation_pasid(PCIDevice *dev, uint32_t pasid,
+                                          bool priv_req, bool exec_req,
+                                          hwaddr addr, size_t length,
+                                          bool no_write, IOMMUTLBEntry *result,
+                                          size_t result_length,
+                                          uint32_t *err_count)
+{
+    assert(result_length);
+    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region_pasid(dev,
+                                                                        pasid);
+    if (!iommu_mr || !pcie_ats_enabled(dev)) {
+        return -EPERM;
+    }
+    return memory_region_iommu_ats_request_translation(iommu_mr, priv_req,
+                                                       exec_req, addr, length,
+                                                       no_write, result,
+                                                       result_length,
+                                                       err_count);
+}


Can we use this function not from the endpoint PCI device, but inside the PCI
subsystem (hw/pci/pci.c), to make the ATS request transparent from the PCI
endpoint device's point of view?  I guess it would be better to have the PCI
subsystem issue the ATS request if pcie_ats_enabled(dev), rather than calling it
from the endpoint side.


Hi,

This series aims to bring support for SVM (we are trying to integrate
the patches bit by bit).
From a spec point of view, I don't know if it would make sense to
implement the SVM logic at the PCI level,
as it's supposed to be implemented by endpoint devices.


Understood that this series is targeting the SVM usage.  But the ATS feature is
general to PCI devices, not just for SVM, so I guess it would be better to have
the caller of `pci_ats_request_translation_pasid()` in the PCI subsystem, like
pci_dma_rw(), to avoid duplicated implementations in other PCI endpoint devices
in the future.


Would we store the ATC directly in the PCI subsystem?


Yes, the endpoint device (e.g., svm.c) should call pci_* helpers in the PCI
subsystem with the `PCIDevice *pdev` instance that represents the endpoint device
itself.  With that instance, we can look up the IOTLB entry from the ATC in the
PCI subsystem rather than on the caller side.



However, we could consider providing a reference/reusable/encapsulated
implementation of SVM with a simplified API
that would call the pci_* functions under the hood.


I would prefer that PCI devices that want to request ATS translation need no
additional ATS implementation beyond calling pcie_ats_init().


Hi,

I think both strategies can coexist.
Keeping control can be interesting for people who use Qemu for hardware
prototyping and who generally want to experiment.
We can keep the current PCI-level API for devices that want to
reimplement the logic themselves
and add a kind of "DMA module"/"ATS+PRI module" that works out of the box.


I think we should provide a hybrid mode here.  One is a `generic` cache
policy mode for all PCI endpoint devices, with the ATC controlled in the PCI
subsystem; the other is a device-specific cache policy mode that lets each
device implement its own ATC lookup behavior to optimize its own caching.
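
For illustration only, the hybrid idea could be sketched roughly as below.
This is a standalone toy, not QEMU code: every name (SketchAtc,
sketch_atc_init, the victim callbacks, the 4-slot size, the 4 KiB page size)
is hypothetical and assumed for the example. The "generic" mode is modelled as
a round-robin replacement policy owned by the common code, and the
device-specific mode as a callback the device supplies (here, a direct-mapped
policy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; nothing here is an existing QEMU API. */
#define SKETCH_ATC_SLOTS  4
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_OFFSET_MASK (((uint64_t)1 << SKETCH_PAGE_SHIFT) - 1)

typedef struct SketchAtcSlot {
    bool valid;
    uint64_t iova_pfn;   /* page frame number of the untranslated address */
    uint64_t phys_pfn;   /* page frame number of the translated address */
} SketchAtcSlot;

typedef struct SketchAtc SketchAtc;
/* Policy hook: returns the slot index to overwrite for a new entry. */
typedef size_t (*SketchAtcVictimFn)(SketchAtc *atc, uint64_t iova_pfn);

struct SketchAtc {
    SketchAtcSlot slots[SKETCH_ATC_SLOTS];
    size_t next;                   /* cursor used by the generic policy */
    SketchAtcVictimFn pick_victim; /* generic or device-specific policy */
};

/* Generic policy: plain round-robin replacement. */
size_t sketch_atc_generic_victim(SketchAtc *atc, uint64_t iova_pfn)
{
    size_t slot = atc->next;
    (void)iova_pfn;
    atc->next = (atc->next + 1) % SKETCH_ATC_SLOTS;
    return slot;
}

/* Example of a device-specific policy: direct-mapped on the IOVA pfn. */
size_t sketch_atc_direct_mapped_victim(SketchAtc *atc, uint64_t iova_pfn)
{
    (void)atc;
    return (size_t)(iova_pfn % SKETCH_ATC_SLOTS);
}

/* Passing NULL selects the generic policy. */
void sketch_atc_init(SketchAtc *atc, SketchAtcVictimFn pick_victim)
{
    memset(atc, 0, sizeof(*atc));
    atc->pick_victim = pick_victim ? pick_victim : sketch_atc_generic_victim;
}

void sketch_atc_insert(SketchAtc *atc, uint64_t iova, uint64_t pa)
{
    uint64_t pfn = iova >> SKETCH_PAGE_SHIFT;
    size_t slot = SKETCH_ATC_SLOTS;

    /* Reuse the slot if this page is already cached. */
    for (size_t i = 0; i < SKETCH_ATC_SLOTS; i++) {
        if (atc->slots[i].valid && atc->slots[i].iova_pfn == pfn) {
            slot = i;
            break;
        }
    }
    if (slot == SKETCH_ATC_SLOTS) {
        slot = atc->pick_victim(atc, pfn);
    }
    assert(slot < SKETCH_ATC_SLOTS);
    atc->slots[slot] = (SketchAtcSlot){
        .valid = true, .iova_pfn = pfn, .phys_pfn = pa >> SKETCH_PAGE_SHIFT,
    };
}

/* Returns true and fills *pa on a hit; the lookup itself is policy-agnostic. */
bool sketch_atc_lookup(const SketchAtc *atc, uint64_t iova, uint64_t *pa)
{
    uint64_t pfn = iova >> SKETCH_PAGE_SHIFT;

    for (size_t i = 0; i < SKETCH_ATC_SLOTS; i++) {
        if (atc->slots[i].valid && atc->slots[i].iova_pfn == pfn) {
            *pa = (atc->slots[i].phys_pfn << SKETCH_PAGE_SHIFT) |
                  (iova & SKETCH_PAGE_OFFSET_MASK);
            return true;
        }
    }
    return false;
}
```

A device happy with the default would call sketch_atc_init(&atc, NULL); one
that wants its own behavior passes its own callback instead. Invalidation
handling is left out of the sketch entirely.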



That module could be called "struct PciDmaModule" and expose a simple
set of functions like pci_dma_module_init, pci_dma_module_read,
pci_dma_module_write.
I think it's important to keep existing DMA API as is to allow devices
to do both "with ATS" and "without ATS" operations.

Do you agree with that?
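
To make the proposal a bit more concrete, here is a rough standalone sketch of
what such a module might look like. It is only an assumption about the shape
of the API: the thread names struct PciDmaModule and pci_dma_module_init/
read/write but gives no signatures, so the fields and parameters below are
invented for the example. The IOMMU is replaced by a toy function
(sketch_iommu_translate) that shifts every page up by one page, guest memory
is a plain buffer, the ATC holds a single entry, and accesses must stay within
one 4 KiB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the signatures below are not from the series. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

typedef struct SketchAtcEntry {
    bool valid;
    uint64_t iova_page;
    uint64_t phys_page;
} SketchAtcEntry;

typedef struct PciDmaModule {
    uint8_t *mem;           /* fake guest memory */
    size_t mem_size;
    bool ats_enabled;       /* stand-in for pcie_ats_enabled(dev) */
    SketchAtcEntry atc;     /* one-entry translation cache */
    unsigned ats_requests;  /* number of simulated ATS requests issued */
} PciDmaModule;

/* Toy IOMMU: maps each IOVA page to the next physical page. */
static uint64_t sketch_iommu_translate(uint64_t iova_page)
{
    return iova_page + SKETCH_PAGE_SIZE;
}

void pci_dma_module_init(PciDmaModule *m, uint8_t *mem, size_t mem_size,
                         bool ats_enabled)
{
    assert(m && mem);
    memset(m, 0, sizeof(*m));
    m->mem = mem;
    m->mem_size = mem_size;
    m->ats_enabled = ats_enabled;
}

/* Resolve an IOVA, going through the ATC only when ATS is enabled. */
static uint64_t sketch_resolve(PciDmaModule *m, uint64_t iova)
{
    uint64_t page = iova & SKETCH_PAGE_MASK;
    uint64_t off = iova & ~SKETCH_PAGE_MASK;

    if (!m->ats_enabled) {
        return sketch_iommu_translate(page) | off;
    }
    if (!m->atc.valid || m->atc.iova_page != page) {
        /* ATC miss: simulate an ATS translation request and cache it. */
        m->atc.valid = true;
        m->atc.iova_page = page;
        m->atc.phys_page = sketch_iommu_translate(page);
        m->ats_requests++;
    }
    return m->atc.phys_page | off;
}

/* Shared helper: returns 0 on success, -1 on a rejected access. */
static int sketch_access(PciDmaModule *m, uint64_t iova, void *buf,
                         size_t len, bool is_write)
{
    uint64_t off = iova & ~SKETCH_PAGE_MASK;
    uint64_t pa;

    if (len == 0 || len > SKETCH_PAGE_SIZE - off) {
        return -1;  /* the sketch does not split page-crossing accesses */
    }
    pa = sketch_resolve(m, iova);
    if (pa > m->mem_size || len > m->mem_size - pa) {
        return -1;  /* outside the fake guest memory */
    }
    if (is_write) {
        memcpy(m->mem + pa, buf, len);
    } else {
        memcpy(buf, m->mem + pa, len);
    }
    return 0;
}

int pci_dma_module_read(PciDmaModule *m, uint64_t iova, void *buf, size_t len)
{
    return sketch_access(m, iova, buf, len, false);
}

int pci_dma_module_write(PciDmaModule *m, uint64_t iova, void *buf, size_t len)
{
    return sketch_access(m, iova, buf, len, true);
}
```

The point of the sketch is only that the device calls init/read/write and
never touches the ATC itself; the existing pci_dma_read/pci_dma_write path
would stay as it is for devices that want to do "without ATS" operations.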


Indeed.  Keeping the existing APIs is a good choice, but I would like the
endpoint device code to be much simpler for generic usage :)


That's a good point, we will see what we can do once the current work is
integrated.
Thanks for your comment :)



Do you have a plan to repost this series soon? I would like to apply your
patches and review/test this series.  It looks like you've been working through
some fix patches for VT-d, but I'm curious about your further plans for the
actual SVM feature :)

Hi,

As you may have noticed, my SVM implementation is based on Zhenzhong Duan's FLTS series, which has not yet been integrated.
The VT-d quick fixes were the only fixes without dependencies.
I'm now waiting for FLTS support to be integrated upstream before sending my series again (I'll add you to CC).

If you want to test the features, you can clone the following repos:
- https://github.com/BullSequana/Qemu-in-guest-SVM-demo
- https://github.com/BullSequana/qemu (submodule of the repo above)

>cmd









Do you have a specific use case in mind?


ATS/PRI is the actual use case, and it's not that different from what you are
targeting :)



   >cmd









[-- Attachment #2: Type: text/html, Size: 9982 bytes --]

^ permalink raw reply related	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2024-07-18  7:47 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-02  5:52 [PATCH ats_vtd v5 00/22] ATS support for VT-d CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 01/22] intel_iommu: fix FRCD construction macro CLEMENT MATHIEU--DRIF
2024-07-02 13:01   ` Yi Liu
2024-07-02 15:10     ` CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 02/22] intel_iommu: make types match CLEMENT MATHIEU--DRIF
2024-07-02 13:20   ` Yi Liu
2024-07-02  5:52 ` [PATCH ats_vtd v5 03/22] intel_iommu: return page walk level even when the translation fails CLEMENT MATHIEU--DRIF
2024-07-03 11:59   ` Yi Liu
2024-07-04  4:23     ` CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 05/22] memory: add permissions in IOMMUAccessFlags CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 04/22] intel_iommu: do not consider wait_desc as an invalid descriptor CLEMENT MATHIEU--DRIF
2024-07-02 13:33   ` Yi Liu
2024-07-02 15:29     ` CLEMENT MATHIEU--DRIF
2024-07-02 15:40       ` cmd
2024-07-03  7:29       ` Yi Liu
2024-07-03  8:28         ` cmd
2024-07-04  4:23         ` CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 06/22] pcie: add helper to declare PASID capability for a pcie device CLEMENT MATHIEU--DRIF
2024-07-03 12:04   ` Yi Liu
2024-07-04  4:25     ` CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 07/22] pcie: helper functions to check if PASID and ATS are enabled CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 08/22] intel_iommu: declare supported PASID size CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 09/22] pci: cache the bus mastering status in the device CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 10/22] pci: add IOMMU operations to get address spaces and memory regions with PASID CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 11/22] memory: store user data pointer in the IOMMU notifiers CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 12/22] pci: add a pci-level initialization function for iommu notifiers CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 13/22] intel_iommu: implement the get_address_space_pasid iommu operation CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 15/22] memory: Allow to store the PASID in IOMMUTLBEntry CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 14/22] intel_iommu: implement the get_memory_region_pasid iommu operation CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 16/22] intel_iommu: fill the PASID field when creating an instance of IOMMUTLBEntry CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 17/22] atc: generic ATC that can be used by PCIe devices that support SVM CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 18/22] atc: add unit tests CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 19/22] memory: add an API for ATS support CLEMENT MATHIEU--DRIF
2024-07-03 12:14   ` Yi Liu
2024-07-04  4:30     ` CLEMENT MATHIEU--DRIF
2024-07-04 12:52       ` Yi Liu
2024-07-02  5:52 ` [PATCH ats_vtd v5 20/22] pci: add a pci-level API for ATS CLEMENT MATHIEU--DRIF
2024-07-09 10:15   ` Minwoo Im
2024-07-09 11:58     ` CLEMENT MATHIEU--DRIF
2024-07-09 21:17       ` Minwoo Im
2024-07-10  5:17         ` CLEMENT MATHIEU--DRIF
2024-07-11  8:04           ` Minwoo Im
2024-07-11 19:00             ` CLEMENT MATHIEU--DRIF
2024-07-17 23:44               ` Minwoo Im
2024-07-18  7:46                 ` CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 21/22] intel_iommu: set the address mask even when a translation fails CLEMENT MATHIEU--DRIF
2024-07-02  5:52 ` [PATCH ats_vtd v5 22/22] intel_iommu: add support for ATS CLEMENT MATHIEU--DRIF
2024-07-02 12:16 ` [PATCH ats_vtd v5 00/22] ATS support for VT-d Michael S. Tsirkin
2024-07-02 15:09   ` CLEMENT MATHIEU--DRIF
2024-07-02 13:44 ` Yi Liu
2024-07-02 15:12   ` CLEMENT MATHIEU--DRIF
2024-07-03 12:32 ` Yi Liu
2024-07-04  4:36   ` CLEMENT MATHIEU--DRIF
2024-07-04  8:14     ` Yi Liu
  -- strict thread matches above, loose matches on Subject: below --
2024-06-03  5:59 CLEMENT MATHIEU--DRIF
2024-07-01 20:02 ` Michael S. Tsirkin
2024-07-02  5:57   ` CLEMENT MATHIEU--DRIF
2024-07-02 12:15     ` Michael S. Tsirkin
2024-07-02 13:42       ` Yi Liu
2024-07-02 15:27         ` CLEMENT MATHIEU--DRIF
2024-07-02 15:28           ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).