public inbox for kvm@vger.kernel.org
* [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
@ 2023-06-21 23:54 Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 01/21] iommu/amd: Declare helper functions as extern Suravee Suthikulpanit
                   ` (21 more replies)
  0 siblings, 22 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

OVERVIEW
--------
The AMD IOMMU Hardware Accelerated Virtualized IOMMU (HW-vIOMMU) feature
provides partial hardware acceleration for implementing guest IOMMUs.
When the feature is enabled, the following components are virtualized:
  * Guest Command Buffer
  * Guest Event Log (work-in-progress)
  * Guest PPR Log (work-in-progress)

In addition, this feature can be used in combination with nested IOMMU
page tables to accelerate address translation from GIOVA to GPA. In this
case, the host page table (a.k.a. stage2 or v1) is managed by the
hypervisor (i.e. KVM/VFIO) and the guest page table (a.k.a. stage1 or v2)
is managed by the guest IOMMU driver (e.g. when booting the guest kernel
with amd_iommu=pgtable_v2 mode).

Since the IOMMU hardware virtualizes the guest command buffer, IOMMU
operations such as invalidation of guest (i.e. stage1) pages can be
accelerated: commands issued by the guest kernel are processed without
intervention from the hypervisor.
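To make the two-stage composition above concrete, here is a standalone
sketch (not driver code). Real AMD IOMMU page tables are multi-level;
flat lookup arrays with invented values stand in for them here purely to
show how stage1 (guest-managed, v2) and stage2 (host-managed, v1) compose:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model of nested (two-stage) translation:
 *   stage1 (guest-managed, "v2"): GIOVA pfn -> GPA pfn
 *   stage2 (host-managed,  "v1"): GPA pfn   -> SPA pfn
 * All table contents are invented for the example.
 */
#define PAGE_SHIFT 12
#define PAGE_MASK  0xfffULL

static const uint64_t stage1[8]  = { 3, 1, 4, 1, 5, 9, 2, 6 };
static const uint64_t stage2[16] = { 10, 11, 12, 13, 14, 15, 16, 17,
                                     18, 19, 20, 21, 22, 23, 24, 25 };

/* Compose the two stages: GIOVA -> GPA -> SPA, preserving the page offset. */
static uint64_t translate(uint64_t giova)
{
	uint64_t gpa = (stage1[giova >> PAGE_SHIFT] << PAGE_SHIFT) |
		       (giova & PAGE_MASK);
	uint64_t spa = (stage2[gpa >> PAGE_SHIFT] << PAGE_SHIFT) |
		       (gpa & PAGE_MASK);
	return spa;
}
```

With nesting, the hardware performs this composed walk itself, which is
what lets guest-issued invalidations of stage1 skip the hypervisor.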

This series is implemented on top of the IOMMUFD framework. It leverages
the existing APIs and ioctls for providing guest IOMMU information
(i.e. struct iommu_hw_info_amd), and allows the guest to provide guest
page table information (i.e. struct iommu_hwpt_amd_v2) for setting up a
user domain.

Please see [4], [5], and [6] for more detail on the AMD HW-vIOMMU.

NOTES
-----
This series is organized into two parts:
  * Part1: Preparing IOMMU driver for HW-vIOMMU support (Patch 1-8).

  * Part2: Introducing HW-vIOMMU support (Patch 9-21).

  * Patches 12 and 21 extend the existing IOMMUFD ioctls to support
    additional operations, which can be categorized into:
    - Ioctls to init/destroy AMD HW-vIOMMU instance
    - Ioctls to attach/detach guest devices to the AMD HW-vIOMMU instance.
    - Ioctls to attach/detach guest domains to the AMD HW-vIOMMU instance.
    - Ioctls to trap certain AMD HW-vIOMMU MMIO register accesses.
    - Ioctls to trap AMD HW-vIOMMU command buffer initialization.
 
    Since these ioctls are specific to the AMD HW-vIOMMU implementation
    but still leverage /dev/iommu, they are kept separate from the
    existing VFIO-related ioctls.

  * The initial revision only supports one PASID in the guest
    (i.e. PASID 0). Support for multiple PASIDs will be added in a
    subsequent revision.
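The ioctl categories listed above imply an ordering (init the instance,
attach devices/domains, detach, destroy). The sketch below models that
lifecycle as a toy state machine; every name and rule here is
hypothetical and does not match the actual UAPI, which is defined in
include/uapi/linux/amd_viommu.h:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the AMD HW-vIOMMU ioctl lifecycle; illustrative only.
 * Returns false where the real ioctls would fail with an errno.
 */
enum viommu_state { VIOMMU_NONE, VIOMMU_INITIALIZED };

static enum viommu_state state = VIOMMU_NONE;
static int attached_devs;

static bool viommu_vminit(void)
{
	if (state != VIOMMU_NONE)
		return false;		/* instance already exists */
	state = VIOMMU_INITIALIZED;
	return true;
}

static bool viommu_attach_dev(void)
{
	if (state != VIOMMU_INITIALIZED)
		return false;		/* must init the instance first */
	attached_devs++;
	return true;
}

static bool viommu_detach_dev(void)
{
	if (attached_devs == 0)
		return false;
	attached_devs--;
	return true;
}

static bool viommu_vmdestroy(void)
{
	if (state != VIOMMU_INITIALIZED || attached_devs > 0)
		return false;		/* detach all devices before destroy */
	state = VIOMMU_NONE;
	return true;
}
```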

GITHUB
------
* A working Linux kernel prototype of this series [1] is based on [3].
* This series is tested with QEMU [2] (work-in-progress).

REFERENCES
----------
[1] Linux Github branch for this series
    https://github.com/AMDESE/linux/tree/wip/iommufd_nesting-06192023-yi_amd_viommu_20230621

[2] QEMU Github branch to be used for testing this series.
    https://github.com/AMDESE/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%2Bnesting_amd_viommu_202300621

[3] Base Github branch from Yi Liu.
    https://github.com/yiliu1765/iommufd/tree/wip/iommufd_nesting-06192023-yi

[4] AMD IOMMU Specification
    https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf

[5] KVM Forum 2020 Presentation
    https://tinyurl.com/2p8b543c

[6] KVM Forum 2021 Presentation
    https://tinyurl.com/49sy42ry

Thank you,
Suravee Suthikulpanit

Suravee Suthikulpanit (21):
  iommu/amd: Declare helper functions as extern
  iommu/amd: Clean up spacing in amd_iommu_ops declaration
  iommu/amd: Update PASID, GATS, and GLX feature related macros
  iommu/amd: Modify domain_enable_v2() to add giov parameter
  iommu/amd: Refactor set_dte_entry() helper function
  iommu/amd: Modify set_dte_entry() to add gcr3 input parameter
  iommu/amd: Modify set_dte_entry() to add user domain input parameter
  iommu/amd: Allow nested IOMMU page tables
  iommu/amd: Add support for hw_info for iommu capability query
  iommu/amd: Introduce vIOMMU-specific events and event info
  iommu/amd: Introduce Reset vMMIO Command
  iommu/amd: Introduce AMD vIOMMU-specific UAPI
  iommu/amd: Introduce vIOMMU command-line option
  iommu/amd: Initialize vIOMMU private address space regions
  iommu/amd: Introduce vIOMMU vminit and vmdestroy ioctl
  iommu/amd: Introduce vIOMMU ioctl for updating device mapping table
  iommu/amd: Introduce vIOMMU ioctl for updating domain mapping
  iommu/amd: Introduce vIOMMU ioctl for handling guest MMIO accesses
  iommu/amd: Introduce vIOMMU ioctl for handling command buffer mapping
  iommu/amd: Introduce vIOMMU ioctl for setting up guest CR3
  iommufd: Introduce AMD HW-vIOMMU IOCTL

 drivers/iommu/amd/Makefile          |    2 +-
 drivers/iommu/amd/amd_iommu.h       |   40 +-
 drivers/iommu/amd/amd_iommu_types.h |   62 +-
 drivers/iommu/amd/amd_viommu.h      |   57 ++
 drivers/iommu/amd/init.c            |   29 +-
 drivers/iommu/amd/io_pgtable.c      |   18 +-
 drivers/iommu/amd/iommu.c           |  370 +++++++--
 drivers/iommu/amd/iommu_v2.c        |    2 +-
 drivers/iommu/amd/viommu.c          | 1110 +++++++++++++++++++++++++++
 drivers/iommu/iommufd/Makefile      |    3 +-
 drivers/iommu/iommufd/amd_viommu.c  |  158 ++++
 drivers/iommu/iommufd/main.c        |   17 +-
 include/linux/amd-viommu.h          |   26 +
 include/linux/iommu.h               |    1 +
 include/linux/iommufd.h             |    8 +
 include/uapi/linux/amd_viommu.h     |  145 ++++
 include/uapi/linux/iommufd.h        |   31 +
 17 files changed, 1964 insertions(+), 115 deletions(-)
 create mode 100644 drivers/iommu/amd/amd_viommu.h
 create mode 100644 drivers/iommu/amd/viommu.c
 create mode 100644 drivers/iommu/iommufd/amd_viommu.c
 create mode 100644 include/linux/amd-viommu.h
 create mode 100644 include/uapi/linux/amd_viommu.h

-- 
2.34.1



* [RFC PATCH 01/21] iommu/amd: Declare helper functions as extern
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 02/21] iommu/amd: Clean up spacing in amd_iommu_ops declaration Suravee Suthikulpanit
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

To allow reuse from other files. There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h  | 18 ++++++++++++++++++
 drivers/iommu/amd/init.c       |  6 +++---
 drivers/iommu/amd/io_pgtable.c | 18 +++++++++---------
 drivers/iommu/amd/iommu.c      | 14 +++++++-------
 4 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index e98f20a9bdd8..827d065bbe8e 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -11,6 +11,24 @@
 
 #include "amd_iommu_types.h"
 
+extern void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
+extern void iommu_feature_disable(struct amd_iommu *iommu, u8 bit);
+extern u8 __iomem * __init iommu_map_mmio_space(u64 address, u64 end);
+extern void set_dte_entry(struct amd_iommu *iommu, u16 devid,
+			  struct protection_domain *domain,
+			  bool ats, bool ppr);
+extern int iommu_flush_dte(struct amd_iommu *iommu, u16 devid);
+extern struct protection_domain *to_pdomain(struct iommu_domain *dom);
+extern struct iommu_domain *amd_iommu_domain_alloc(unsigned int type);
+extern void amd_iommu_domain_free(struct iommu_domain *dom);
+extern int amd_iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+				  phys_addr_t paddr, size_t pgsize, size_t pgcount,
+				  int prot, gfp_t gfp, size_t *mapped);
+extern unsigned long amd_iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
+					      unsigned long iova,
+					      size_t pgsize, size_t pgcount,
+					      struct iommu_iotlb_gather *gather);
+
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 329a406cc37d..886cf55e75e2 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -432,7 +432,7 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
 }
 
 /* Generic functions to enable/disable certain features of the IOMMU. */
-static void iommu_feature_enable(struct amd_iommu *iommu, u8 bit)
+void iommu_feature_enable(struct amd_iommu *iommu, u8 bit)
 {
 	u64 ctrl;
 
@@ -441,7 +441,7 @@ static void iommu_feature_enable(struct amd_iommu *iommu, u8 bit)
 	writeq(ctrl, iommu->mmio_base +  MMIO_CONTROL_OFFSET);
 }
 
-static void iommu_feature_disable(struct amd_iommu *iommu, u8 bit)
+void iommu_feature_disable(struct amd_iommu *iommu, u8 bit)
 {
 	u64 ctrl;
 
@@ -490,7 +490,7 @@ static void iommu_disable(struct amd_iommu *iommu)
  * mapping and unmapping functions for the IOMMU MMIO space. Each AMD IOMMU in
  * the system has one.
  */
-static u8 __iomem * __init iommu_map_mmio_space(u64 address, u64 end)
+u8 __iomem * __init iommu_map_mmio_space(u64 address, u64 end)
 {
 	if (!request_mem_region(address, end, "amd_iommu")) {
 		pr_err("Can not reserve memory region %llx-%llx for mmio\n",
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 1b67116882be..9b398673208d 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -360,9 +360,9 @@ static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *freelist)
  * supporting all features of AMD IOMMU page tables like level skipping
  * and full 64 bit address spaces.
  */
-static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
-			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
-			      int prot, gfp_t gfp, size_t *mapped)
+int amd_iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+			   phys_addr_t paddr, size_t pgsize, size_t pgcount,
+			   int prot, gfp_t gfp, size_t *mapped)
 {
 	struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
 	LIST_HEAD(freelist);
@@ -435,10 +435,10 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
-static unsigned long iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
-					  unsigned long iova,
-					  size_t pgsize, size_t pgcount,
-					  struct iommu_iotlb_gather *gather)
+unsigned long amd_iommu_v1_unmap_pages(struct io_pgtable_ops *ops,
+				       unsigned long iova,
+				       size_t pgsize, size_t pgcount,
+				       struct iommu_iotlb_gather *gather)
 {
 	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
 	unsigned long long unmapped;
@@ -524,8 +524,8 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 	cfg->oas            = IOMMU_OUT_ADDR_BIT_SIZE,
 	cfg->tlb            = &v1_flush_ops;
 
-	pgtable->iop.ops.map_pages    = iommu_v1_map_pages;
-	pgtable->iop.ops.unmap_pages  = iommu_v1_unmap_pages;
+	pgtable->iop.ops.map_pages    = amd_iommu_v1_map_pages;
+	pgtable->iop.ops.unmap_pages  = amd_iommu_v1_unmap_pages;
 	pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
 
 	return &pgtable->iop;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4a314647d1f7..bbd10698851f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -177,7 +177,7 @@ static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
 	return __rlookup_amd_iommu(seg, PCI_SBDF_TO_DEVID(devid));
 }
 
-static struct protection_domain *to_pdomain(struct iommu_domain *dom)
+struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct protection_domain, domain);
 }
@@ -450,7 +450,7 @@ static void amd_iommu_uninit_device(struct device *dev)
  *
  ****************************************************************************/
 
-static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
+void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
 	int i;
 	struct dev_table_entry *dev_table = get_dev_table(iommu);
@@ -1192,7 +1192,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
 	return ret;
 }
 
-static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
+int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
 {
 	struct iommu_cmd cmd;
 
@@ -1553,8 +1553,8 @@ static void free_gcr3_table(struct protection_domain *domain)
 	free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void set_dte_entry(struct amd_iommu *iommu, u16 devid,
-			  struct protection_domain *domain, bool ats, bool ppr)
+void set_dte_entry(struct amd_iommu *iommu, u16 devid,
+		   struct protection_domain *domain, bool ats, bool ppr)
 {
 	u64 pte_root = 0;
 	u64 flags = 0;
@@ -2118,7 +2118,7 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
 	return NULL;
 }
 
-static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
+struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
 {
 	struct protection_domain *domain;
 
@@ -2140,7 +2140,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
 	return &domain->domain;
 }
 
-static void amd_iommu_domain_free(struct iommu_domain *dom)
+void amd_iommu_domain_free(struct iommu_domain *dom)
 {
 	struct protection_domain *domain;
 
-- 
2.34.1



* [RFC PATCH 02/21] iommu/amd: Clean up spacing in amd_iommu_ops declaration
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 01/21] iommu/amd: Declare helper functions as extern Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 03/21] iommu/amd: Update PASID, GATS, and GLX feature related macros Suravee Suthikulpanit
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Preparing for additional iommu_ops. There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/iommu.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index bbd10698851f..356e52f478f1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2423,17 +2423,17 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 }
 
 const struct iommu_ops amd_iommu_ops = {
-	.capable = amd_iommu_capable,
-	.domain_alloc = amd_iommu_domain_alloc,
-	.probe_device = amd_iommu_probe_device,
-	.release_device = amd_iommu_release_device,
-	.probe_finalize = amd_iommu_probe_finalize,
-	.device_group = amd_iommu_device_group,
-	.get_resv_regions = amd_iommu_get_resv_regions,
-	.is_attach_deferred = amd_iommu_is_attach_deferred,
-	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
-	.def_domain_type = amd_iommu_def_domain_type,
-	.default_domain_ops = &(const struct iommu_domain_ops) {
+	.capable		= amd_iommu_capable,
+	.domain_alloc		= amd_iommu_domain_alloc,
+	.probe_device		= amd_iommu_probe_device,
+	.release_device		= amd_iommu_release_device,
+	.probe_finalize		= amd_iommu_probe_finalize,
+	.device_group		= amd_iommu_device_group,
+	.get_resv_regions	= amd_iommu_get_resv_regions,
+	.is_attach_deferred	= amd_iommu_is_attach_deferred,
+	.pgsize_bitmap		= AMD_IOMMU_PGSIZES,
+	.def_domain_type	= amd_iommu_def_domain_type,
+	.default_domain_ops	= &(const struct iommu_domain_ops) {
 		.attach_dev	= amd_iommu_attach_device,
 		.map_pages	= amd_iommu_map_pages,
 		.unmap_pages	= amd_iommu_unmap_pages,
-- 
2.34.1



* [RFC PATCH 03/21] iommu/amd: Update PASID, GATS, and GLX feature related macros
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 01/21] iommu/amd: Declare helper functions as extern Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 02/21] iommu/amd: Clean up spacing in amd_iommu_ops declaration Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 04/21] iommu/amd: Modify domain_enable_v2() to add giov parameter Suravee Suthikulpanit
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Clean up and reorder them according to the bit index. There is no
functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h | 13 +++++++------
 drivers/iommu/amd/init.c            | 10 +++++-----
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 2ddbda3a4374..09df25779fe9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -93,18 +93,19 @@
 #define FEATURE_GA		(1ULL<<7)
 #define FEATURE_HE		(1ULL<<8)
 #define FEATURE_PC		(1ULL<<9)
-#define FEATURE_GATS_SHIFT	(12)
-#define FEATURE_GATS_MASK	(3ULL)
 #define FEATURE_GAM_VAPIC	(1ULL<<21)
 #define FEATURE_GIOSUP		(1ULL<<48)
 #define FEATURE_EPHSUP		(1ULL<<50)
 #define FEATURE_SNP		(1ULL<<63)
 
-#define FEATURE_PASID_SHIFT	32
-#define FEATURE_PASID_MASK	(0x1fULL << FEATURE_PASID_SHIFT)
+#define FEATURE_GATS_SHIFT	12
+#define FEATURE_GATS_MASK	(0x03ULL << FEATURE_GATS_SHIFT)
 
-#define FEATURE_GLXVAL_SHIFT	14
-#define FEATURE_GLXVAL_MASK	(0x03ULL << FEATURE_GLXVAL_SHIFT)
+#define FEATURE_GLX_SHIFT	14
+#define FEATURE_GLX_MASK	(0x03ULL << FEATURE_GLX_SHIFT)
+
+#define FEATURE_PASMAX_SHIFT	32
+#define FEATURE_PASMAX_MASK	(0x1FULL << FEATURE_PASMAX_SHIFT)
 
 /* Extended Feature 2 Bits */
 #define FEATURE_SNPAVICSUP_SHIFT	5
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 886cf55e75e2..6a045a187971 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -310,7 +310,7 @@ static bool check_feature_on_all_iommus(u64 mask)
 
 static inline int check_feature_gpt_level(void)
 {
-	return ((amd_iommu_efr >> FEATURE_GATS_SHIFT) & FEATURE_GATS_MASK);
+	return ((amd_iommu_efr & FEATURE_GATS_MASK) >> FEATURE_GATS_SHIFT);
 }
 
 /*
@@ -2039,16 +2039,16 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 		u32 max_pasid;
 		u64 pasmax;
 
-		pasmax = iommu->features & FEATURE_PASID_MASK;
-		pasmax >>= FEATURE_PASID_SHIFT;
+		pasmax = iommu->features & FEATURE_PASMAX_MASK;
+		pasmax >>= FEATURE_PASMAX_SHIFT;
 		max_pasid  = (1 << (pasmax + 1)) - 1;
 
 		amd_iommu_max_pasid = min(amd_iommu_max_pasid, max_pasid);
 
 		BUG_ON(amd_iommu_max_pasid & ~PASID_MASK);
 
-		glxval   = iommu->features & FEATURE_GLXVAL_MASK;
-		glxval >>= FEATURE_GLXVAL_SHIFT;
+		glxval   = iommu->features & FEATURE_GLX_MASK;
+		glxval >>= FEATURE_GLX_SHIFT;
 
 		if (amd_iommu_max_glx_val == -1)
 			amd_iommu_max_glx_val = glxval;
-- 
2.34.1
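As a side note, the mask-then-shift extraction pattern that the reordered
macros above support can be sanity-checked in isolation. The following
standalone sketch copies the mask/shift values from this patch; the
helper function names and the synthetic EFR value are invented for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Mask/shift definitions as introduced in amd_iommu_types.h by this patch. */
#define FEATURE_GATS_SHIFT	12
#define FEATURE_GATS_MASK	(0x03ULL << FEATURE_GATS_SHIFT)
#define FEATURE_GLX_SHIFT	14
#define FEATURE_GLX_MASK	(0x03ULL << FEATURE_GLX_SHIFT)
#define FEATURE_PASMAX_SHIFT	32
#define FEATURE_PASMAX_MASK	(0x1FULL << FEATURE_PASMAX_SHIFT)

/* Extraction pattern used by the driver: bitwise-AND first, then shift down. */
static uint64_t feat_gats(uint64_t efr)
{
	return (efr & FEATURE_GATS_MASK) >> FEATURE_GATS_SHIFT;
}

static uint64_t feat_glx(uint64_t efr)
{
	return (efr & FEATURE_GLX_MASK) >> FEATURE_GLX_SHIFT;
}

static uint64_t feat_pasmax(uint64_t efr)
{
	return (efr & FEATURE_PASMAX_MASK) >> FEATURE_PASMAX_SHIFT;
}

/* Derive the maximum PASID value from PASmax, as in iommu_init_pci(). */
static uint32_t max_pasid_from(uint64_t pasmax)
{
	return (1u << (pasmax + 1)) - 1;
}
```

Note that a logical AND (`&&`) in place of the bitwise AND here would
silently reduce the expression to 0 or 1, which is why the extraction
in check_feature_gpt_level() must use `&`.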



* [RFC PATCH 04/21] iommu/amd: Modify domain_enable_v2() to add giov parameter
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (2 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 03/21] iommu/amd: Update PASID, GATS, and GLX feature related macros Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 05/21] iommu/amd: Refactor set_dte_entry() helper function Suravee Suthikulpanit
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

In preparation for subsequent changes. There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  2 +-
 drivers/iommu/amd/iommu.c     | 14 +++++++-------
 drivers/iommu/amd/iommu_v2.c  |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 827d065bbe8e..5d2eed07a1fa 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -70,7 +70,7 @@ extern int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
 extern int amd_iommu_register_ppr_notifier(struct notifier_block *nb);
 extern int amd_iommu_unregister_ppr_notifier(struct notifier_block *nb);
 extern void amd_iommu_domain_direct_map(struct iommu_domain *dom);
-extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
+extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids, bool giov);
 extern int amd_iommu_flush_page(struct iommu_domain *dom, u32 pasid,
 				u64 address);
 extern void amd_iommu_update_and_flush_device_table(struct protection_domain *domain);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 356e52f478f1..6017fce8d7fd 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -79,7 +79,7 @@ struct iommu_cmd {
 struct kmem_cache *amd_iommu_irq_cache;
 
 static void detach_device(struct device *dev);
-static int domain_enable_v2(struct protection_domain *domain, int pasids);
+static int domain_enable_v2(struct protection_domain *domain, int pasids, bool giov);
 
 /****************************************************************************
  *
@@ -2051,11 +2051,9 @@ static int protection_domain_init_v2(struct protection_domain *domain)
 		return -ENOMEM;
 	INIT_LIST_HEAD(&domain->dev_list);
 
-	domain->flags |= PD_GIOV_MASK;
-
 	domain->domain.pgsize_bitmap = AMD_IOMMU_PGSIZES_V2;
 
-	if (domain_enable_v2(domain, 1)) {
+	if (domain_enable_v2(domain, 1, true)) {
 		domain_id_free(domain->id);
 		return -ENOMEM;
 	}
@@ -2484,7 +2482,7 @@ void amd_iommu_domain_direct_map(struct iommu_domain *dom)
 EXPORT_SYMBOL(amd_iommu_domain_direct_map);
 
 /* Note: This function expects iommu_domain->lock to be held prior calling the function. */
-static int domain_enable_v2(struct protection_domain *domain, int pasids)
+static int domain_enable_v2(struct protection_domain *domain, int pasids, bool giov)
 {
 	int levels;
 
@@ -2501,13 +2499,15 @@ static int domain_enable_v2(struct protection_domain *domain, int pasids)
 
 	domain->glx      = levels;
 	domain->flags   |= PD_IOMMUV2_MASK;
+	if (giov)
+		domain->flags |= PD_GIOV_MASK;
 
 	amd_iommu_domain_update(domain);
 
 	return 0;
 }
 
-int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
+int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids, bool giov)
 {
 	struct protection_domain *pdom = to_pdomain(dom);
 	unsigned long flags;
@@ -2525,7 +2525,7 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
 		goto out;
 
 	if (!pdom->gcr3_tbl)
-		ret = domain_enable_v2(pdom, pasids);
+		ret = domain_enable_v2(pdom, pasids, giov);
 
 out:
 	spin_unlock_irqrestore(&pdom->lock, flags);
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index 864e4ffb6aa9..0ddd10953d41 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -784,7 +784,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 	dev_state->domain->type = IOMMU_DOMAIN_IDENTITY;
 	amd_iommu_domain_direct_map(dev_state->domain);
 
-	ret = amd_iommu_domain_enable_v2(dev_state->domain, pasids);
+	ret = amd_iommu_domain_enable_v2(dev_state->domain, pasids, false);
 	if (ret)
 		goto out_free_domain;
 
-- 
2.34.1
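The effect of the new giov parameter can be summarized as a small flag
computation: PD_IOMMUV2_MASK is always set when v2 is enabled, while
PD_GIOV_MASK becomes opt-in. The sketch below uses placeholder mask
values, not the ones in amd_iommu_types.h; only the conditional logic
mirrors the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration; real masks live in amd_iommu_types.h. */
#define PD_IOMMUV2_MASK	(1ULL << 0)
#define PD_GIOV_MASK	(1ULL << 1)

/* Flag update performed by domain_enable_v2() after this patch. */
static uint64_t enable_v2_flags(uint64_t flags, int giov)
{
	flags |= PD_IOMMUV2_MASK;	/* v2 (guest) table always enabled */
	if (giov)
		flags |= PD_GIOV_MASK;	/* guest I/O virtualization: opt-in */
	return flags;
}
```

This is what lets amd_iommu_init_device() (the iommu_v2.c caller) enable
v2 without GIOV, while protection_domain_init_v2() keeps both.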



* [RFC PATCH 05/21] iommu/amd: Refactor set_dte_entry() helper function
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (3 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 04/21] iommu/amd: Modify domain_enable_v2() to add giov parameter Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 06/21] iommu/amd: Modify set_dte_entry() to add gcr3 input parameter Suravee Suthikulpanit
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Separate the logic for the IOMMU guest (v2) page table into another
helper function in preparation for subsequent changes.

There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/iommu.c | 72 ++++++++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6017fce8d7fd..3b31ecde0122 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1553,6 +1553,42 @@ static void free_gcr3_table(struct protection_domain *domain)
 	free_page((unsigned long)domain->gcr3_tbl);
 }
 
+static void set_dte_entry_v2(struct amd_iommu *iommu,
+			     struct protection_domain *domain,
+			     u64 *gcr3_tbl, u64 *pte_root, u64 *flags)
+{
+	u64 gcr3 = iommu_virt_to_phys(gcr3_tbl);
+	u64 glx  = domain->glx;
+	u64 tmp;
+
+	if (!(domain->flags & PD_IOMMUV2_MASK))
+		return;
+
+	if ((domain->flags & PD_GIOV_MASK) &&
+	    iommu_feature(iommu, FEATURE_GIOSUP))
+		*pte_root |= DTE_FLAG_GIOV;
+
+	*pte_root |= DTE_FLAG_GV;
+	*pte_root |= (glx & DTE_GLX_MASK) << DTE_GLX_SHIFT;
+
+	/* First mask out possible old values for GCR3 table */
+	tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+	*flags    &= ~tmp;
+
+	tmp = DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+	*flags    &= ~tmp;
+
+	/* Encode GCR3 table into DTE */
+	tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
+	*pte_root |= tmp;
+
+	tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
+	*flags    |= tmp;
+
+	tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
+	*flags    |= tmp;
+}
+
 void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 		   struct protection_domain *domain, bool ats, bool ppr)
 {
@@ -1586,38 +1622,12 @@ void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 			pte_root |= 1ULL << DEV_ENTRY_PPR;
 	}
 
-	if (domain->flags & PD_IOMMUV2_MASK) {
-		u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
-		u64 glx  = domain->glx;
-		u64 tmp;
-
-		pte_root |= DTE_FLAG_GV;
-		pte_root |= (glx & DTE_GLX_MASK) << DTE_GLX_SHIFT;
-
-		/* First mask out possible old values for GCR3 table */
-		tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
-		flags    &= ~tmp;
-
-		tmp = DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
-		flags    &= ~tmp;
-
-		/* Encode GCR3 table into DTE */
-		tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
-		pte_root |= tmp;
-
-		tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
-		flags    |= tmp;
-
-		tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
-		flags    |= tmp;
-
-		if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
-			dev_table[devid].data[2] |=
-				((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
-		}
+	set_dte_entry_v2(iommu, domain, domain->gcr3_tbl, &pte_root, &flags);
 
-		if (domain->flags & PD_GIOV_MASK)
-			pte_root |= DTE_FLAG_GIOV;
+	if ((domain->flags & PD_IOMMUV2_MASK) &&
+	    amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
+		dev_table[devid].data[2] |=
+			((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
 	}
 
 	flags &= ~DEV_DOMID_MASK;
-- 
2.34.1
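The GCR3 encoding that set_dte_entry_v2() performs scatters one physical
address across two 64-bit DTE words in three pieces (A/B/C). The sketch
below reproduces that split and checks it round-trips; the macro values
are copied from amd_iommu_types.h as of this series and are worth
re-verifying against the tree, and the decode helper is invented purely
for the check:

```c
#include <assert.h>
#include <stdint.h>

/* GCR3 field macros as in drivers/iommu/amd/amd_iommu_types.h. */
#define DTE_GCR3_VAL_A(x)	(((x) >> 12) & 0x00007ULL)
#define DTE_GCR3_VAL_B(x)	(((x) >> 15) & 0x0ffffULL)
#define DTE_GCR3_VAL_C(x)	(((x) >> 31) & 0x1fffffULL)
#define DTE_GCR3_SHIFT_A	58
#define DTE_GCR3_SHIFT_B	16
#define DTE_GCR3_SHIFT_C	43

/* Scatter a 4KB-aligned GCR3 table address into the two DTE words. */
static void encode_gcr3(uint64_t gcr3, uint64_t *pte_root, uint64_t *flags)
{
	*pte_root |= DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
	*flags    |= DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
	*flags    |= DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
}

/* Reassemble the address from the DTE words (illustrative helper). */
static uint64_t decode_gcr3(uint64_t pte_root, uint64_t flags)
{
	uint64_t a = (pte_root >> DTE_GCR3_SHIFT_A) & 0x00007ULL;
	uint64_t b = (flags    >> DTE_GCR3_SHIFT_B) & 0x0ffffULL;
	uint64_t c = (flags    >> DTE_GCR3_SHIFT_C) & 0x1fffffULL;

	return (a << 12) | (b << 15) | (c << 31);
}

/* Encode then decode, to confirm no bits of the address are lost. */
static uint64_t gcr3_roundtrip(uint64_t gcr3)
{
	uint64_t pte_root = 0, flags = 0;

	encode_gcr3(gcr3, &pte_root, &flags);
	return decode_gcr3(pte_root, flags);
}
```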



* [RFC PATCH 06/21] iommu/amd: Modify set_dte_entry() to add gcr3 input parameter
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (4 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 05/21] iommu/amd: Refactor set_dte_entry() helper function Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 07/21] iommu/amd: Modify set_dte_entry() to add user domain " Suravee Suthikulpanit
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

In preparation for subsequent changes. There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  1 +
 drivers/iommu/amd/iommu.c     | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 5d2eed07a1fa..dbfc70556220 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -16,6 +16,7 @@ extern void iommu_feature_disable(struct amd_iommu *iommu, u8 bit);
 extern u8 __iomem * __init iommu_map_mmio_space(u64 address, u64 end);
 extern void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 			  struct protection_domain *domain,
+			  u64 *gcr3_tbl,
 			  bool ats, bool ppr);
 extern int iommu_flush_dte(struct amd_iommu *iommu, u16 devid);
 extern struct protection_domain *to_pdomain(struct iommu_domain *dom);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3b31ecde0122..4728929657f5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1590,7 +1590,9 @@ static void set_dte_entry_v2(struct amd_iommu *iommu,
 }
 
 void set_dte_entry(struct amd_iommu *iommu, u16 devid,
-		   struct protection_domain *domain, bool ats, bool ppr)
+		   struct protection_domain *domain,
+		   u64 *gcr3_tbl,
+		   bool ats, bool ppr)
 {
 	u64 pte_root = 0;
 	u64 flags = 0;
@@ -1622,7 +1624,7 @@ void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 			pte_root |= 1ULL << DEV_ENTRY_PPR;
 	}
 
-	set_dte_entry_v2(iommu, domain, domain->gcr3_tbl, &pte_root, &flags);
+	set_dte_entry_v2(iommu, domain, gcr3_tbl, &pte_root, &flags);
 
 	if ((domain->flags & PD_IOMMUV2_MASK) &&
 	    amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
@@ -1686,7 +1688,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
 	domain->dev_cnt                 += 1;
 
 	/* Update device table */
-	set_dte_entry(iommu, dev_data->devid, domain,
+	set_dte_entry(iommu, dev_data->devid, domain, domain->gcr3_tbl,
 		      ats, dev_data->iommu_v2);
 	clone_aliases(iommu, dev_data->dev);
 
@@ -1965,7 +1967,7 @@ static void update_device_table(struct protection_domain *domain)
 
 		if (!iommu)
 			continue;
-		set_dte_entry(iommu, dev_data->devid, domain,
+		set_dte_entry(iommu, dev_data->devid, domain, domain->gcr3_tbl,
 			      dev_data->ats.enabled, dev_data->iommu_v2);
 		clone_aliases(iommu, dev_data->dev);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 07/21] iommu/amd: Modify set_dte_entry() to add user domain input parameter
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (5 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 06/21] iommu/amd: Modify set_dte_entry() to add gcr3 input parameter Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 08/21] iommu/amd: Allow nested IOMMU page tables Suravee Suthikulpanit
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

When setting up an IOMMU page table in nested mode, the host (v1) table
is managed by the hypervisor, while the guest (v2) table is managed by
the guest kernel. In this case, the IOMMU driver needs to program the
IOMMU device table entry (DTE) using the set_dte_entry() helper function
with the guest table information (i.e. gcr3 table, glx, max pasid),
which is stored in the user domain.

There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h | 1 +
 drivers/iommu/amd/iommu.c     | 9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index dbfc70556220..d36a39796c2f 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -16,6 +16,7 @@ extern void iommu_feature_disable(struct amd_iommu *iommu, u8 bit);
 extern u8 __iomem * __init iommu_map_mmio_space(u64 address, u64 end);
 extern void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 			  struct protection_domain *domain,
+			  struct protection_domain *udomain,
 			  u64 *gcr3_tbl,
 			  bool ats, bool ppr);
 extern int iommu_flush_dte(struct amd_iommu *iommu, u16 devid);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4728929657f5..333c8a4831be 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1591,9 +1591,11 @@ static void set_dte_entry_v2(struct amd_iommu *iommu,
 
 void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 		   struct protection_domain *domain,
+		   struct protection_domain *udomain,
 		   u64 *gcr3_tbl,
 		   bool ats, bool ppr)
 {
+	struct protection_domain *dom;
 	u64 pte_root = 0;
 	u64 flags = 0;
 	u32 old_domid;
@@ -1624,7 +1626,8 @@ void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 			pte_root |= 1ULL << DEV_ENTRY_PPR;
 	}
 
-	set_dte_entry_v2(iommu, domain, gcr3_tbl, &pte_root, &flags);
+	dom = udomain ? udomain : domain;
+	set_dte_entry_v2(iommu, dom, gcr3_tbl, &pte_root, &flags);
 
 	if ((domain->flags & PD_IOMMUV2_MASK) &&
 	    amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
@@ -1688,7 +1691,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
 	domain->dev_cnt                 += 1;
 
 	/* Update device table */
-	set_dte_entry(iommu, dev_data->devid, domain, domain->gcr3_tbl,
+	set_dte_entry(iommu, dev_data->devid, domain, NULL, domain->gcr3_tbl,
 		      ats, dev_data->iommu_v2);
 	clone_aliases(iommu, dev_data->dev);
 
@@ -1967,7 +1970,7 @@ static void update_device_table(struct protection_domain *domain)
 
 		if (!iommu)
 			continue;
-		set_dte_entry(iommu, dev_data->devid, domain, domain->gcr3_tbl,
+		set_dte_entry(iommu, dev_data->devid, domain, NULL, domain->gcr3_tbl,
 			      dev_data->ats.enabled, dev_data->iommu_v2);
 		clone_aliases(iommu, dev_data->dev);
 	}
-- 
2.34.1



* [RFC PATCH 08/21] iommu/amd: Allow nested IOMMU page tables
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (6 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 07/21] iommu/amd: Modify set_dte_entry() to add user domain " Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 09/21] iommu/amd: Add support for hw_info for iommu capability query Suravee Suthikulpanit
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

The GCR3 table contains guest CR3 registers and is used to set up
guest page tables. The current logic allows guest CR3 table setup
only when the host table is not set up (i.e. PAGE_MODE_NONE).
Therefore, only single-stage translation is possible
(i.e. host-only or guest-only).

Remove this restriction to allow nested page table setup.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/iommu.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 333c8a4831be..c23f99ebdffc 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2531,12 +2531,12 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids, bool giov)
 	spin_lock_irqsave(&pdom->lock, flags);
 
 	/*
-	 * Save us all sanity checks whether devices already in the
-	 * domain support IOMMUv2. Just force that the domain has no
-	 * devices attached when it is switched into IOMMUv2 mode.
+	 * With nested page tables, we can enable v2 (i.e. GCR3)
+	 * on an existing domain. Therefore, only check whether the
+	 * domain has already enabled v2.
 	 */
 	ret = -EBUSY;
-	if (pdom->dev_cnt > 0 || pdom->flags & PD_IOMMUV2_MASK)
+	if (pdom->flags & PD_IOMMUV2_MASK)
 		goto out;
 
 	if (!pdom->gcr3_tbl)
@@ -2688,9 +2688,6 @@ static int __set_gcr3(struct protection_domain *domain, u32 pasid,
 {
 	u64 *pte;
 
-	if (domain->iop.mode != PAGE_MODE_NONE)
-		return -EINVAL;
-
 	pte = __get_gcr3_pte(domain->gcr3_tbl, domain->glx, pasid, true);
 	if (pte == NULL)
 		return -ENOMEM;
-- 
2.34.1



* [RFC PATCH 09/21] iommu/amd: Add support for hw_info for iommu capability query
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (7 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 08/21] iommu/amd: Allow nested IOMMU page tables Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 10/21] iommu/amd: Introduce vIOMMU-specific events and event info Suravee Suthikulpanit
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

The AMD IOMMU Extended Feature (EFR) and Extended Feature 2 (EFR2)
registers specify the features supported by each IOMMU hardware
instance. The IOMMU driver checks the feature-specific bits before
enabling each feature at run time.

For hardware-assisted vIOMMU, the hypervisor determines which IOMMU
features to support in the guest, and communicates this information
to user space (e.g. QEMU) via the iommufd IOMMU_DEVICE_GET_HW_INFO
ioctl.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h       |  2 ++
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/iommu.c           | 37 +++++++++++++++++++++++++++++
 include/uapi/linux/iommufd.h        | 11 +++++++++
 4 files changed, 53 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index d36a39796c2f..c9dfa4734801 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -84,6 +84,8 @@ extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, u32 pasid,
 				     unsigned long cr3);
 extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, u32 pasid);
 
+extern void amd_iommu_build_efr(u64 *efr, u64 *efr2);
+
 #ifdef CONFIG_IRQ_REMAP
 extern int amd_iommu_create_irq_domain(struct amd_iommu *iommu);
 #else
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 09df25779fe9..8830f511bee4 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -98,12 +98,15 @@
 #define FEATURE_EPHSUP		(1ULL<<50)
 #define FEATURE_SNP		(1ULL<<63)
 
+#define FEATURE_GATS_5LEVEL	1ULL
 #define FEATURE_GATS_SHIFT	12
 #define FEATURE_GATS_MASK	(0x03ULL << FEATURE_GATS_SHIFT)
 
+#define FEATURE_GLX_3LEVEL	0ULL
 #define FEATURE_GLX_SHIFT	14
 #define FEATURE_GLX_MASK	(0x03ULL << FEATURE_GLX_SHIFT)
 
+#define FEATURE_PASMAX_16	0xFULL
 #define FEATURE_PASMAX_SHIFT	32
 #define FEATURE_PASMAX_MASK	(0x1FULL << FEATURE_PASMAX_SHIFT)
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c23f99ebdffc..4a42af85664e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2435,8 +2435,45 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 	return true;
 }
 
+void amd_iommu_build_efr(u64 *efr, u64 *efr2)
+{
+	if (efr) {
+		*efr = (FEATURE_GT | FEATURE_GIOSUP);
+
+		/* 5-level v2 page table support */
+		*efr |= ((FEATURE_GATS_5LEVEL << FEATURE_GATS_SHIFT) &
+			 FEATURE_GATS_MASK);
+
+		/* 3-level GCR3 table support */
+		*efr |= ((FEATURE_GLX_3LEVEL << FEATURE_GLX_SHIFT) &
+			 FEATURE_GLX_MASK);
+
+		/* 16-bit PASMAX support */
+		*efr |= ((FEATURE_PASMAX_16 << FEATURE_PASMAX_SHIFT) &
+			 FEATURE_PASMAX_MASK);
+	}
+
+	if (efr2)
+		*efr2 = 0;
+}
+
+static void *amd_iommu_hw_info(struct device *dev, u32 *length)
+{
+	struct iommu_hw_info_amd *hwinfo;
+
+	hwinfo = kzalloc(sizeof(*hwinfo), GFP_KERNEL);
+	if (!hwinfo)
+		return ERR_PTR(-ENOMEM);
+
+	*length = sizeof(*hwinfo);
+
+	amd_iommu_build_efr(&hwinfo->efr, &hwinfo->efr2);
+	return hwinfo;
+}
+
 const struct iommu_ops amd_iommu_ops = {
 	.capable		= amd_iommu_capable,
+	.hw_info		= amd_iommu_hw_info,
 	.domain_alloc		= amd_iommu_domain_alloc,
 	.probe_device		= amd_iommu_probe_device,
 	.release_device		= amd_iommu_release_device,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index ec870e2d32fd..f8ea9faf6770 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -508,6 +508,17 @@ struct iommu_hw_info_smmuv3 {
 	__u32 idr[6];
 };
 
+/**
+ * struct iommu_hw_info_amd - AMD IOMMU device info
+ *
+ * @efr : Value of AMD IOMMU Extended Feature Register (EFR)
+ * @efr2: Value of AMD IOMMU Extended Feature 2 Register (EFR2)
+ */
+struct iommu_hw_info_amd {
+	__u64 efr;
+	__u64 efr2;
+};
+
 /**
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
  * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
-- 
2.34.1



* [RFC PATCH 10/21] iommu/amd: Introduce vIOMMU-specific events and event info
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (8 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 09/21] iommu/amd: Add support for hw_info for iommu capability query Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 11/21] iommu/amd: Introduce Reset vMMIO Command Suravee Suthikulpanit
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Add support for new vIOMMU events:
  * Guest Event Fault event
  * vIOMMU Hardware Error event

Also add support for the additional vIOMMU-related flags in existing
events.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  3 ++
 drivers/iommu/amd/iommu.c           | 58 ++++++++++++++++++++++-------
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 8830f511bee4..d832e0c36a21 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -145,6 +145,9 @@
 #define EVENT_TYPE_IOTLB_INV_TO	0x7
 #define EVENT_TYPE_INV_DEV_REQ	0x8
 #define EVENT_TYPE_INV_PPR_REQ	0x9
+#define EVENT_TYPE_GUEST_EVENT_FAULT	0xb
+#define EVENT_TYPE_VIOMMU_HW_ERR	0xc
+
 #define EVENT_TYPE_RMP_FAULT	0xd
 #define EVENT_TYPE_RMP_HW_ERR	0xe
 #define EVENT_DEVID_MASK	0xffff
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4a42af85664e..efced59ba8a5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -541,7 +541,7 @@ static void amd_iommu_report_rmp_fault(struct amd_iommu *iommu, volatile u32 *ev
 
 static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
 					u16 devid, u16 domain_id,
-					u64 address, int flags)
+					u64 address, int flags, u8 vflags)
 {
 	struct iommu_dev_data *dev_data = NULL;
 	struct pci_dev *pdev;
@@ -576,13 +576,13 @@ static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
 		}
 
 		if (__ratelimit(&dev_data->rs)) {
-			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
-				domain_id, address, flags);
+			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x vflags=%#x]\n",
+				domain_id, address, flags, vflags);
 		}
 	} else {
-		pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%04x:%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
+		pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%04x:%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x vflags=%#x]\n",
 			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			domain_id, address, flags);
+			domain_id, address, flags, vflags);
 	}
 
 out:
@@ -618,28 +618,41 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 	}
 
 	if (type == EVENT_TYPE_IO_FAULT) {
-		amd_iommu_report_page_fault(iommu, devid, pasid, address, flags);
+		u8 vflags = (event[0] >> 27) & 0x1F;
+
+		amd_iommu_report_page_fault(iommu, devid, pasid, address, flags, vflags);
 		return;
 	}
 
 	switch (type) {
 	case EVENT_TYPE_ILL_DEV:
-		dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
+	{
+		u8 vflags = (event[0] >> 27) & 0x1F;
+
+		dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x vflags=%#x]\n",
 			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			pasid, address, flags);
+			pasid, address, flags, vflags);
 		dump_dte_entry(iommu, devid);
 		break;
+	}
 	case EVENT_TYPE_DEV_TAB_ERR:
-		dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR device=%04x:%02x:%02x.%x "
-			"address=0x%llx flags=0x%04x]\n",
+	{
+		u8 vflags = (event[0] >> 27) & 0x1F;
+
+		dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR device=%04x:%02x:%02x.%x address=%#llx flags=%#04x vflags=%#x]\n",
 			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			address, flags);
+			address, flags, vflags);
 		break;
+	}
 	case EVENT_TYPE_PAGE_TAB_ERR:
-		dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR device=%04x:%02x:%02x.%x pasid=0x%04x address=0x%llx flags=0x%04x]\n",
+	{
+		u8 vflags = (event[0] >> 27) & 0x1F;
+
+		dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR device=%04x:%02x:%02x.%x pasid=0x%04x address=0x%llx flags=0x%04x vflags=%#x]\n",
 			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			pasid, address, flags);
+			pasid, address, flags, vflags);
 		break;
+	}
 	case EVENT_TYPE_ILL_CMD:
 		dev_err(dev, "Event logged [ILLEGAL_COMMAND_ERROR address=0x%llx]\n", address);
 		dump_command(address);
@@ -671,6 +684,25 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 			pasid, address, flags, tag);
 		break;
+	case EVENT_TYPE_GUEST_EVENT_FAULT:
+	{
+		u16 gid = event[1] & 0xFFFF;
+		u8 vflags = (event[0] >> 27) & 0x1F;
+
+		dev_err(dev, "Event logged [GUEST_EVENT_FAULT gid=%#x flags=0x%04x vflags=%#x]\n",
+			gid, flags, vflags);
+		break;
+	}
+	case EVENT_TYPE_VIOMMU_HW_ERR:
+	{
+		u16 gid = event[0] & 0xFFFF;
+		u8 src = (event[0] >> 16) & 0x3;
+		u8 vflags = (event[0] >> 27) & 0x1F;
+
+		dev_err(dev, "Event logged [VIOMMU_HW_ERR gid=%#x address=%#llx src=%#x flags=%#x vflags=%#x]\n",
+			gid, address, src, flags, vflags);
+		break;
+	}
 	default:
 		dev_err(dev, "Event logged [UNKNOWN event[0]=0x%08x event[1]=0x%08x event[2]=0x%08x event[3]=0x%08x\n",
 			event[0], event[1], event[2], event[3]);
-- 
2.34.1



* [RFC PATCH 11/21] iommu/amd: Introduce Reset vMMIO Command
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (9 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 10/21] iommu/amd: Introduce vIOMMU-specific events and event info Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:54 ` [RFC PATCH 12/21] iommu/amd: Introduce AMD vIOMMU-specific UAPI Suravee Suthikulpanit
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Introduce a new IOMMU command for resetting the virtualized MMIO
registers of a particular guest.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/iommu.c           | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index d832e0c36a21..aa16a7079b5c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -200,6 +200,7 @@
 #define CMD_INV_IRT		0x05
 #define CMD_COMPLETE_PPR	0x07
 #define CMD_INV_ALL		0x08
+#define CMD_RESET_VMMIO		0x0A
 
 #define CMD_COMPL_WAIT_STORE_MASK	0x01
 #define CMD_COMPL_WAIT_INT_MASK		0x02
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index efced59ba8a5..b5c62bc8249c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1133,6 +1133,18 @@ static void build_inv_irt(struct iommu_cmd *cmd, u16 devid)
 	CMD_SET_TYPE(cmd, CMD_INV_IRT);
 }
 
+static void build_reset_vmmio(struct iommu_cmd *cmd, u16 guestId,
+			      bool vcmd, bool all)
+{
+	memset(cmd, 0, sizeof(*cmd));
+	cmd->data[0] = guestId;
+	if (all)
+		cmd->data[0] |= (1 << 28);
+	if (vcmd)
+		cmd->data[0] |= (1 << 31);
+	CMD_SET_TYPE(cmd, CMD_RESET_VMMIO);
+}
+
 /*
  * Writes the command to the IOMMUs command buffer and informs the
  * hardware about the new command.
@@ -1315,6 +1327,16 @@ void iommu_flush_all_caches(struct amd_iommu *iommu)
 	}
 }
 
+void iommu_reset_vmmio(struct amd_iommu *iommu, u16 guestId)
+{
+	struct iommu_cmd cmd;
+
+	build_reset_vmmio(&cmd, guestId, true, true);
+
+	iommu_queue_command(iommu, &cmd);
+	iommu_completion_wait(iommu);
+}
+
 /*
  * Command send function for flushing on-device TLB
  */
-- 
2.34.1



* [RFC PATCH 12/21] iommu/amd: Introduce AMD vIOMMU-specific UAPI
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (10 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 11/21] iommu/amd: Introduce Reset vMMIO Command Suravee Suthikulpanit
@ 2023-06-21 23:54 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 13/21] iommu/amd: Introduce vIOMMU command-line option Suravee Suthikulpanit
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:54 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Introduce ioctl interfaces to handle the various operations necessary
for setting up the vIOMMU hardware. These operations are specific to
AMD hardware.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 include/uapi/linux/amd_viommu.h | 145 ++++++++++++++++++++++++++++++++
 1 file changed, 145 insertions(+)
 create mode 100644 include/uapi/linux/amd_viommu.h

diff --git a/include/uapi/linux/amd_viommu.h b/include/uapi/linux/amd_viommu.h
new file mode 100644
index 000000000000..f4a91ecd5dc2
--- /dev/null
+++ b/include/uapi/linux/amd_viommu.h
@@ -0,0 +1,145 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * AMD Hardware Accelerated Virtualized IOMMU (HW-vIOMMU)
+ *
+ * Copyright (c) 2023, Advanced Micro Devices, Inc.
+ *
+ */
+#ifndef _UAPI_AMD_VIOMMU_H_
+#define _UAPI_AMD_VIOMMU_H_
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/**
+ * The ioctl interfaces in this file are specific for AMD HW-vIOMMU.
+ * They are an extension of the IOMMUFD ioctl interfaces.
+ * Please see include/uapi/linux/iommufd.h for more detail.
+ */
+#include <linux/iommufd.h>
+
+enum iommufd_viommu_cmd {
+	IOMMUFD_VIOMMU_CMD_BASE = 0x60,
+	IOMMUFD_CMD_IOMMU_INIT = IOMMUFD_VIOMMU_CMD_BASE,
+	IOMMUFD_CMD_IOMMU_DESTROY,
+	IOMMUFD_CMD_DEVICE_ATTACH,
+	IOMMUFD_CMD_DEVICE_DETACH,
+	IOMMUFD_CMD_DOMAIN_ATTACH,
+	IOMMUFD_CMD_DOMAIN_DETACH,
+	IOMMUFD_CMD_MMIO_ACCESS,
+	IOMMUFD_CMD_CMDBUF_UPDATE,
+};
+
+/**
+ * struct amd_viommu_iommu_info - ioctl(VIOMMU_IOMMU_[INIT|DESTROY])
+ * @size: sizeof(struct amd_viommu_iommu_info)
+ * @iommu_id: PCI device ID of the AMD IOMMU instance
+ * @gid: guest ID
+ *
+ * Initialize and destroy AMD HW-vIOMMU instances for the specified
+ * guest ID.
+ */
+struct amd_viommu_iommu_info {
+	__u32	size;
+	__u32	iommu_id;
+	__u32	gid;
+};
+#define VIOMMU_IOMMU_INIT	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOMMU_INIT)
+#define VIOMMU_IOMMU_DESTROY	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOMMU_DESTROY)
+
+/**
+ * struct amd_viommu_dev_info - ioctl(VIOMMU_DEVICE_[ATTACH|DETACH])
+ * @size: sizeof(struct amd_viommu_dev_info)
+ * @iommu_id: PCI device ID of the AMD IOMMU instance
+ * @gid: guest ID
+ * @hdev_id: host PCI device ID
+ * @gdev_id: guest PCI device ID
+ * @queue_id: guest PCI device queue ID
+ *
+ * Attach / Detach PCI device to a HW-vIOMMU instance, and program
+ * the IOMMU Device ID mapping table for the specified guest.
+ */
+struct amd_viommu_dev_info {
+	__u32	size;
+	__u32	iommu_id;
+	__u32	gid;
+	__u16	hdev_id;
+	__u16	gdev_id;
+	__u16	queue_id;
+};
+
+#define VIOMMU_DEVICE_ATTACH	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_ATTACH)
+#define VIOMMU_DEVICE_DETACH	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_DETACH)
+
+/**
+ * struct amd_viommu_dom_info - ioctl(VIOMMU_DOMAIN_[ATTACH|DETACH])
+ * @size: sizeof(struct amd_viommu_dom_info)
+ * @iommu_id: PCI device ID of the AMD IOMMU instance
+ * @gid: guest ID
+ * @hdev_id: host PCI device ID
+ * @gdev_id: guest PCI device ID
+ * @gdom_id: guest domain ID
+ *
+ * Attach / Detach domain of a PCI device to a HW-vIOMMU instance, and program
+ * the IOMMU Domain ID mapping table for the specified guest.
+ */
+struct amd_viommu_dom_info {
+	__u32	size;
+	__u32	iommu_id;
+	__u32	gid;
+	__u16	gdev_id;
+	__u16	gdom_id;
+};
+
+#define VIOMMU_DOMAIN_ATTACH	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_DOMAIN_ATTACH)
+#define VIOMMU_DOMAIN_DETACH	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_DOMAIN_DETACH)
+
+/**
+ * struct amd_viommu_mmio_data - ioctl(VIOMMU_MMIO_ACCESS)
+ * @size: sizeof(struct amd_viommu_mmio_data)
+ * @iommu_id: PCI device ID of the AMD IOMMU instance
+ * @gid: guest ID
+ * @offset: specify MMIO offset
+ * @value: specify MMIO write value or retrieving MMIO read value
+ * @mmio_size: specify MMIO size
+ * @is_write: specify MMIO read (0) / write (1)
+ *
+ * - Trap guest IOMMU MMIO write to program HW-vIOMMU for the specified
+ *   guest.
+ * - Trap guest IOMMU MMIO read to emulate return value for the specified
+ *   guest.
+ */
+struct amd_viommu_mmio_data {
+	__u32	size;
+	__u32	iommu_id;
+	__u32	gid;
+	__u32	offset;
+	__u64	value;
+	__u32	mmio_size;
+	__u8	is_write;
+};
+
+#define VIOMMU_MMIO_ACCESS	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_MMIO_ACCESS)
+
+/**
+ * struct amd_viommu_cmdbuf_data - ioctl(VIOMMU_CMDBUF_UPDATE)
+ * @size: sizeof(struct amd_viommu_cmdbuf_data)
+ * @iommu_id: PCI device ID of the AMD IOMMU instance
+ * @gid: guest ID
+ * @cmdbuf_size: guest command buffer size
+ * @hva: host virtual address for the guest command buffer
+ *
+ * Trap guest command buffer initialization to setup HW-vIOMMU command buffer
+ * for the specified guest.
+ */
+struct amd_viommu_cmdbuf_data {
+	__u32	size;
+	__u32	iommu_id;
+	__u32	gid;
+	__u32	cmdbuf_size;
+	__u64	hva;
+};
+
+#define VIOMMU_CMDBUF_UPDATE	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_CMDBUF_UPDATE)
+
+#endif
-- 
2.34.1



* [RFC PATCH 13/21] iommu/amd: Introduce vIOMMU command-line option
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (11 preceding siblings ...)
  2023-06-21 23:54 ` [RFC PATCH 12/21] iommu/amd: Introduce AMD vIOMMU-specific UAPI Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 14/21] iommu/amd: Initialize vIOMMU private address space regions Suravee Suthikulpanit
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

To disable the vIOMMU feature, specify the kernel command-line option
"amd_iommu=viommu_disable".

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h       |  2 ++
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c            | 10 ++++++++++
 3 files changed, 13 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index c9dfa4734801..a65d22384ab8 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -46,6 +46,8 @@ void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
 static inline void amd_iommu_debugfs_setup(struct amd_iommu *iommu) {}
 #endif
 
+extern bool amd_iommu_viommu;
+
 /* Needed for interrupt remapping */
 extern int amd_iommu_prepare(void);
 extern int amd_iommu_enable(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index aa16a7079b5c..019a9182df87 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -96,6 +96,7 @@
 #define FEATURE_GAM_VAPIC	(1ULL<<21)
 #define FEATURE_GIOSUP		(1ULL<<48)
 #define FEATURE_EPHSUP		(1ULL<<50)
+#define FEATURE_VIOMMU		(1ULL<<55)
 #define FEATURE_SNP		(1ULL<<63)
 
 #define FEATURE_GATS_5LEVEL	1ULL
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6a045a187971..4dd9f09e16c4 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -194,6 +194,9 @@ bool amdr_ivrs_remap_support __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
+/* VIOMMU enabling flag */
+bool amd_iommu_viommu = true;
+
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -2154,6 +2157,9 @@ static void print_iommu_info(void)
 			if (iommu->features & FEATURE_SNP)
 				pr_cont(" SNP");
 
+			if (iommu->features & FEATURE_VIOMMU)
+				pr_cont(" vIOMMU");
+
 			pr_cont("\n");
 		}
 	}
@@ -2166,6 +2172,8 @@ static void print_iommu_info(void)
 		pr_info("V2 page table enabled (Paging mode : %d level)\n",
 			amd_iommu_gpt_level);
 	}
+	if (amd_iommu_viommu)
+		pr_info("vIOMMU enabled\n");
 }
 
 static int __init amd_iommu_init_pci(void)
@@ -3402,6 +3410,8 @@ static int __init parse_amd_iommu_options(char *str)
 			amd_iommu_pgtable = AMD_IOMMU_V1;
 		} else if (strncmp(str, "pgtbl_v2", 8) == 0) {
 			amd_iommu_pgtable = AMD_IOMMU_V2;
+		} else if (strncmp(str, "viommu_disable", 14) == 0) {
+			amd_iommu_viommu = false;
 		} else {
 			pr_notice("Unknown option - '%s'\n", str);
 		}
-- 
2.34.1



* [RFC PATCH 14/21] iommu/amd: Initialize vIOMMU private address space regions
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (12 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 13/21] iommu/amd: Introduce vIOMMU command-line option Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 15/21] iommu/amd: Introduce vIOMMU vminit and vmdestroy ioctl Suravee Suthikulpanit
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Initializing the vIOMMU private address space regions involves parsing
the PCI vendor-specific capability (VSC) and using that information to
set up the regions.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/Makefile          |   2 +-
 drivers/iommu/amd/amd_iommu_types.h |  40 +++++
 drivers/iommu/amd/amd_viommu.h      |  57 +++++++
 drivers/iommu/amd/init.c            |   3 +
 drivers/iommu/amd/viommu.c          | 227 ++++++++++++++++++++++++++++
 5 files changed, 328 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/amd/amd_viommu.h
 create mode 100644 drivers/iommu/amd/viommu.c

diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 773d8aa00283..89c045716448 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o
+obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o viommu.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += iommu_v2.o
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 019a9182df87..5cb5a709b31b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -34,6 +34,17 @@
 #define MMIO_RANGE_OFFSET	0x0c
 #define MMIO_MISC_OFFSET	0x10
 
+/* vIOMMU Capability offsets (from IOMMU Capability Header) */
+#define MMIO_VSC_HDR_OFFSET		0x00
+#define MMIO_VSC_INFO_OFFSET		0x00
+#define MMIO_VSC_VF_BAR_LO_OFFSET	0x08
+#define MMIO_VSC_VF_BAR_HI_OFFSET	0x0c
+#define MMIO_VSC_VF_CNTL_BAR_LO_OFFSET	0x10
+#define MMIO_VSC_VF_CNTL_BAR_HI_OFFSET	0x14
+
+#define IOMMU_VSC_INFO_REV(x)	(((x) >> 16) & 0xFF)
+#define IOMMU_VSC_INFO_ID(x)	((x) & 0xFFFF)
+
 /* Masks, shifts and macros to parse the device range capability */
 #define MMIO_RANGE_LD_MASK	0xff000000
 #define MMIO_RANGE_FD_MASK	0x00ff0000
@@ -61,12 +72,15 @@
 #define MMIO_PPR_LOG_OFFSET	0x0038
 #define MMIO_GA_LOG_BASE_OFFSET	0x00e0
 #define MMIO_GA_LOG_TAIL_OFFSET	0x00e8
+#define MMIO_PPRB_LOG_OFFSET	0x00f0
+#define MMIO_EVTB_LOG_OFFSET	0x00f8
 #define MMIO_MSI_ADDR_LO_OFFSET	0x015C
 #define MMIO_MSI_ADDR_HI_OFFSET	0x0160
 #define MMIO_MSI_DATA_OFFSET	0x0164
 #define MMIO_INTCAPXT_EVT_OFFSET	0x0170
 #define MMIO_INTCAPXT_PPR_OFFSET	0x0178
 #define MMIO_INTCAPXT_GALOG_OFFSET	0x0180
+#define MMIO_VIOMMU_STATUS_OFFSET	0x0190
 #define MMIO_EXT_FEATURES2	0x01A0
 #define MMIO_CMD_HEAD_OFFSET	0x2000
 #define MMIO_CMD_TAIL_OFFSET	0x2008
@@ -180,8 +194,16 @@
 #define CONTROL_GAM_EN		25
 #define CONTROL_GALOG_EN	28
 #define CONTROL_GAINT_EN	29
+#define CONTROL_DUALPPRLOG_EN   30
+#define CONTROL_DUALEVTLOG_EN   32
+
+#define CONTROL_PPR_AUTO_RSP_EN 39
+#define CONTROL_BLKSTOPMRK_EN   41
+#define CONTROL_PPR_AUTO_RSP_AON 48
 #define CONTROL_XT_EN		50
 #define CONTROL_INTCAPXT_EN	51
+#define CONTROL_VCMD_EN         52
+#define CONTROL_VIOMMU_EN       53
 #define CONTROL_SNPAVIC_EN	61
 
 #define CTRL_INV_TO_MASK	(7 << CONTROL_INV_TIMEOUT)
@@ -414,6 +436,13 @@
 
 #define DTE_GPT_LEVEL_SHIFT	54
 
+/* vIOMMU bit fields */
+#define DTE_VIOMMU_EN_SHIFT		15
+#define DTE_VIOMMU_GUESTID_SHIFT	16
+#define DTE_VIOMMU_GUESTID_MASK		0xFFFF
+#define DTE_VIOMMU_GDEVICEID_SHIFT	32
+#define DTE_VIOMMU_GDEVICEID_MASK	0xFFFF
+
 #define GCR3_VALID		0x01ULL
 
 #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
@@ -694,6 +723,17 @@ struct amd_iommu {
 	 */
 	u16 cap_ptr;
 
+	/* Vendor-Specific Capability (VSC) pointer. */
+	u16 vsc_offset;
+
+	/* virtual addresses of vIOMMU VF/VF_CNTL BAR */
+	u8 __iomem *vf_base;
+	u8 __iomem *vfctrl_base;
+
+	struct protection_domain *viommu_pdom;
+	void *guest_mmio;
+	void *cmdbuf_dirty_mask;
+
 	/* pci domain of this IOMMU */
 	struct amd_iommu_pci_seg *pci_seg;
 
diff --git a/drivers/iommu/amd/amd_viommu.h b/drivers/iommu/amd/amd_viommu.h
new file mode 100644
index 000000000000..c1dbc2e37eab
--- /dev/null
+++ b/drivers/iommu/amd/amd_viommu.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ * Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
+ */
+
+#ifndef AMD_VIOMMU_H
+#define AMD_VIOMMU_H
+
+#define VIOMMU_MAX_GUESTID	(1 << 16)
+
+#define VIOMMU_VF_MMIO_ENTRY_SIZE		4096
+#define VIOMMU_VFCTRL_MMIO_ENTRY_SIZE		64
+
+#define VIOMMU_VFCTRL_GUEST_DID_MAP_CONTROL0_OFFSET	0x00
+#define VIOMMU_VFCTRL_GUEST_DID_MAP_CONTROL1_OFFSET	0x08
+#define VIOMMU_VFCTRL_GUEST_MISC_CONTROL_OFFSET		0x10
+
+#define VIOMMU_VFCTRL_GUEST_CMD_CONTROL_OFFSET	0x20
+#define VIOMMU_VFCTRL_GUEST_EVT_CONTROL_OFFSET	0x28
+#define VIOMMU_VFCTRL_GUEST_PPR_CONTROL_OFFSET	0x30
+
+#define VIOMMU_VF_MMIO_BASE(iommu, guestId) \
+	((iommu)->vf_base + ((guestId) * VIOMMU_VF_MMIO_ENTRY_SIZE))
+#define VIOMMU_VFCTRL_MMIO_BASE(iommu, guestId) \
+	((iommu)->vfctrl_base + ((guestId) * VIOMMU_VFCTRL_MMIO_ENTRY_SIZE))
+
+#define VIOMMU_GUEST_MMIO_BASE		0
+#define VIOMMU_GUEST_MMIO_SIZE		(64 * VIOMMU_MAX_GUESTID)
+
+#define VIOMMU_CMDBUF_DIRTY_STATUS_BASE	0x400000ULL
+#define VIOMMU_CMDBUF_DIRTY_STATUS_SIZE	0x2000
+
+#define VIOMMU_DEVID_MAPPING_BASE	0x1000000000ULL
+#define VIOMMU_DEVID_MAPPING_ENTRY_SIZE	(1 << 20)
+
+#define VIOMMU_DOMID_MAPPING_BASE	0x2000000000ULL
+#define VIOMMU_DOMID_MAPPING_ENTRY_SIZE	(1 << 19)
+
+#define VIOMMU_GUEST_CMDBUF_BASE	0x2800000000ULL
+#define VIOMMU_GUEST_CMDBUF_SIZE	(1 << 19)
+
+#define VIOMMU_GUEST_PPR_LOG_BASE	0x3000000000ULL
+#define VIOMMU_GUEST_PPR_LOG_SIZE	(1 << 19)
+
+#define VIOMMU_GUEST_PPR_B_LOG_BASE	0x3800000000ULL
+#define VIOMMU_GUEST_PPR_B_LOG_SIZE	(1 << 19)
+
+#define VIOMMU_GUEST_EVT_LOG_BASE	0x4000000000ULL
+#define VIOMMU_GUEST_EVT_LOG_SIZE	(1 << 19)
+
+#define VIOMMU_GUEST_EVT_B_LOG_BASE	0x4800000000ULL
+#define VIOMMU_GUEST_EVT_B_LOG_SIZE	(1 << 19)
+
+extern int iommu_init_viommu(struct amd_iommu *iommu);
+
+#endif /* AMD_VIOMMU_H */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 4dd9f09e16c4..48aa71fe76dc 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -34,6 +34,7 @@
 #include <linux/crash_dump.h>
 
 #include "amd_iommu.h"
+#include "amd_viommu.h"
 #include "../irq_remapping.h"
 
 /*
@@ -2068,6 +2069,8 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 	if (iommu_feature(iommu, FEATURE_PPR) && alloc_ppr_log(iommu))
 		return -ENOMEM;
 
+	iommu_init_viommu(iommu);
+
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
 		pr_info("Using strict mode due to virtualization\n");
 		iommu_set_dma_strict();
diff --git a/drivers/iommu/amd/viommu.c b/drivers/iommu/amd/viommu.c
new file mode 100644
index 000000000000..18036d03c747
--- /dev/null
+++ b/drivers/iommu/amd/viommu.c
@@ -0,0 +1,227 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ * Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
+ */
+
+#define pr_fmt(fmt)     "AMD-Vi: " fmt
+#define dev_fmt(fmt)    pr_fmt(fmt)
+
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>
+
+#include <linux/fs.h>
+#include <linux/cdev.h>
+#include <linux/ioctl.h>
+#include <linux/iommufd.h>
+#include <linux/mem_encrypt.h>
+#include <uapi/linux/amd_viommu.h>
+
+#include <asm/iommu.h>
+#include <asm/set_memory.h>
+
+#include "amd_iommu.h"
+#include "amd_iommu_types.h"
+#include "amd_viommu.h"
+
+#define GET_CTRL_BITS(reg, bit, msk)	(((reg) >> (bit)) & (ULL(msk)))
+#define SET_CTRL_BITS(reg, bit1, bit2, msk) \
+	((((reg) >> (bit1)) & (ULL(msk))) << (bit2))
+
+LIST_HEAD(viommu_devid_map);
+
+struct amd_iommu *get_amd_iommu_from_devid(u16 devid)
+{
+	struct amd_iommu *iommu;
+
+	for_each_iommu(iommu)
+		if (iommu->devid == devid)
+			return iommu;
+	return NULL;
+}
+
+static void viommu_enable(struct amd_iommu *iommu)
+{
+	if (!amd_iommu_viommu)
+		return;
+	iommu_feature_enable(iommu, CONTROL_VCMD_EN);
+	iommu_feature_enable(iommu, CONTROL_VIOMMU_EN);
+}
+
+static int viommu_init_pci_vsc(struct amd_iommu *iommu)
+{
+	iommu->vsc_offset = pci_find_capability(iommu->dev, PCI_CAP_ID_VNDR);
+	if (!iommu->vsc_offset)
+		return -ENODEV;
+
+	DUMP_printk("device:%s, vsc offset:%04x\n",
+		    pci_name(iommu->dev), iommu->vsc_offset);
+	return 0;
+}
+
+static int __init viommu_vf_vfcntl_init(struct amd_iommu *iommu)
+{
+	u32 lo, hi;
+	u64 vf_phys, vf_cntl_phys;
+
+	/* Setting up VF and VF_CNTL MMIOs */
+	pci_read_config_dword(iommu->dev, iommu->vsc_offset + MMIO_VSC_VF_BAR_LO_OFFSET, &lo);
+	pci_read_config_dword(iommu->dev, iommu->vsc_offset + MMIO_VSC_VF_BAR_HI_OFFSET, &hi);
+	vf_phys = hi;
+	vf_phys = (vf_phys << 32) | lo;
+	if (!(vf_phys & 1)) {
+		pr_err(FW_BUG "vf_phys disabled\n");
+		return -EINVAL;
+	}
+
+	pci_read_config_dword(iommu->dev, iommu->vsc_offset + MMIO_VSC_VF_CNTL_BAR_LO_OFFSET, &lo);
+	pci_read_config_dword(iommu->dev, iommu->vsc_offset + MMIO_VSC_VF_CNTL_BAR_HI_OFFSET, &hi);
+	vf_cntl_phys = hi;
+	vf_cntl_phys = (vf_cntl_phys << 32) | lo;
+	if (!(vf_cntl_phys & 1)) {
+		pr_err(FW_BUG "vf_cntl_phys disabled\n");
+		return -EINVAL;
+	}
+
+	if (!vf_phys || !vf_cntl_phys) {
+		pr_err(FW_BUG "Unassigned VF resources\n");
+		return -ENOMEM;
+	}
+
+	/* Mapping 256MB of VF and 4MB of VF_CNTL BARs */
+	vf_phys &= ~1ULL;
+	iommu->vf_base = iommu_map_mmio_space(vf_phys, 0x10000000);
+	if (!iommu->vf_base) {
+		pr_err("Can't reserve vf_base\n");
+		return -ENOMEM;
+	}
+
+	vf_cntl_phys &= ~1ULL;
+	iommu->vfctrl_base = iommu_map_mmio_space(vf_cntl_phys, 0x400000);
+
+	if (!iommu->vfctrl_base) {
+		pr_err("Can't reserve vfctrl_base\n");
+		return -ENOMEM;
+	}
+
+	pr_debug("%s: IOMMU device:%s, vf_base:%#llx, vfctrl_base:%#llx\n",
+		 __func__, pci_name(iommu->dev), vf_phys, vf_cntl_phys);
+	return 0;
+}
+
+static void *alloc_private_region(struct amd_iommu *iommu,
+				  u64 base, size_t size)
+{
+	int ret;
+	void *region;
+
+	region  = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+						get_order(size));
+	if (!region)
+		return NULL;
+
+	ret = set_memory_uc((unsigned long)region, size >> PAGE_SHIFT);
+	if (ret)
+		goto err_out;
+
+	if (amd_iommu_v1_map_pages(&iommu->viommu_pdom->iop.iop.ops, base,
+				   iommu_virt_to_phys(region), PAGE_SIZE, (size / PAGE_SIZE),
+				   IOMMU_PROT_IR | IOMMU_PROT_IW, GFP_KERNEL, NULL))
+		goto err_out;
+
+	pr_debug("%s: base=%#llx, size=%#lx\n", __func__, base, size);
+
+	return region;
+
+err_out:
+	free_pages((unsigned long)region, get_order(size));
+	return NULL;
+}
+
+static int viommu_private_space_init(struct amd_iommu *iommu)
+{
+	u64 pte_root = 0;
+	struct iommu_domain *dom;
+	struct protection_domain *pdom;
+
+	/*
+	 * Setup page table root pointer, Guest MMIO and
+	 * Cmdbuf Dirty Status regions.
+	 */
+	dom = amd_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+	if (!dom)
+		goto err_out;
+
+	pdom = to_pdomain(dom);
+	iommu->viommu_pdom = pdom;
+	set_dte_entry(iommu, iommu->devid, pdom, NULL, pdom->gcr3_tbl,
+		      false, false);
+
+	iommu->guest_mmio = alloc_private_region(iommu,
+						 VIOMMU_GUEST_MMIO_BASE,
+						 VIOMMU_GUEST_MMIO_SIZE);
+	if (!iommu->guest_mmio)
+		goto err_out;
+
+	iommu->cmdbuf_dirty_mask = alloc_private_region(iommu,
+							VIOMMU_CMDBUF_DIRTY_STATUS_BASE,
+							VIOMMU_CMDBUF_DIRTY_STATUS_SIZE);
+	if (!iommu->cmdbuf_dirty_mask)
+		goto err_out;
+
+	pte_root = iommu_virt_to_phys(pdom->iop.root);
+	pr_debug("%s: devid=%#x, pte_root=%#llx(%#llx), guest_mmio=%#llx(%#llx), cmdbuf_dirty_mask=%#llx(%#llx)\n",
+		 __func__, iommu->devid, (unsigned long long)pdom->iop.root, pte_root,
+		 (unsigned long long)iommu->guest_mmio, iommu_virt_to_phys(iommu->guest_mmio),
+		 (unsigned long long)iommu->cmdbuf_dirty_mask,
+		 iommu_virt_to_phys(iommu->cmdbuf_dirty_mask));
+
+	return 0;
+err_out:
+	if (iommu->guest_mmio)
+		free_pages((unsigned long)iommu->guest_mmio, get_order(VIOMMU_GUEST_MMIO_SIZE));
+
+	if (dom)
+		amd_iommu_domain_free(dom);
+	return -ENOMEM;
+}
+
+/*
+ * When IOMMU Virtualization is enabled, host software must:
+ *	- allocate system memory for IOMMU private space
+ *	- program IOMMU as an I/O device in Device Table
+ *	- maintain the I/O page table for IOMMU private addressing to SPA translations.
+ *	- specify the base address of the IOMMU Virtual Function MMIO and
+ *	  IOMMU Virtual Function Control MMIO region.
+ *	- enable Guest Virtual APIC enable (MMIO Offset 0x18[GAEn]).
+ */
+int __init iommu_init_viommu(struct amd_iommu *iommu)
+{
+	int ret = -EINVAL;
+
+	if (!amd_iommu_viommu)
+		return 0;
+
+	if (!iommu_feature(iommu, FEATURE_VIOMMU))
+		goto err_out;
+
+	ret = viommu_init_pci_vsc(iommu);
+	if (ret)
+		goto err_out;
+
+	ret = viommu_vf_vfcntl_init(iommu);
+	if (ret)
+		goto err_out;
+
+	ret = viommu_private_space_init(iommu);
+	if (ret)
+		goto err_out;
+
+	viommu_enable(iommu);
+
+	return ret;
+
+err_out:
+	amd_iommu_viommu = false;
+	return ret;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 15/21] iommu/amd: Introduce vIOMMU vminit and vmdestroy ioctl
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (13 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 14/21] iommu/amd: Initialize vIOMMU private address space regions Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 16/21] iommu/amd: Introduce vIOMMU ioctl for updating device mapping table Suravee Suthikulpanit
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

These ioctl interfaces are called when QEMU initializes and destroys VMs.
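
For reviewers, the guest-ID allocator introduced below hands out 16-bit,
1-based IDs from a wrapping counter (ID 0 is reserved). A condensed
user-space sketch of just the wrap behavior (the in-use re-check against
the hash table after the first wrap is omitted; the helper name is
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the next guest ID from a wrapping 16-bit counter,
 * skipping 0, and record whether the counter has wrapped.
 */
static uint32_t next_gid(uint32_t *state, int *wrapped)
{
	uint32_t gid;

	do {
		gid = *state = (*state + 1) & 0xFFFF;
		if (gid == 0)		/* 0 is not a valid guest ID */
			*wrapped = 1;
	} while (gid == 0);

	return gid;
}
```

Once `wrapped` is set, the real allocator must also verify the candidate
ID is not still present in viommu_gid_hash before handing it out.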

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |   2 +
 drivers/iommu/amd/iommu.c     |   4 +-
 drivers/iommu/amd/viommu.c    | 294 ++++++++++++++++++++++++++++++++++
 3 files changed, 298 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index a65d22384ab8..fccae07e8c9f 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -85,6 +85,8 @@ extern int amd_iommu_flush_tlb(struct iommu_domain *dom, u32 pasid);
 extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, u32 pasid,
 				     unsigned long cr3);
 extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, u32 pasid);
+extern void amd_iommu_iotlb_sync(struct iommu_domain *domain,
+				 struct iommu_iotlb_gather *gather);
 
 extern void amd_iommu_build_efr(u64 *efr, u64 *efr2);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b5c62bc8249c..f22b2a5a8bfc 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2447,8 +2447,8 @@ static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
 	spin_unlock_irqrestore(&dom->lock, flags);
 }
 
-static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
-				 struct iommu_iotlb_gather *gather)
+void amd_iommu_iotlb_sync(struct iommu_domain *domain,
+			  struct iommu_iotlb_gather *gather)
 {
 	struct protection_domain *dom = to_pdomain(domain);
 	unsigned long flags;
diff --git a/drivers/iommu/amd/viommu.c b/drivers/iommu/amd/viommu.c
index 18036d03c747..2bafa5102ffa 100644
--- a/drivers/iommu/amd/viommu.c
+++ b/drivers/iommu/amd/viommu.c
@@ -12,6 +12,7 @@
 
 #include <linux/fs.h>
 #include <linux/cdev.h>
+#include <linux/hashtable.h>
 #include <linux/ioctl.h>
 #include <linux/iommufd.h>
 #include <linux/mem_encrypt.h>
@@ -28,8 +29,25 @@
 #define SET_CTRL_BITS(reg, bit1, bit2, msk) \
 	((((reg) >> (bit1)) & (ULL(msk))) << (bit2))
 
+#define VIOMMU_MAX_GDEVID	0xFFFF
+#define VIOMMU_MAX_GDOMID	0xFFFF
+
+#define VIOMMU_GID_HASH_BITS	16
+static DEFINE_HASHTABLE(viommu_gid_hash, VIOMMU_GID_HASH_BITS);
+static DEFINE_SPINLOCK(viommu_gid_hash_lock);
+static u32 viommu_next_gid;
+static bool next_viommu_gid_wrapped;
+
 LIST_HEAD(viommu_devid_map);
 
+struct amd_iommu_vminfo {
+	u16 gid;
+	bool init;
+	struct hlist_node hnode;
+	u64 *devid_table;
+	u64 *domid_table;
+};
+
 struct amd_iommu *get_amd_iommu_from_devid(u16 devid)
 {
 	struct amd_iommu *iommu;
@@ -138,6 +156,50 @@ static void *alloc_private_region(struct amd_iommu *iommu,
 	return NULL;
 }
 
+static int alloc_private_vm_region(struct amd_iommu *iommu, u64 **entry,
+				   u64 base, size_t size, u16 guestId)
+{
+	int ret;
+	u64 addr = base + (guestId * size);
+
+	*entry = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(size));
+	if (!*entry)
+		return -ENOMEM;
+
+	ret = set_memory_uc((unsigned long)*entry, size >> PAGE_SHIFT);
+	if (ret)
+		return ret;
+
+	pr_debug("%s: entry=%#llx(%#llx), addr=%#llx\n", __func__,
+		 (unsigned long long)*entry, iommu_virt_to_phys(*entry), addr);
+
+	ret = amd_iommu_v1_map_pages(&iommu->viommu_pdom->iop.iop.ops, addr,
+				     iommu_virt_to_phys(*entry), PAGE_SIZE, (size / PAGE_SIZE),
+				     IOMMU_PROT_IR | IOMMU_PROT_IW, GFP_KERNEL, NULL);
+
+	return ret;
+}
+
+static void free_private_vm_region(struct amd_iommu *iommu, u64 **entry,
+					u64 base, size_t size, u16 guestId)
+{
+	size_t ret;
+	struct iommu_iotlb_gather gather;
+	u64 addr = base + (guestId * size);
+
+	pr_debug("entry=%#llx(%#llx), addr=%#llx\n",
+		 (unsigned long long)*entry,
+		 iommu_virt_to_phys(*entry), addr);
+
+	if (!iommu || !iommu->viommu_pdom)
+		return;
+	ret = amd_iommu_v1_unmap_pages(&iommu->viommu_pdom->iop.iop.ops,
+				       addr, PAGE_SIZE, (size / PAGE_SIZE), &gather);
+	if (ret)
+		amd_iommu_iotlb_sync(&iommu->viommu_pdom->domain, &gather);
+
+	free_pages((unsigned long)*entry, get_order(size));
+	*entry = NULL;
+}
+
 static int viommu_private_space_init(struct amd_iommu *iommu)
 {
 	u64 pte_root = 0;
@@ -225,3 +287,235 @@ int __init iommu_init_viommu(struct amd_iommu *iommu)
 	amd_iommu_viommu = false;
 	return ret;
 }
+
+static void viommu_uninit_one(struct amd_iommu *iommu, struct amd_iommu_vminfo *vminfo, u16 guestId)
+{
+	free_private_vm_region(iommu, &vminfo->devid_table,
+			       VIOMMU_DEVID_MAPPING_BASE,
+			       VIOMMU_DEVID_MAPPING_ENTRY_SIZE,
+			       guestId);
+	free_private_vm_region(iommu, &vminfo->domid_table,
+			       VIOMMU_DOMID_MAPPING_BASE,
+			       VIOMMU_DOMID_MAPPING_ENTRY_SIZE,
+			       guestId);
+}
+
+/*
+ * Clear the DevID via VFCTRL registers
+ * This function will be called during VM destroy via VFIO.
+ */
+static void clear_device_mapping(struct amd_iommu *iommu, u16 hDevId, u16 guestId,
+				 u16 queueId, u16 gDevId)
+{
+	u64 val, tmp1, tmp2;
+	u8 __iomem *vfctrl;
+
+	/*
+	 * Clear the DevID in VFCTRL registers
+	 */
+	tmp1 = gDevId;
+	tmp1 = ((tmp1 & 0xFFFFULL) << 46);
+	tmp2 = hDevId;
+	tmp2 = ((tmp2 & 0xFFFFULL) << 14);
+	val = tmp1 | tmp2 | 0x8000000000000001ULL;
+	vfctrl = VIOMMU_VFCTRL_MMIO_BASE(iommu, guestId);
+	writeq(val, vfctrl + VIOMMU_VFCTRL_GUEST_DID_MAP_CONTROL0_OFFSET);
+}
+
+/*
+ * Clear the DomID via VFCTRL registers
+ * This function will be called during VM destroy via VFIO.
+ */
+static void clear_domain_mapping(struct amd_iommu *iommu, u16 hDomId, u16 guestId, u16 gDomId)
+{
+	u64 val, tmp1, tmp2;
+	u8 __iomem *vfctrl = VIOMMU_VFCTRL_MMIO_BASE(iommu, guestId);
+
+	tmp1 = gDomId;
+	tmp1 = ((tmp1 & 0xFFFFULL) << 46);
+	tmp2 = hDomId;
+	tmp2 = ((tmp2 & 0xFFFFULL) << 14);
+	val = tmp1 | tmp2 | 0x8000000000000001UL;
+	writeq(val, vfctrl + VIOMMU_VFCTRL_GUEST_DID_MAP_CONTROL1_OFFSET);
+}
+
+static void viommu_clear_mapping(struct amd_iommu *iommu, u16 guestId)
+{
+	int i;
+
+	for (i = 0; i <= VIOMMU_MAX_GDEVID; i++)
+		clear_device_mapping(iommu, 0, guestId, 0, i);
+
+	for (i = 0; i <= VIOMMU_MAX_GDOMID; i++)
+		clear_domain_mapping(iommu, 0, guestId, i);
+}
+
+static void viommu_clear_dirty_status_mask(struct amd_iommu *iommu, unsigned int gid)
+{
+	u32 offset, index, bits;
+	u64 *group, val;
+
+	if (gid >= VIOMMU_MAX_GUESTID)
+		return;
+
+	group = (u64 *)(iommu->cmdbuf_dirty_mask +
+		(((gid & 0xFF) << 4) | (((gid >> 13) & 0x7) << 2)));
+	offset = (gid >> 8) & 0x1F;
+	index = offset >> 6;
+	bits = offset & 0x3F;
+
+	val = READ_ONCE(group[index]);
+	val &= ~(1ULL << bits);
+	WRITE_ONCE(group[index], val);
+}
+
+/*
+ * Allocate pages for the following regions:
+ * - Guest MMIO
+ * - DeviceID/DomainId Mapping Table
+ * - Cmd buffer
+ * - Event/PRR (A/B) logs
+ */
+static int viommu_init_one(struct amd_iommu *iommu, struct amd_iommu_vminfo *vminfo)
+{
+	int ret;
+
+	ret = alloc_private_vm_region(iommu, &vminfo->devid_table,
+				      VIOMMU_DEVID_MAPPING_BASE,
+				      VIOMMU_DEVID_MAPPING_ENTRY_SIZE,
+				      vminfo->gid);
+	if (ret)
+		goto err_out;
+
+	ret = alloc_private_vm_region(iommu, &vminfo->domid_table,
+				      VIOMMU_DOMID_MAPPING_BASE,
+				      VIOMMU_DOMID_MAPPING_ENTRY_SIZE,
+				      vminfo->gid);
+	if (ret)
+		goto err_out;
+
+	viommu_clear_mapping(iommu, vminfo->gid);
+	viommu_clear_dirty_status_mask(iommu, vminfo->gid);
+
+	return 0;
+err_out:
+	viommu_uninit_one(iommu, vminfo, vminfo->gid);
+	return -ENOMEM;
+}
+
+int viommu_gid_alloc(struct amd_iommu *iommu, struct amd_iommu_vminfo *vminfo)
+{
+	u32 gid;
+	struct amd_iommu_vminfo *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&viommu_gid_hash_lock, flags);
+again:
+	gid = viommu_next_gid = (viommu_next_gid + 1) & 0xFFFF;
+
+	if (gid == 0) { /* id is 1-based, zero is not allowed */
+		next_viommu_gid_wrapped = 1;
+		goto again;
+	}
+	/* Is it still in use? Only possible if wrapped at least once */
+	if (next_viommu_gid_wrapped) {
+		hash_for_each_possible(viommu_gid_hash, tmp, hnode, gid) {
+			if (tmp->gid == gid)
+				goto again;
+		}
+	}
+
+	pr_debug("%s: gid=%u\n", __func__, gid);
+	vminfo->gid = gid;
+	hash_add(viommu_gid_hash, &vminfo->hnode, vminfo->gid);
+	spin_unlock_irqrestore(&viommu_gid_hash_lock, flags);
+	return 0;
+}
+
+static void viommu_gid_free(struct amd_iommu *iommu,
+			    struct amd_iommu_vminfo *vminfo)
+{
+	unsigned long flags;
+
+	pr_debug("%s: gid=%u\n", __func__, vminfo->gid);
+	spin_lock_irqsave(&viommu_gid_hash_lock, flags);
+	hash_del(&vminfo->hnode);
+	spin_unlock_irqrestore(&viommu_gid_hash_lock, flags);
+}
+
+struct amd_iommu_vminfo *get_vminfo(struct amd_iommu *iommu, int gid)
+{
+	unsigned long flags;
+	struct amd_iommu_vminfo *tmp, *ptr = NULL;
+
+	spin_lock_irqsave(&viommu_gid_hash_lock, flags);
+	hash_for_each_possible(viommu_gid_hash, tmp, hnode, gid) {
+		if (tmp->gid == gid) {
+			ptr = tmp;
+			break;
+		}
+	}
+	if (!ptr)
+		pr_debug("%s : gid=%u not found\n", __func__, gid);
+	spin_unlock_irqrestore(&viommu_gid_hash_lock, flags);
+	return ptr;
+}
+
+int amd_viommu_iommu_init(struct amd_viommu_iommu_info *data)
+{
+	int ret;
+	struct amd_iommu_vminfo *vminfo;
+	unsigned int iommu_id = data->iommu_id;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(iommu_id);
+
+	if (!iommu)
+		return -ENODEV;
+
+	vminfo = kzalloc(sizeof(*vminfo), GFP_KERNEL);
+	if (!vminfo)
+		return -ENOMEM;
+
+	ret = viommu_gid_alloc(iommu, vminfo);
+	if (ret)
+		goto err_out;
+
+	ret = viommu_init_one(iommu, vminfo);
+	if (ret)
+		goto err_out;
+
+	vminfo->init = true;
+	data->gid = vminfo->gid;
+	pr_debug("%s: iommu_id=%#x, gid=%#x\n", __func__,
+		pci_dev_id(iommu->dev), vminfo->gid);
+
+	return ret;
+
+err_out:
+	viommu_gid_free(iommu, vminfo);
+	kfree(vminfo);
+	return ret;
+}
+EXPORT_SYMBOL(amd_viommu_iommu_init);
+
+int amd_viommu_iommu_destroy(struct amd_viommu_iommu_info *data)
+{
+	unsigned int gid = data->gid;
+	struct amd_iommu_vminfo *vminfo;
+	unsigned int iommu_id = data->iommu_id;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(iommu_id);
+
+	if (!iommu)
+		return -ENODEV;
+
+	vminfo = get_vminfo(iommu, gid);
+	if (!vminfo)
+		return -EINVAL;
+
+	viommu_uninit_one(iommu, vminfo, gid);
+
+	viommu_gid_free(iommu, vminfo);
+	kfree(vminfo);
+
+	return 0;
+}
+EXPORT_SYMBOL(amd_viommu_iommu_destroy);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 16/21] iommu/amd: Introduce vIOMMU ioctl for updating device mapping table
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (14 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 15/21] iommu/amd: Introduce vIOMMU vminit and vmdestroy ioctl Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 17/21] iommu/amd: Introduce vIOMMU ioctl for updating domain mapping Suravee Suthikulpanit
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

These ioctl interfaces are used for updating host-to-guest
device ID mappings.
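
For reviewers, the VFCTRL programming below packs the guest and host
device IDs into one 64-bit register write: guest device ID shifted to
bit 46, host device ID to bit 14, plus the constant 0x8000000000000001ULL
taken verbatim from the patch (presumably update/valid control bits). A
user-space sketch of that packing (helper name hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose the value written to the VFCTRL GUEST_DID_MAP_CONTROL0
 * register: gDevId in bits 61:46, hDevId in bits 29:14, and the
 * fixed control bits from the patch (bit 63 and bit 0).
 */
static uint64_t build_did_map_ctrl(uint16_t gdev_id, uint16_t hdev_id)
{
	uint64_t val = ((uint64_t)(gdev_id & 0xFFFF) << 46) |
		       ((uint64_t)(hdev_id & 0xFFFF) << 14);

	return val | 0x8000000000000001ULL;
}
```

The same layout is used for the domain-ID mapping register in the next
patch, only targeting GUEST_DID_MAP_CONTROL1.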

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/viommu.c | 130 +++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/drivers/iommu/amd/viommu.c b/drivers/iommu/amd/viommu.c
index 2bafa5102ffa..f6f0056c7fe6 100644
--- a/drivers/iommu/amd/viommu.c
+++ b/drivers/iommu/amd/viommu.c
@@ -519,3 +519,133 @@ int amd_viommu_iommu_destroy(struct amd_viommu_iommu_info *data)
 
 }
 EXPORT_SYMBOL(amd_viommu_iommu_destroy);
+
+static void set_dte_viommu(struct amd_iommu *iommu, u16 hDevId, u16 gid, u16 gDevId)
+{
+	u64 tmp, dte;
+	struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+	// vImuEn
+	dte = dev_table[hDevId].data[3];
+	dte |= (1ULL << DTE_VIOMMU_EN_SHIFT);
+
+	// GuestID
+	tmp = gid & DTE_VIOMMU_GUESTID_MASK;
+	dte |= (tmp << DTE_VIOMMU_GUESTID_SHIFT);
+
+	// GDeviceID
+	tmp = gDevId & DTE_VIOMMU_GUESTID_MASK;
+	dte |= (tmp << DTE_VIOMMU_GDEVICEID_SHIFT);
+
+	dev_table[hDevId].data[3] = dte;
+
+	dte = dev_table[hDevId].data[0];
+	dte |= DTE_FLAG_GV;
+	dev_table[hDevId].data[0] = dte;
+
+	iommu_flush_dte(iommu, hDevId);
+}
+
+void dump_device_mapping(struct amd_iommu *iommu, u16 guestId, u16 gdev_id)
+{
+	void *addr;
+	u64 offset, val;
+	struct amd_iommu_vminfo *vminfo;
+
+	vminfo = get_vminfo(iommu, guestId);
+	if (!vminfo)
+		return;
+
+	addr = vminfo->devid_table;
+	offset = gdev_id << 4;
+	val = *((u64 *)(addr + offset));
+
+	pr_debug("%s: guestId=%#x, gdev_id=%#x, base=%#llx, offset=%#llx(val=%#llx)\n", __func__,
+		 guestId, gdev_id, (unsigned long long)iommu_virt_to_phys(vminfo->devid_table),
+		 (unsigned long long)offset, (unsigned long long)val);
+}
+
+/*
+ * Program the DevID via VFCTRL registers
+ * This function will be called during VM init via VFIO.
+ */
+static void set_device_mapping(struct amd_iommu *iommu, u16 hDevId,
+			       u16 guestId, u16 queueId, u16 gDevId)
+{
+	u64 val, tmp1, tmp2;
+	u8 __iomem *vfctrl;
+
+	pr_debug("%s: iommu_id=%#x, gid=%#x, hDevId=%#x, gDevId=%#x\n",
+		__func__, pci_dev_id(iommu->dev), guestId, hDevId, gDevId);
+
+	set_dte_viommu(iommu, hDevId, guestId, gDevId);
+
+	tmp1 = gDevId;
+	tmp1 = ((tmp1 & 0xFFFFULL) << 46);
+	tmp2 = hDevId;
+	tmp2 = ((tmp2 & 0xFFFFULL) << 14);
+	val = tmp1 | tmp2 | 0x8000000000000001ULL;
+	vfctrl = VIOMMU_VFCTRL_MMIO_BASE(iommu, guestId);
+	writeq(val, vfctrl + VIOMMU_VFCTRL_GUEST_DID_MAP_CONTROL0_OFFSET);
+	wbinvd_on_all_cpus();
+
+	tmp1 = hDevId;
+	val = ((tmp1 & 0xFFFFULL) << 16);
+	writeq(val, vfctrl + VIOMMU_VFCTRL_GUEST_MISC_CONTROL_OFFSET);
+}
+
+static void clear_dte_viommu(struct amd_iommu *iommu, u16 hDevId)
+{
+	struct dev_table_entry *dev_table = get_dev_table(iommu);
+	u64 dte = dev_table[hDevId].data[3];
+
+	dte &= ~(1ULL << DTE_VIOMMU_EN_SHIFT);
+	dte &= ~(0xFFFFULL << DTE_VIOMMU_GUESTID_SHIFT);
+	dte &= ~(0xFFFFULL << DTE_VIOMMU_GDEVICEID_SHIFT);
+
+	dev_table[hDevId].data[3] = dte;
+
+	dte = dev_table[hDevId].data[0];
+	dte &= ~DTE_FLAG_GV;
+	dev_table[hDevId].data[0] = dte;
+
+	iommu_flush_dte(iommu, hDevId);
+}
+
+int amd_viommu_device_update(struct amd_viommu_dev_info *data, bool is_set)
+{
+	struct pci_dev *pdev;
+	struct iommu_domain *dom;
+	int gid = data->gid;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(data->iommu_id);
+
+	if (!iommu)
+		return -ENODEV;
+
+	clear_dte_viommu(iommu, data->hdev_id);
+
+	if (is_set) {
+		set_device_mapping(iommu, data->hdev_id, gid,
+				   data->queue_id, data->gdev_id);
+
+		pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(data->hdev_id),
+						   data->hdev_id & 0xff);
+		dom = pdev ? iommu_get_domain_for_dev(&pdev->dev) : NULL;
+		if (!dom) {
+			pr_err("%s: Domain not found (devid=%#x)\n",
+			       __func__, data->hdev_id);
+			return -EINVAL;
+		}
+
+		/* TODO: Only support pasid 0 for now */
+		amd_iommu_flush_tlb(dom, 0);
+		dump_device_mapping(iommu, gid, data->gdev_id);
+
+	} else {
+		clear_device_mapping(iommu, data->hdev_id, gid,
+				     data->queue_id, data->gdev_id);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(amd_viommu_device_update);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 17/21] iommu/amd: Introduce vIOMMU ioctl for updating domain mapping
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (15 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 16/21] iommu/amd: Introduce vIOMMU ioctl for updating device mapping table Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 18/21] iommu/amd: Introduce vIOMMU ioctl for handling guest MMIO accesses Suravee Suthikulpanit
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

These ioctl interfaces are used for updating host-to-guest
domain ID mappings.
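
As a note for reviewers, the private-region table lookups below use fixed
entry sizes: domain-ID mapping entries are 8 bytes (offset = gdom_id << 3),
device-ID mapping entries are 16 bytes (offset = gdev_id << 4), and the
host device ID is read back from bits 39:24 of a device-ID entry. A small
sketch of that arithmetic (helper names made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a guest domain-ID mapping entry (8 bytes each). */
static uint64_t domid_entry_offset(uint16_t gdom_id)
{
	return (uint64_t)gdom_id << 3;
}

/* Byte offset of a guest device-ID mapping entry (16 bytes each). */
static uint64_t devid_entry_offset(uint16_t gdev_id)
{
	return (uint64_t)gdev_id << 4;
}

/* Extract the host device ID stored in bits 39:24 of an entry. */
static uint16_t hdev_id_from_entry(uint64_t devid_entry)
{
	return (devid_entry >> 24) & 0xFFFF;
}
```

These correspond to the shifts used in get_domain_mapping() and
viommu_get_hdev_id() in the patch body.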

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/viommu.c | 95 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/drivers/iommu/amd/viommu.c b/drivers/iommu/amd/viommu.c
index f6f0056c7fe6..1bcb895cffbf 100644
--- a/drivers/iommu/amd/viommu.c
+++ b/drivers/iommu/amd/viommu.c
@@ -520,6 +520,101 @@ int amd_viommu_iommu_destroy(struct amd_viommu_iommu_info *data)
 }
 EXPORT_SYMBOL(amd_viommu_iommu_destroy);
 
+/*
+ * Program the DomID via VFCTRL registers
+ * This function will be called during VM init via VFIO.
+ */
+static void set_domain_mapping(struct amd_iommu *iommu, u16 guestId, u16 hDomId, u16 gDomId)
+{
+	u64 val, tmp1, tmp2;
+	u8 __iomem *vfctrl = VIOMMU_VFCTRL_MMIO_BASE(iommu, guestId);
+
+	pr_debug("%s: iommu_id=%#x, gid=%#x, dom_id=%#x, gdom_id=%#x\n",
+		 __func__, pci_dev_id(iommu->dev), guestId, hDomId, gDomId);
+
+	tmp1 = gDomId;
+	tmp1 = ((tmp1 & 0xFFFFULL) << 46);
+	tmp2 = hDomId;
+	tmp2 = ((tmp2 & 0xFFFFULL) << 14);
+	val = tmp1 | tmp2 | 0x8000000000000001UL;
+	writeq(val, vfctrl + VIOMMU_VFCTRL_GUEST_DID_MAP_CONTROL1_OFFSET);
+	wbinvd_on_all_cpus();
+}
+
+u64 get_domain_mapping(struct amd_iommu *iommu, u16 gid, u16 gdom_id)
+{
+	void *addr;
+	u64 offset, val;
+	struct amd_iommu_vminfo *vminfo;
+
+	vminfo = get_vminfo(iommu, gid);
+	if (!vminfo)
+		return -EINVAL;
+
+	addr = vminfo->domid_table;
+	offset = gdom_id << 3;
+	val = *((u64 *)(addr + offset));
+
+	return val;
+}
+
+void dump_domain_mapping(struct amd_iommu *iommu, u16 gid, u16 gdom_id)
+{
+	void *addr;
+	u64 offset, val;
+	struct amd_iommu_vminfo *vminfo;
+
+	vminfo = get_vminfo(iommu, gid);
+	if (!vminfo)
+		return;
+
+	addr = vminfo->domid_table;
+	offset = gdom_id << 3;
+	val = *((u64 *)(addr + offset));
+
+	pr_debug("%s: offset=%#llx(val=%#llx)\n", __func__,
+		(unsigned long long)offset,
+		(unsigned long long)val);
+}
+
+static u16 viommu_get_hdev_id(struct amd_iommu *iommu, u16 guestId, u16 gdev_id)
+{
+	struct amd_iommu_vminfo *vminfo;
+	void *addr;
+	u64 offset;
+
+	vminfo = get_vminfo(iommu, guestId);
+	if (!vminfo)
+		return -1;
+
+	addr = vminfo->devid_table;
+	offset = gdev_id << 4;
+	return (*((u64 *)(addr + offset)) >> 24) & 0xFFFF;
+}
+
+int amd_viommu_domain_update(struct amd_viommu_dom_info *data, bool is_set)
+{
+	u16 hdom_id, hdev_id;
+	int gid = data->gid;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(data->iommu_id);
+	struct dev_table_entry *dev_table;
+
+	if (!iommu)
+		return -ENODEV;
+
+	dev_table = get_dev_table(iommu);
+
+	hdev_id = viommu_get_hdev_id(iommu, gid, data->gdev_id);
+	hdom_id = dev_table[hdev_id].data[1] & 0xFFFFULL;
+
+	if (is_set) {
+		set_domain_mapping(iommu, gid, hdom_id, data->gdom_id);
+		dump_domain_mapping(iommu, gid, data->gdom_id);
+	} else {
+		clear_domain_mapping(iommu, gid, hdom_id, data->gdom_id);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(amd_viommu_domain_update);
+
 static void set_dte_viommu(struct amd_iommu *iommu, u16 hDevId, u16 gid, u16 gDevId)
 {
 	u64 tmp, dte;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 18/21] iommu/amd: Introduce vIOMMU ioctl for handling guest MMIO accesses
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (16 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 17/21] iommu/amd: Introduce vIOMMU ioctl for updating domain mapping Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 19/21] iommu/amd: Introduce vIOMMU ioctl for handling command buffer mapping Suravee Suthikulpanit
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

This ioctl interface handles guest MMIO reads and writes
to the IOMMU MMIO registers.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/viommu.c | 250 +++++++++++++++++++++++++++++++++++++
 1 file changed, 250 insertions(+)

diff --git a/drivers/iommu/amd/viommu.c b/drivers/iommu/amd/viommu.c
index 1bcb895cffbf..9ddbdbec4a75 100644
--- a/drivers/iommu/amd/viommu.c
+++ b/drivers/iommu/amd/viommu.c
@@ -744,3 +744,253 @@ int amd_viommu_device_update(struct amd_viommu_dev_info *data, bool is_set)
 	return 0;
 }
 EXPORT_SYMBOL(amd_viommu_device_update);
+
+int amd_viommu_guest_mmio_read(struct amd_viommu_mmio_data *data)
+{
+	u8 __iomem *vfctrl, *vf;
+	u64 val, tmp = 0;
+	int gid = data->gid;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(data->iommu_id);
+
+	if (!iommu)
+		return -ENODEV;
+
+	vf = VIOMMU_VF_MMIO_BASE(iommu, gid);
+	vfctrl = VIOMMU_VFCTRL_MMIO_BASE(iommu, gid);
+
+	switch (data->offset) {
+	case MMIO_CONTROL_OFFSET:
+	{
+		/* VFCTRL offset 20h */
+		val = readq(vfctrl + 0x20);
+		tmp |= SET_CTRL_BITS(val, 8, CONTROL_CMDBUF_EN, 1); // [12]
+		tmp |= SET_CTRL_BITS(val, 9, CONTROL_COMWAIT_EN, 1); // [4]
+
+		/* VFCTRL offset 28h */
+		val = readq(vfctrl + 0x28);
+		tmp |= SET_CTRL_BITS(val, 8, CONTROL_EVT_LOG_EN, 1); // [2]
+		tmp |= SET_CTRL_BITS(val, 9, CONTROL_EVT_INT_EN, 1); // [3]
+		tmp |= SET_CTRL_BITS(val, 10, CONTROL_DUALEVTLOG_EN, 3); // [33:32]
+
+		/* VFCTRL offset 30h */
+		val = readq(vfctrl + 0x30);
+		tmp |= SET_CTRL_BITS(val, 8, CONTROL_PPRLOG_EN, 1); // [13]
+		tmp |= SET_CTRL_BITS(val, 9, CONTROL_PPRINT_EN, 1); // [14]
+		tmp |= SET_CTRL_BITS(val, 10, CONTROL_PPR_EN, 1); // [15]
+		tmp |= SET_CTRL_BITS(val, 11, CONTROL_DUALPPRLOG_EN, 3); // [31:30]
+		tmp |= SET_CTRL_BITS(val, 13, CONTROL_PPR_AUTO_RSP_EN, 1); // [39]
+		tmp |= SET_CTRL_BITS(val, 14, CONTROL_BLKSTOPMRK_EN, 1); // [41]
+		tmp |= SET_CTRL_BITS(val, 15, CONTROL_PPR_AUTO_RSP_AON, 1); // [42]
+
+		data->value = tmp;
+		break;
+	}
+	case MMIO_CMD_BUF_OFFSET:
+	{
+		val = readq(vfctrl + 0x20);
+		/* CmdLen [59:56] */
+		tmp |= SET_CTRL_BITS(val, 0, 56, 0xF);
+		data->value = tmp;
+		break;
+	}
+	case MMIO_EVT_BUF_OFFSET:
+	{
+		val = readq(vfctrl + 0x28);
+		/* EventLen [59:56] */
+		tmp |= SET_CTRL_BITS(val, 0, 56, 0xF);
+		data->value = tmp;
+		break;
+	}
+	case MMIO_EVTB_LOG_OFFSET:
+	{
+		val = readq(vfctrl + 0x28);
+		/* EventLenB [59:56] */
+		tmp |= SET_CTRL_BITS(val, 4, 56, 0xF);
+		data->value = tmp;
+		break;
+	}
+	case MMIO_PPR_LOG_OFFSET:
+	{
+		val = readq(vfctrl + 0x30);
+		/* PPRLogLen [59:56] */
+		tmp |= SET_CTRL_BITS(val, 0, 56, 0xF);
+		data->value = tmp;
+		break;
+	}
+	case MMIO_PPRB_LOG_OFFSET:
+	{
+		val = readq(vfctrl + 0x30);
+		/* PPRLogLenB [59:56] */
+		tmp |= SET_CTRL_BITS(val, 4, 56, 0xF);
+		data->value = tmp;
+		break;
+	}
+	case MMIO_CMD_HEAD_OFFSET:
+	{
+		val = readq(vf + 0x0);
+		data->value = (val & 0x7FFF0);
+		break;
+	}
+	case MMIO_CMD_TAIL_OFFSET:
+	{
+		val = readq(vf + 0x8);
+		data->value = (val & 0x7FFF0);
+		break;
+	}
+	case MMIO_EXT_FEATURES:
+	{
+		amd_iommu_build_efr(&data->value, NULL);
+		break;
+	}
+	default:
+		break;
+	}
+
+	pr_debug("%s: iommu_id=%#x, gid=%u, offset=%#x, value=%#llx, mmio_size=%u, is_write=%u\n",
+		 __func__, data->iommu_id, gid, data->offset,
+		 data->value, data->mmio_size, data->is_write);
+	return 0;
+}
+EXPORT_SYMBOL(amd_viommu_guest_mmio_read);
+
+/*
+ * Note: This function maps guest writes to the AMD IOMMU MMIO
+ * registers onto the corresponding vIOMMU VFCTRL register bits.
+ */
+int amd_viommu_guest_mmio_write(struct amd_viommu_mmio_data *data)
+{
+	u8 __iomem *vfctrl, *vf;
+	int gid = data->gid;
+	u64 val, tmp, ctrl = data->value;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(data->iommu_id);
+
+	if (!iommu)
+		return -ENODEV;
+
+	pr_debug("%s: iommu_id=%#x, gid=%u, offset=%#x, value=%#llx, mmio_size=%u, is_write=%u\n",
+		 __func__, data->iommu_id, gid, data->offset,
+		 ctrl, data->mmio_size, data->is_write);
+
+	vf = VIOMMU_VF_MMIO_BASE(iommu, gid);
+	vfctrl = VIOMMU_VFCTRL_MMIO_BASE(iommu, gid);
+
+	switch (data->offset) {
+	case MMIO_CONTROL_OFFSET:
+	{
+		/* VFCTRL offset 20h */
+		val = readq(vfctrl + 0x20);
+		val &= ~(0x3ULL << 8);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_CMDBUF_EN, 1); // [12]
+		val |= (tmp << 8);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_COMWAIT_EN, 1); // [4]
+		val |= (tmp << 9);
+		writeq(val, vfctrl + 0x20);
+
+		/* VFCTRL offset 28h */
+		val = readq(vfctrl + 0x28);
+		val &= ~(0xFULL << 8);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_EVT_LOG_EN, 1); // [2]
+		val |= (tmp << 8);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_EVT_INT_EN, 1); // [3]
+		val |= (tmp << 9);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_DUALEVTLOG_EN, 3); // [33:32]
+		val |= (tmp << 10);
+		writeq(val, vfctrl + 0x28);
+
+		/* VFCTRL offset 30h */
+		val = readq(vfctrl + 0x30);
+		val &= ~(0xFFULL << 8);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_PPRLOG_EN, 1); // [13]
+		val |= (tmp << 8);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_PPRINT_EN, 1); // [14]
+		val |= (tmp << 9);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_PPR_EN, 1); // [15]
+		val |= (tmp << 10);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_DUALPPRLOG_EN, 3); // [31:30]
+		val |= (tmp << 11);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_PPR_AUTO_RSP_EN, 1); // [39]
+		val |= (tmp << 13);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_BLKSTOPMRK_EN, 1); // [41]
+		val |= (tmp << 14);
+		tmp = GET_CTRL_BITS(ctrl, CONTROL_PPR_AUTO_RSP_AON, 1); // [42]
+		val |= (tmp << 15);
+		writeq(val, vfctrl + 0x30);
+		break;
+	}
+	case MMIO_CMD_BUF_OFFSET:
+	{
+		val = readq(vfctrl + 0x20);
+		val &= ~(0xFULL);
+		/* CmdLen [59:56] */
+		tmp = GET_CTRL_BITS(ctrl, 56, 0xF);
+		val |= tmp;
+		writeq(val, vfctrl + 0x20);
+		break;
+	}
+	case MMIO_EVT_BUF_OFFSET:
+	{
+		val = readq(vfctrl + 0x28);
+		val &= ~(0xFULL);
+		/* EventLen [59:56] */
+		tmp = GET_CTRL_BITS(ctrl, 56, 0xF);
+		val |= tmp;
+		writeq(val, vfctrl + 0x28);
+		break;
+	}
+	case MMIO_EVTB_LOG_OFFSET:
+	{
+		val = readq(vfctrl + 0x28);
+		val &= ~(0xF0ULL);
+		/* EventLenB [59:56] */
+		tmp = GET_CTRL_BITS(ctrl, 56, 0xF);
+		val |= (tmp << 4);
+		writeq(val, vfctrl + 0x28);
+		break;
+	}
+	case MMIO_PPR_LOG_OFFSET:
+	{
+		val = readq(vfctrl + 0x30);
+		val &= ~(0xFULL);
+		/* PPRLogLen [59:56] */
+		tmp = GET_CTRL_BITS(ctrl, 56, 0xF);
+		val |= tmp;
+		writeq(val, vfctrl + 0x30);
+		break;
+	}
+	case MMIO_PPRB_LOG_OFFSET:
+	{
+		val = readq(vfctrl + 0x30);
+		val &= ~(0xF0ULL);
+		/* PPRLogLenB [59:56] */
+		tmp = GET_CTRL_BITS(ctrl, 56, 0xF);
+		val |= (tmp << 4);
+		writeq(val, vfctrl + 0x30);
+		break;
+	}
+	case MMIO_CMD_HEAD_OFFSET:
+	{
+		val = readq(vf + 0x0);
+		val &= ~(0x7FFFULL << 4);
+		tmp = GET_CTRL_BITS(ctrl, 4, 0x7FFF);
+		val |= (tmp << 4);
+		writeq(val, vf + 0x0);
+		break;
+	}
+	case MMIO_CMD_TAIL_OFFSET:
+	{
+		val = readq(vf + 0x8);
+		val &= ~(0x7FFFULL << 4);
+		tmp = GET_CTRL_BITS(ctrl, 4, 0x7FFF);
+		val |= (tmp << 4);
+		writeq(val, vf + 0x8);
+		break;
+	}
+	default:
+		break;
+	}
+
+	pr_debug("%s: offset=%#x, val=%#llx, ctrl=%#llx\n",
+		 __func__, data->offset, val, ctrl);
+	return 0;
+}
+EXPORT_SYMBOL(amd_viommu_guest_mmio_write);
-- 
2.34.1



* [RFC PATCH 19/21] iommu/amd: Introduce vIOMMU ioctl for handling command buffer mapping
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (17 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 18/21] iommu/amd: Introduce vIOMMU ioctl for handling guest MMIO accesses Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 20/21] iommu/amd: Introduce vIOMMU ioctl for setting up guest CR3 Suravee Suthikulpanit
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

This ioctl interface handles mapping of the vIOMMU guest command buffer.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/viommu.c          | 78 +++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 5cb5a709b31b..dd3c79e454d8 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -261,6 +261,7 @@
 #define CMD_BUFFER_SIZE    8192
 #define CMD_BUFFER_UNINITIALIZED 1
 #define CMD_BUFFER_ENTRIES 512
+#define CMD_BUFFER_MAXSIZE 0x80000
 #define MMIO_CMD_SIZE_SHIFT 56
 #define MMIO_CMD_SIZE_512 (0x9ULL << MMIO_CMD_SIZE_SHIFT)
 
diff --git a/drivers/iommu/amd/viommu.c b/drivers/iommu/amd/viommu.c
index 9ddbdbec4a75..1bd4282384c4 100644
--- a/drivers/iommu/amd/viommu.c
+++ b/drivers/iommu/amd/viommu.c
@@ -994,3 +994,81 @@ int amd_viommu_guest_mmio_write(struct amd_viommu_mmio_data *data)
 	return 0;
 }
 EXPORT_SYMBOL(amd_viommu_guest_mmio_write);
+
+static void viommu_cmdbuf_free(struct protection_domain *dom, struct io_pgtable_ops *ops,
+				   unsigned long iova, struct page **pages, unsigned long npages)
+{
+	int i;
+	unsigned long flags;
+	unsigned long tmp = iova;
+
+	spin_lock_irqsave(&dom->lock, flags);
+	for (i = 0; i < npages; i++, tmp += PAGE_SIZE) {
+		amd_iommu_v1_unmap_pages(ops, tmp, PAGE_SIZE, 1, NULL);
+		/*
+		 * Flush domain TLB(s) and wait for completion. Any Device-Table
+		 * Updates and flushing already happened in
+		 * increase_address_space().
+		 */
+		amd_iommu_domain_flush_tlb_pde(dom);
+		amd_iommu_domain_flush_complete(dom);
+
+		unpin_user_pages(&pages[i], 1);
+	}
+	spin_unlock_irqrestore(&dom->lock, flags);
+}
+
+int amd_viommu_cmdbuf_update(struct amd_viommu_cmdbuf_data *data)
+{
+	int i, numpg = data->cmdbuf_size >> PAGE_SHIFT;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(data->iommu_id);
+	struct amd_iommu_vminfo *vminfo;
+	unsigned int gid = data->gid;
+	struct page **pages;
+	unsigned long npages = 0;
+	unsigned long iova = 0;
+	unsigned long hva = data->hva;
+
+	if (!iommu)
+		return -ENODEV;
+
+	vminfo = get_vminfo(iommu, gid);
+	if (!vminfo)
+		return -EINVAL;
+
+	pages = kcalloc(numpg, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	/*
+	 * Setup vIOMMU guest command buffer in IOMMU Private Address (IPA) space
+	 * for the specified GID.
+	 */
+	for (i = 0; i < numpg; i++, hva += PAGE_SIZE) {
+		int ret;
+		u64 phys;
+
+		if (get_user_pages_fast(hva, 1, FOLL_WRITE, &pages[i]) != 1) {
+			pr_err("%s: Failure locking page:%#lx.\n", __func__, hva);
+			goto err_out;
+		}
+
+		phys = __sme_set(page_to_pfn(pages[i]) << PAGE_SHIFT);
+		iova = VIOMMU_GUEST_CMDBUF_BASE + (i * PAGE_SIZE) + (gid * CMD_BUFFER_MAXSIZE);
+
+		pr_debug("%s: iova=%#lx, phys=%#llx\n", __func__, iova, phys);
+		ret = amd_iommu_v1_map_pages(&iommu->viommu_pdom->iop.iop.ops,
+					     iova, phys, PAGE_SIZE, 1,
+					     IOMMU_PROT_IR | IOMMU_PROT_IW,
+					     GFP_KERNEL, NULL);
+		if (ret) {
+			pr_err("%s: Failure to map page iova:%#lx, phys=%#llx\n",
+			       __func__, iova, phys);
+			goto err_out;
+		}
+		npages++;
+	}
+	kfree(pages);
+	return 0;
+err_out:
+	viommu_cmdbuf_free(iommu->viommu_pdom, &iommu->viommu_pdom->iop.iop.ops,
+			   iova, pages, npages);
+	kfree(pages);
+	return -EINVAL;
+}
+EXPORT_SYMBOL(amd_viommu_cmdbuf_update);
-- 
2.34.1



* [RFC PATCH 20/21] iommu/amd: Introduce vIOMMU ioctl for setting up guest CR3
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (18 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 19/21] iommu/amd: Introduce vIOMMU ioctl for handling command buffer mapping Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-21 23:55 ` [RFC PATCH 21/21] iommufd: Introduce AMD HW-vIOMMU IOCTL Suravee Suthikulpanit
  2023-06-22 13:46 ` [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Jason Gunthorpe
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

This ioctl interface sets up the guest CR3 (gCR3) table, which
is managed by the guest IOMMU driver. It also enables nested
I/O page translation in the host.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  12 ++++
 drivers/iommu/amd/iommu.c     | 107 ++++++++++++++++++++++++++++++++++
 drivers/iommu/amd/viommu.c    |  36 ++++++++++++
 include/linux/iommu.h         |   1 +
 include/uapi/linux/iommufd.h  |  20 +++++++
 5 files changed, 176 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index fccae07e8c9f..463cd59127b7 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -84,6 +84,18 @@ extern void amd_iommu_domain_flush_tlb_pde(struct protection_domain *domain);
 extern int amd_iommu_flush_tlb(struct iommu_domain *dom, u32 pasid);
 extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, u32 pasid,
 				     unsigned long cr3);
+extern int amd_viommu_user_gcr3_update(const void *user_data,
+				       struct iommu_domain *udom);
+extern int amd_iommu_setup_gcr3_table(struct amd_iommu *iommu,
+				      struct pci_dev *pdev,
+				      struct iommu_domain *dom,
+				      struct iommu_domain *udom,
+				      int pasids, bool giov);
+extern int amd_iommu_user_set_gcr3(struct amd_iommu *iommu,
+				   struct iommu_domain *dom,
+				   struct iommu_domain *udom,
+				   struct pci_dev *pdev, u32 pasid,
+				   unsigned long cr3);
 extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, u32 pasid);
 extern void amd_iommu_iotlb_sync(struct iommu_domain *domain,
 				 struct iommu_iotlb_gather *gather);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f22b2a5a8bfc..bff53977f8f7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -80,6 +80,8 @@ struct kmem_cache *amd_iommu_irq_cache;
 
 static void detach_device(struct device *dev);
 static int domain_enable_v2(struct protection_domain *domain, int pasids, bool giov);
+static int __set_gcr3(struct protection_domain *domain, u32 pasid,
+		      unsigned long cr3);
 
 /****************************************************************************
  *
@@ -2525,10 +2527,43 @@ static void *amd_iommu_hw_info(struct device *dev, u32 *length)
 	return hwinfo;
 }
 
+static struct iommu_domain *
+amd_iommu_domain_alloc_user(struct device *dev,
+			    enum iommu_hwpt_type hwpt_type,
+			    struct iommu_domain *parent,
+			    const union iommu_domain_user_data *user_data)
+{
+	int ret;
+	struct iommu_domain *dom = iommu_domain_alloc(dev->bus);
+
+	if (!dom || !parent)
+		return dom;
+
+	/*
+	 * The parent is not null only when external driver calls IOMMUFD kAPI
+	 * to create IOMMUFD_OBJ_HW_PAGETABLE to attach a bound device to IOAS.
+	 * This is for nested (v2) page table.
+	 *
+	 * TODO: Currently, only support nested table w/ 1 pasid for GIOV use case.
+	 *       Add support for multiple pasids.
+	 */
+	dom->type = IOMMU_DOMAIN_NESTED;
+
+	ret = amd_viommu_user_gcr3_update(user_data, dom);
+	if (ret)
+		goto err_out;
+
+	return dom;
+err_out:
+	iommu_domain_free(dom);
+	return NULL;
+}
+
 const struct iommu_ops amd_iommu_ops = {
 	.capable		= amd_iommu_capable,
 	.hw_info		= amd_iommu_hw_info,
 	.domain_alloc		= amd_iommu_domain_alloc,
+	.domain_alloc_user	= amd_iommu_domain_alloc_user,
 	.probe_device		= amd_iommu_probe_device,
 	.release_device		= amd_iommu_release_device,
 	.probe_finalize		= amd_iommu_probe_finalize,
@@ -2537,6 +2572,7 @@ const struct iommu_ops amd_iommu_ops = {
 	.is_attach_deferred	= amd_iommu_is_attach_deferred,
 	.pgsize_bitmap		= AMD_IOMMU_PGSIZES,
 	.def_domain_type	= amd_iommu_def_domain_type,
+	.hw_info_type		= IOMMU_HW_INFO_TYPE_AMD,
 	.default_domain_ops	= &(const struct iommu_domain_ops) {
 		.attach_dev	= amd_iommu_attach_device,
 		.map_pages	= amd_iommu_map_pages,
@@ -2639,6 +2675,77 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids, bool giov)
 }
 EXPORT_SYMBOL(amd_iommu_domain_enable_v2);
 
+int amd_iommu_setup_gcr3_table(struct amd_iommu *iommu, struct pci_dev *pdev,
+			       struct iommu_domain *dom,
+			       struct iommu_domain *udom,
+			       int pasids, bool giov)
+{
+	int levels;
+	struct protection_domain *pdom = to_pdomain(dom);
+	struct protection_domain *updom = to_pdomain(udom);
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
+
+	if (updom->gcr3_tbl)
+		return -EINVAL;
+
+	/* Number of GCR3 table levels required */
+	for (levels = 0; (pasids - 1) & ~0x1ff; pasids >>= 9)
+		levels += 1;
+
+	if (levels > amd_iommu_max_glx_val)
+		return -EINVAL;
+
+	updom->gcr3_tbl = (void *)get_zeroed_page(GFP_ATOMIC);
+	if (updom->gcr3_tbl == NULL)
+		return -ENOMEM;
+
+	updom->glx = levels;
+	updom->flags |= PD_IOMMUV2_MASK;
+	if (giov)
+		updom->flags |= PD_GIOV_MASK;
+
+	set_dte_entry(iommu, dev_data->devid, pdom, updom,
+		      updom->gcr3_tbl,
+		      dev_data->ats.enabled, false);
+	clone_aliases(iommu, dev_data->dev);
+
+	iommu_flush_dte(iommu, dev_data->devid);
+	iommu_completion_wait(iommu);
+	return 0;
+}
+
+/*
+ * Note: For vIOMMU, the guest could be using a different
+ *       GCR3 table for each VFIO pass-through device.
+ *       Therefore, we need a per-device GCR3 table.
+ */
+int amd_iommu_user_set_gcr3(struct amd_iommu *iommu,
+			    struct iommu_domain *dom,
+			    struct iommu_domain *udom,
+			    struct pci_dev *pdev, u32 pasid,
+			    unsigned long cr3)
+{
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
+	struct protection_domain *domain = to_pdomain(dom);
+	struct protection_domain *udomain = to_pdomain(udom);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&domain->lock, flags);
+	spin_lock(&udomain->lock);
+
+	ret = __set_gcr3(udomain, pasid, cr3);
+	if (!ret) {
+		device_flush_dte(dev_data);
+		iommu_completion_wait(iommu);
+	}
+
+	spin_unlock(&udomain->lock);
+	spin_unlock_irqrestore(&domain->lock, flags);
+
+	return ret;
+}
+
 static int __flush_pasid(struct protection_domain *domain, u32 pasid,
 			 u64 address, bool size)
 {
diff --git a/drivers/iommu/amd/viommu.c b/drivers/iommu/amd/viommu.c
index 1bd4282384c4..8ce3ee3d6bf5 100644
--- a/drivers/iommu/amd/viommu.c
+++ b/drivers/iommu/amd/viommu.c
@@ -1072,3 +1072,39 @@ int amd_viommu_cmdbuf_update(struct amd_viommu_cmdbuf_data *data)
 	return -EINVAL;
 }
 EXPORT_SYMBOL(amd_viommu_cmdbuf_update);
+
+int amd_viommu_user_gcr3_update(const void *user_data, struct iommu_domain *udom)
+{
+	int ret;
+	struct pci_dev *pdev;
+	unsigned long npinned;
+	struct page *pages[2];
+	struct iommu_domain *dom;
+	struct iommu_hwpt_amd_v2 *hwpt = (struct iommu_hwpt_amd_v2 *)user_data;
+	struct amd_iommu *iommu = get_amd_iommu_from_devid(hwpt->iommu_id);
+	u16 hdev_id;
+
+	if (!iommu)
+		return -ENODEV;
+
+	hdev_id = viommu_get_hdev_id(iommu, hwpt->gid, hwpt->gdev_id);
+
+	pr_debug("%s: gid=%u, hdev_id=%#x, gcr3_va=%#llx\n",
+		 __func__, hwpt->gid, hdev_id, (unsigned long long) hwpt->gcr3_va);
+
+	npinned = get_user_pages_fast(hwpt->gcr3_va, 1, FOLL_WRITE, pages);
+	if (!npinned) {
+		pr_err("Failure locking gcr3 page (%#llx).\n", hwpt->gcr3_va);
+		return -EINVAL;
+	}
+
+	/* Allocate gcr3 table */
+	pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(hdev_id),
+					   hdev_id & 0xff);
+	if (!pdev)
+		return -ENODEV;
+
+	dom = iommu_get_domain_for_dev(&pdev->dev);
+	if (!dom)
+		return -EINVAL;
+
+	/* TODO: Only support 1 pasid (zero) for now */
+	ret = amd_iommu_setup_gcr3_table(iommu, pdev, dom, udom, 1,
+					 iommu_feature(iommu, FEATURE_GIOSUP));
+	if (ret) {
+		pr_err("%s: Fail to enable gcr3 (devid=%#x)\n", __func__, pci_dev_id(pdev));
+		return ret;
+	}
+
+	return amd_iommu_user_set_gcr3(iommu, dom, udom, pdev, 0, hwpt->gcr3);
+}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4116f12d5f97..9239cd01d77c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -236,6 +236,7 @@ union iommu_domain_user_data {
 #endif
 	struct iommu_hwpt_vtd_s1 vtd;
 	struct iommu_hwpt_arm_smmuv3 smmuv3;
+	struct iommu_hwpt_amd_v2 amdv2;
 };
 
 /**
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index f8ea9faf6770..4147171429e1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -408,6 +408,23 @@ struct iommu_hwpt_arm_smmuv3 {
 	__aligned_u64 out_event_uptr;
 };
 
+/**
+ * struct iommu_hwpt_amd_v2 - AMD IOMMU specific user-managed
+ *                            v2 I/O page table data
+ * @gcr3: GCR3 guest physical address
+ * @gcr3_va: GCR3 host virtual address
+ * @gid: Guest ID
+ * @iommu_id: IOMMU host device ID
+ * @gdev_id: Guest device ID
+ */
+struct iommu_hwpt_amd_v2 {
+	__u64 gcr3;
+	__u64 gcr3_va;
+	__u32 gid;
+	__u32 iommu_id;
+	__u16 gdev_id;
+};
+
 /**
  * enum iommu_hwpt_type - IOMMU HWPT Type
  * @IOMMU_HWPT_TYPE_DEFAULT: default
@@ -418,6 +435,7 @@ enum iommu_hwpt_type {
 	IOMMU_HWPT_TYPE_DEFAULT,
 	IOMMU_HWPT_TYPE_VTD_S1,
 	IOMMU_HWPT_TYPE_ARM_SMMUV3,
+	IOMMU_HWPT_TYPE_AMD_V2,
 };
 
 /**
@@ -523,11 +541,13 @@ struct iommu_hw_info_amd {
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
  * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
  * @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type
+ * @IOMMU_HW_INFO_TYPE_AMD: AMD IOMMU info type
  */
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_NONE,
 	IOMMU_HW_INFO_TYPE_INTEL_VTD,
 	IOMMU_HW_INFO_TYPE_ARM_SMMUV3,
+	IOMMU_HW_INFO_TYPE_AMD,
 };
 
 /**
-- 
2.34.1



* [RFC PATCH 21/21] iommufd: Introduce AMD HW-vIOMMU IOCTL
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (19 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 20/21] iommu/amd: Introduce vIOMMU ioctl for setting up guest CR3 Suravee Suthikulpanit
@ 2023-06-21 23:55 ` Suravee Suthikulpanit
  2023-06-22 13:46 ` [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Jason Gunthorpe
  21 siblings, 0 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2023-06-21 23:55 UTC (permalink / raw)
  To: linux-kernel, iommu, kvm
  Cc: joro, robin.murphy, yi.l.liu, alex.williamson, jgg, nicolinc,
	baolu.lu, eric.auger, pandoh, kumaranand, jon.grimm,
	santosh.shukla, vasant.hegde, jay.chen, joseph.chung,
	Suravee Suthikulpanit

Add support for the AMD HW-vIOMMU ioctls to the iommufd /dev/iommu interface.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/iommufd/Makefile     |   3 +-
 drivers/iommu/iommufd/amd_viommu.c | 158 +++++++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c       |  17 ++--
 include/linux/amd-viommu.h         |  26 +++++
 include/linux/iommufd.h            |   8 ++
 5 files changed, 203 insertions(+), 9 deletions(-)
 create mode 100644 drivers/iommu/iommufd/amd_viommu.c
 create mode 100644 include/linux/amd-viommu.h

diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 8aeba81800c5..84d771c9cfba 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -6,7 +6,8 @@ iommufd-y := \
 	ioas.o \
 	main.o \
 	pages.o \
-	vfio_compat.o
+	vfio_compat.o \
+	amd_viommu.o
 
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
 
diff --git a/drivers/iommu/iommufd/amd_viommu.c b/drivers/iommu/iommufd/amd_viommu.c
new file mode 100644
index 000000000000..1836e19cb37d
--- /dev/null
+++ b/drivers/iommu/iommufd/amd_viommu.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ * Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
+ */
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/amd-viommu.h>
+#include <uapi/linux/amd_viommu.h>
+#include <linux/iommufd.h>
+
+#include "iommufd_private.h"
+
+union amd_viommu_ucmd_buffer {
+	struct amd_viommu_iommu_info iommu;
+	struct amd_viommu_dev_info dev;
+	struct amd_viommu_dom_info dom;
+	struct amd_viommu_mmio_data mmio;
+	struct amd_viommu_cmdbuf_data cmdbuf;
+};
+
+#define IOCTL_OP(_ioctl, _fn, _struct, _last)                                  \
+	[_IOC_NR(_ioctl) - IOMMUFD_VIOMMU_CMD_BASE] = {                        \
+		.size = sizeof(_struct) +                                      \
+			BUILD_BUG_ON_ZERO(sizeof(union amd_viommu_ucmd_buffer) <          \
+					  sizeof(_struct)),                    \
+		.min_size = offsetofend(_struct, _last),                       \
+		.ioctl_num = _ioctl,                                           \
+		.execute = _fn,                                                \
+	}
+
+int viommu_iommu_init(struct iommufd_ucmd *ucmd)
+{
+	int ret;
+	struct amd_viommu_iommu_info *data = ucmd->cmd;
+
+	ret = amd_viommu_iommu_init(data);
+	if (ret)
+		return ret;
+
+	if (copy_to_user(ucmd->ubuffer, data, sizeof(*data)))
+		ret = -EFAULT;
+	return ret;
+}
+
+int viommu_iommu_destroy(struct iommufd_ucmd *ucmd)
+{
+	struct amd_viommu_iommu_info *data = ucmd->cmd;
+
+	return amd_viommu_iommu_destroy(data);
+}
+
+int viommu_domain_attach(struct iommufd_ucmd *ucmd)
+{
+	struct amd_viommu_dom_info *data = ucmd->cmd;
+
+	return amd_viommu_domain_update(data, true);
+}
+
+int viommu_domain_detach(struct iommufd_ucmd *ucmd)
+{
+	struct amd_viommu_dom_info *data = ucmd->cmd;
+
+	return amd_viommu_domain_update(data, false);
+}
+
+int viommu_device_attach(struct iommufd_ucmd *ucmd)
+{
+	struct amd_viommu_dev_info *data = ucmd->cmd;
+
+	return amd_viommu_device_update(data, true);
+}
+
+int viommu_device_detach(struct iommufd_ucmd *ucmd)
+{
+	struct amd_viommu_dev_info *data = ucmd->cmd;
+
+	return amd_viommu_device_update(data, false);
+}
+
+int viommu_mmio_access(struct iommufd_ucmd *ucmd)
+{
+	int ret;
+	struct amd_viommu_mmio_data *data = ucmd->cmd;
+
+	if (data->is_write) {
+		ret = amd_viommu_guest_mmio_write(data);
+	} else {
+		ret = amd_viommu_guest_mmio_read(data);
+		if (ret)
+			return ret;
+
+		if (copy_to_user(ucmd->ubuffer, data, sizeof(*data)))
+			ret = -EFAULT;
+	}
+	return ret;
+}
+
+int viommu_cmdbuf_update(struct iommufd_ucmd *ucmd)
+{
+	struct amd_viommu_cmdbuf_data *data = ucmd->cmd;
+
+	return amd_viommu_cmdbuf_update(data);
+}
+
+struct iommufd_ioctl_op viommu_ioctl_ops[] = {
+	IOCTL_OP(VIOMMU_IOMMU_INIT, viommu_iommu_init,
+		 struct amd_viommu_iommu_info, gid),
+	IOCTL_OP(VIOMMU_IOMMU_DESTROY, viommu_iommu_destroy,
+		 struct amd_viommu_iommu_info, gid),
+	IOCTL_OP(VIOMMU_DEVICE_ATTACH, viommu_device_attach,
+		 struct amd_viommu_dev_info, queue_id),
+	IOCTL_OP(VIOMMU_DEVICE_DETACH, viommu_device_detach,
+		 struct amd_viommu_dev_info, queue_id),
+	IOCTL_OP(VIOMMU_DOMAIN_ATTACH, viommu_domain_attach,
+		 struct amd_viommu_dom_info, gdom_id),
+	IOCTL_OP(VIOMMU_DOMAIN_DETACH, viommu_domain_detach,
+		 struct amd_viommu_dom_info, gdom_id),
+	IOCTL_OP(VIOMMU_MMIO_ACCESS, viommu_mmio_access,
+		 struct amd_viommu_mmio_data, is_write),
+	IOCTL_OP(VIOMMU_CMDBUF_UPDATE, viommu_cmdbuf_update,
+		 struct amd_viommu_cmdbuf_data, hva),
+};
+
+long iommufd_amd_viommu_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	struct iommufd_ctx *ictx = filp->private_data;
+	struct iommufd_ucmd ucmd = {};
+	struct iommufd_ioctl_op *op;
+	union amd_viommu_ucmd_buffer buf;
+	unsigned int nr;
+	int ret;
+
+	nr = _IOC_NR(cmd);
+	if (nr < IOMMUFD_VIOMMU_CMD_BASE ||
+	    (nr - IOMMUFD_VIOMMU_CMD_BASE) >= ARRAY_SIZE(viommu_ioctl_ops))
+		return -ENOIOCTLCMD;
+
+	ucmd.ictx = ictx;
+	ucmd.ubuffer = (void __user *)arg;
+	ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
+	if (ret)
+		return ret;
+
+	op = &viommu_ioctl_ops[nr - IOMMUFD_VIOMMU_CMD_BASE];
+	if (op->ioctl_num != cmd)
+		return -ENOIOCTLCMD;
+	if (ucmd.user_size < op->min_size)
+		return -EOPNOTSUPP;
+
+	ucmd.cmd = &buf;
+	ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
+				    ucmd.user_size);
+	if (ret)
+		return ret;
+	return op->execute(&ucmd);
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 83f8b8f19bcb..d5c2738a8355 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -17,6 +17,8 @@
 #include <linux/bug.h>
 #include <uapi/linux/iommufd.h>
 #include <linux/iommufd.h>
+#include <uapi/linux/amd_viommu.h>
+#include <linux/amd-viommu.h>
 #include "../iommu-priv.h"
 
 #include "io_pagetable.h"
@@ -442,13 +444,6 @@ union ucmd_buffer {
 	struct iommu_hwpt_arm_smmuv3_invalidate smmuv3;
 };
 
-struct iommufd_ioctl_op {
-	unsigned int size;
-	unsigned int min_size;
-	unsigned int ioctl_num;
-	int (*execute)(struct iommufd_ucmd *ucmd);
-};
-
 #define IOCTL_OP(_ioctl, _fn, _struct, _last)                                  \
 	[_IOC_NR(_ioctl) - IOMMUFD_CMD_BASE] = {                               \
 		.size = sizeof(_struct) +                                      \
@@ -503,8 +498,14 @@ static long iommufd_fops_ioctl(struct file *filp, unsigned int cmd,
 
 	nr = _IOC_NR(cmd);
 	if (nr < IOMMUFD_CMD_BASE ||
-	    (nr - IOMMUFD_CMD_BASE) >= ARRAY_SIZE(iommufd_ioctl_ops))
+	    (nr - IOMMUFD_CMD_BASE) >= ARRAY_SIZE(iommufd_ioctl_ops)) {
+		long rc;
+
+		/* AMD VIOMMU ioctl */
+		rc = iommufd_amd_viommu_ioctl(filp, cmd, arg);
+		if (rc != -ENOIOCTLCMD)
+			return rc;
+
+		/* VFIO ioctl */
 		return iommufd_vfio_ioctl(ictx, cmd, arg);
+	}
 
 	ucmd.ictx = ictx;
 	ucmd.ubuffer = (void __user *)arg;
diff --git a/include/linux/amd-viommu.h b/include/linux/amd-viommu.h
new file mode 100644
index 000000000000..645e25c493c2
--- /dev/null
+++ b/include/linux/amd-viommu.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2022 Advanced Micro Devices, Inc.
+ */
+
+#ifndef _LINUX_AMD_VIOMMU_H
+#define _LINUX_AMD_VIOMMU_H
+
+#include <uapi/linux/amd_viommu.h>
+
+extern long iommufd_amd_viommu_ioctl(struct file *filp,
+				     unsigned int cmd,
+				     unsigned long arg);
+
+extern long iommufd_viommu_ioctl(struct file *filp, unsigned int cmd,
+			  unsigned long arg);
+
+extern int amd_viommu_iommu_init(struct amd_viommu_iommu_info *data);
+extern int amd_viommu_iommu_destroy(struct amd_viommu_iommu_info *data);
+extern int amd_viommu_domain_update(struct amd_viommu_dom_info *data, bool is_set);
+extern int amd_viommu_device_update(struct amd_viommu_dev_info *data, bool is_set);
+extern int amd_viommu_guest_mmio_write(struct amd_viommu_mmio_data *data);
+extern int amd_viommu_guest_mmio_read(struct amd_viommu_mmio_data *data);
+extern int amd_viommu_cmdbuf_update(struct amd_viommu_cmdbuf_data *data);
+
+#endif /* _LINUX_AMD_VIOMMU_H */
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 9269ce668d9b..91912e044038 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -17,6 +17,14 @@ struct iommufd_ctx;
 struct iommufd_access;
 struct file;
 struct iommu_group;
+struct iommufd_ucmd;
+
+struct iommufd_ioctl_op {
+	unsigned int size;
+	unsigned int min_size;
+	unsigned int ioctl_num;
+	int (*execute)(struct iommufd_ucmd *ucmd);
+};
 
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
  2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
                   ` (20 preceding siblings ...)
  2023-06-21 23:55 ` [RFC PATCH 21/21] iommufd: Introduce AMD HW-vIOMMU IOCTL Suravee Suthikulpanit
@ 2023-06-22 13:46 ` Jason Gunthorpe
  2023-06-23  1:15   ` Suthikulpanit, Suravee
  21 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2023-06-22 13:46 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, kvm, joro, robin.murphy, yi.l.liu,
	alex.williamson, nicolinc, baolu.lu, eric.auger, pandoh,
	kumaranand, jon.grimm, santosh.shukla, vasant.hegde, jay.chen,
	joseph.chung

On Wed, Jun 21, 2023 at 06:54:47PM -0500, Suravee Suthikulpanit wrote:

> Since the IOMMU hardware virtualizes the guest command buffer, this allows
> IOMMU operations to be accelerated such as invalidation of guest pages
> (i.e. stage1) when the command is issued by the guest kernel without
> intervention from the hypervisor.

This is similar to what we are doing on ARM as well.
 
> This series is implemented on top of the IOMMUFD framework. It leverages
> the existing APIs and ioctls for providing guest iommu information
> (i.e. struct iommu_hw_info_amd), and allowing guest to provide guest page
> table information (i.e. struct iommu_hwpt_amd_v2) for setting up user
> domain.
> 
> Please see the [4],[5], and [6] for more detail on the AMD HW-vIOMMU.
> 
> NOTES
> -----
> This series is organized into two parts:
>   * Part1: Preparing IOMMU driver for HW-vIOMMU support (Patch 1-8).
> 
>   * Part2: Introducing HW-vIOMMU support (Patch 9-21).
> 
>   * Patches 12 and 21 extend the existing IOMMUFD ioctls to support
>     additional operations, which can be categorized into:
>     - Ioctls to init/destroy AMD HW-vIOMMU instance
>     - Ioctls to attach/detach guest devices to the AMD HW-vIOMMU instance.
>     - Ioctls to attach/detach guest domains to the AMD HW-vIOMMU instance.
>     - Ioctls to trap certain AMD HW-vIOMMU MMIO register accesses.
>     - Ioctls to trap AMD HW-vIOMMU command buffer initialization.

No one else seems to need this kind of stuff, why is AMD different?

Emulation and mediation to create the vIOMMU is supposed to be in the
VMM side, not in the kernel. I don't want to see different models by
vendor.

Even stuff like setting up the gcr3 should not be its own ioctl,
that is not how we are modeling things at all.

I think you need to take smaller steps in line with the other
drivers so we can all progress through this step by step together.

To start, focus only on user space page tables and kernel mediated
invalidation and fit into the same model as everyone else. This is
approx the same patches and uAPI you see for ARM and Intel. AFAICT
AMD's HW is very similar to ARM's, so you should be aligning to the
ARM design.

Then maybe we can argue if a kernel vIOMMU emulation/mediation is
appropriate or not, but this series is just too much as is.

I also want to see the AMD driver align with the new APIs for
PASID/etc before we start shoveling more stuff into it. This is going
to be part of the iommufd contract as well, and I'm very unhappy to see
drivers picking and choosing what parts of the contract they implement.

Regards,
Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
  2023-06-22 13:46 ` [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Jason Gunthorpe
@ 2023-06-23  1:15   ` Suthikulpanit, Suravee
  2023-06-23 11:45     ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Suthikulpanit, Suravee @ 2023-06-23  1:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, iommu, kvm, joro, robin.murphy, yi.l.liu,
	alex.williamson, nicolinc, baolu.lu, eric.auger, pandoh,
	kumaranand, jon.grimm, santosh.shukla, vasant.hegde, jay.chen,
	joseph.chung

Jason,

On 6/22/2023 6:46 AM, Jason Gunthorpe wrote:
> On Wed, Jun 21, 2023 at 06:54:47PM -0500, Suravee Suthikulpanit wrote:
> 
>> Since the IOMMU hardware virtualizes the guest command buffer, this allows
>> IOMMU operations to be accelerated such as invalidation of guest pages
>> (i.e. stage1) when the command is issued by the guest kernel without
>> intervention from the hypervisor.
> 
> This is similar to what we are doing on ARM as well.

Ok

>> This series is implemented on top of the IOMMUFD framework. It leverages
>> the existing APIs and ioctls for providing guest iommu information
>> (i.e. struct iommu_hw_info_amd), and allowing guest to provide guest page
>> table information (i.e. struct iommu_hwpt_amd_v2) for setting up user
>> domain.
>>
>> Please see the [4],[5], and [6] for more detail on the AMD HW-vIOMMU.
>>
>> NOTES
>> -----
>> This series is organized into two parts:
>>    * Part1: Preparing IOMMU driver for HW-vIOMMU support (Patch 1-8).
>>
>>    * Part2: Introducing HW-vIOMMU support (Patch 9-21).
>>
>>    * Patches 12 and 21 extend the existing IOMMUFD ioctls to support
>>      additional operations, which can be categorized into:
>>      - Ioctls to init/destroy AMD HW-vIOMMU instance
>>      - Ioctls to attach/detach guest devices to the AMD HW-vIOMMU instance.
>>      - Ioctls to attach/detach guest domains to the AMD HW-vIOMMU instance.
>>      - Ioctls to trap certain AMD HW-vIOMMU MMIO register accesses.
>>      - Ioctls to trap AMD HW-vIOMMU command buffer initialization.
> 
> No one else seems to need this kind of stuff, why is AMD different?
> 
> Emulation and mediation to create the vIOMMU is supposed to be in the
> VMM side, not in the kernel. I don't want to see different models by
> vendor.

These ioctls are not necessary for emulation, which I agree should be 
done on the VMM side (e.g. QEMU). These ioctls provide the necessary 
information for programming the AMD IOMMU hardware to provide a 
hardware-assisted virtualized IOMMU. This includes programming certain 
data structures, i.e. the Domain ID mapping table (DomIDMap), the Device 
ID mapping table (DevIDMap), and certain MMIO registers for controlling 
the HW-vIOMMU feature.

> Even stuff like setting up the gcr3 should not be its own ioctl,
> that is not how we are modeling things at all.

Sorry for the miscommunication regarding the ioctl for setting up gcr3 in 
the commit log message for patch 20, which caused confusion. I'll update 
the message accordingly. Please allow me to clarify this briefly here.

In this series, the AMD IOMMU GCR3 table is actually set up when 
IOMMUFD_CMD_HWPT_ALLOC is called, for which the driver provides a hook 
via struct iommu_ops.domain_alloc_user(). The AMD-specific information 
is communicated from QEMU via iommu_domain_user_data.iommu_hwpt_amd_v2. 
This is similar to INTEL and ARM.

Please also note that for the AMD HW-vIOMMU device model in QEMU, writes 
to the guest memory used for the IOMMU device table are trapped when the 
guest IOMMU driver programs the guest Device Table Entry (gDTE). QEMU 
then reads the content of the gDTE to extract the information necessary 
for setting up the guest (stage-1) page table, and calls 
iommufd_backend_alloc_hwpt().

There is still work to be done in this to fully support PASID. I'll 
take a look at this next.

> I think you need to take smaller steps in line with the other
> drivers so we can all progress through this step by step together.

I can certainly break the patch series down into smaller parts to align 
with the rest.

> To start focus only on user space page tables and kernel mediated
> invalidation and fit into the same model as everyone else. This is
> approx the same patches and uAPI you see for ARM and Intel. AFAICT
> AMD's HW is very similar to ARM's, so you should be aligning to the
> ARM design.

I think the user space page table is covered as described above.

As for the kernel mediated invalidation, IIUC from looking at the patches:

* iommufd: Add nesting related data structures for ARM SMMUv3
(https://github.com/yiliu1765/iommufd/commit/b6a5c8991dcc96ca895b53175c93e5fc522f42fe)

* iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
(https://github.com/yiliu1765/iommufd/commit/0ae59149474ad0cb8a42ff7e71ed6b4e9df00204) 


it seems that user-space is supposed to call the ioctl 
IOMMUFD_CMD_HWPT_INVALIDATE for both INTEL and ARM to issue invalidations 
for the stage-1 page table. Please lemme know if I misunderstand the 
purpose of this ioctl.

However, for AMD, since the HW-vIOMMU virtualizes the guest command 
buffer, when it sees a page table invalidation command in the guest 
command buffer, it takes care of the invalidation using information in 
the DomIDMap, which maps the guest domain ID (gDomID) of a particular 
guest to the corresponding host domain ID (hDomID) of the device, and 
invalidates the nested translation according to the specified PASID, 
DomID, and GVA.

The DomIDMap is set up by the host IOMMU driver during VM initialization. 
When the guest IOMMU driver attaches the VFIO pass-through device to a 
guest iommu_group (i.e. domain), it programs the gDTE with the gDomID. 
This action is trapped into QEMU, and the gDomID is read from the gDTE 
and communicated to the hypervisor via the newly proposed ioctl 
VIOMMU_DOMAIN_ATTACH. Now the DomIDMap entry is created for the VFIO device.

> Then maybe we can argue if a kernel vIOMMU emulation/mediation is
> appropriate or not, but this series is just too much as is.

Sure, we can continue to discuss the implementation detail for each part 
separately.

> I also want to see the AMD driver align with the new APIs for
> PASID/etc before we start shoveling more stuff into it. 

Are you referring to the IOMMU API for SVA/PASID stuff:
   * struct iommu_domain_ops.set_dev_pasid()
   * struct iommu_ops.remove_dev_pasid()
   * ...

If so, we are working on it separately in parallel, and will be sending 
out RFC soon.

Otherwise, could you please point me what "new APIs for PASID/etc" you 
are referring to in particular? I might have missed something here.

> This is going to be part of the iommufd contract as well, I'm very unhappy to see
> drivers picking and choosing what part of the contract they implement.

Sorry, didn't mean to disappoint :) Lemme look into this part more and 
will try to be more compliant with the contract in the next RFC.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
  2023-06-23  1:15   ` Suthikulpanit, Suravee
@ 2023-06-23 11:45     ` Jason Gunthorpe
  2023-06-23 22:05       ` Suthikulpanit, Suravee
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 11:45 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: linux-kernel, iommu, kvm, joro, robin.murphy, yi.l.liu,
	alex.williamson, nicolinc, baolu.lu, eric.auger, pandoh,
	kumaranand, jon.grimm, santosh.shukla, vasant.hegde, jay.chen,
	joseph.chung

On Thu, Jun 22, 2023 at 06:15:17PM -0700, Suthikulpanit, Suravee wrote:
> Jason,
> 
> On 6/22/2023 6:46 AM, Jason Gunthorpe wrote:
> > On Wed, Jun 21, 2023 at 06:54:47PM -0500, Suravee Suthikulpanit wrote:
> > 
> > > Since the IOMMU hardware virtualizes the guest command buffer, this allows
> > > IOMMU operations to be accelerated such as invalidation of guest pages
> > > (i.e. stage1) when the command is issued by the guest kernel without
> > > intervention from the hypervisor.
> > 
> > This is similar to what we are doing on ARM as well.
> 
> Ok
> 
> > > This series is implemented on top of the IOMMUFD framework. It leverages
> > > the existing APIs and ioctls for providing guest iommu information
> > > (i.e. struct iommu_hw_info_amd), and allowing guest to provide guest page
> > > table information (i.e. struct iommu_hwpt_amd_v2) for setting up user
> > > domain.
> > > 
> > > Please see the [4],[5], and [6] for more detail on the AMD HW-vIOMMU.
> > > 
> > > NOTES
> > > -----
> > > This series is organized into two parts:
> > >    * Part1: Preparing IOMMU driver for HW-vIOMMU support (Patch 1-8).
> > > 
> > >    * Part2: Introducing HW-vIOMMU support (Patch 9-21).
> > > 
> > >    * Patches 12 and 21 extend the existing IOMMUFD ioctls to support
> > >      additional operations, which can be categorized into:
> > >      - Ioctls to init/destroy AMD HW-vIOMMU instance
> > >      - Ioctls to attach/detach guest devices to the AMD HW-vIOMMU instance.
> > >      - Ioctls to attach/detach guest domains to the AMD HW-vIOMMU instance.
> > >      - Ioctls to trap certain AMD HW-vIOMMU MMIO register accesses.
> > >      - Ioctls to trap AMD HW-vIOMMU command buffer initialization.
> > 
> > No one else seems to need this kind of stuff, why is AMD different?
> > 
> > Emulation and mediation to create the vIOMMU is supposed to be in the
> > VMM side, not in the kernel. I don't want to see different models by
> > vendor.
> 
> These ioctls are not necessary for emulation, which I would agree that it
> should be done on the VMM side (e.g. QEMU). These ioctls provide necessary
> information for programming the AMD IOMMU hardware to provide
> hardware-assisted virtualized IOMMU.

You have one called 'trap', it shouldn't be like this. It seems like
this is trying to parse the command buffer in the kernel, it should be
done in the VMM.

> In this series, AMD IOMMU GCR3 table is actually setup when the
> IOMMUFD_CMD_HWPT_ALLOC is called, which the driver provides a hook to struct
> iommu_ops.domain_alloc_user(). 

That isn't entirely right either, the GCR3 should be programmed into
HW during iommu_domain attach.

> The AMD-specific information is communicated from QEMU via
> iommu_domain_user_data.iommu_hwpt_amd_v2. This is similar to INTEL
> and ARM.

This is only for requesting the iommu_domain and supplying the gcr3 VA
for later use.
> 
> Please also note that for the AMD HW-vIOMMU device model in QEMU, the guest
> memory used for IOMMU device table is trapped on when guest IOMMU driver
> programs the guest Device Table Entry (gDTE). Then QEMU reads the content of
> gDTE to extract necessary information for setting up guest (stage-1) page
> table, and calls iommufd_backend_alloc_hwpt().

This is the same as ARM. It is a two-step operation: you de-duplicate
the gDTE entries (e.g. to share vDIDs), allocating a HWPT if it doesn't
already exist, then you attach the HWPT to the physical device the
gDTE's vRID implies.

> There is still work to be done in this to fully support PASID. I'll
> take a look at this next.

I would expect PASID work is only about invalidation?
 
> > To start focus only on user space page tables and kernel mediated
> > invalidation and fit into the same model as everyone else. This is
> > approx the same patches and uAPI you see for ARM and Intel. AFAICT
> > AMD's HW is very similar to ARM's, so you should be aligning to the
> > ARM design.
> 
> I think the user space page table is covered as described above.

I'm not sure, it doesn't look like it is what I would expect.

> it seems that user-space is supposed to call the ioctl
> IOMMUFD_CMD_HWPT_INVALIDATE for both INTEL and ARM to issue invalidation for
> stage 1 page table. Please lemme know if I misunderstand the purpose of this
> ioctl.

Yes, the VMM traps the invalidation and issues it like this.
 
> However, for AMD since the HW-vIOMMU virtualizes the guest command buffer,
> and when it sees the page table invalidation command in the guest command
> buffer, it takes care of the invalidation using information in the DomIDMap,
> which maps guest domain ID (gDomID) of a particular guest to the
> corresponding host domain ID (hDomID) of the device and invalidate the
> nested translation according to the specified PASID, DomID, and GVA.

The VMM should do all of this stuff. The VMM parses the command buffer
and the VMM converts the commands to invalidation ioctls.

I'm unclear if AMD supports a mode where the HW can directly operate
a command/invalidation queue in the VM without virtualization, e.g. DMA
from guest memory and delivering completion interrupts directly to the
guest.

If it always needs SW then the SW part should be in the VMM, not the
kernel. Then you don't need to load all these tables into the kernel.

> The DomIDMap is setup by the host IOMMU driver during VM initialization.
> When the guest IOMMU driver attaches the VFIO pass-through device to a guest
> iommu_group (i.e domain), it programs the gDTE with the gDomID. This action
> is trapped into QEMU and the gDomID is read from the gDTE and communicated
> to hypervisor via the newly proposed ioctl VIOMMU_DOMAIN_ATTACH. Now the
> DomIDMap is created for the VFIO device.

The gDomID should be supplied when the HWPT is allocated, not via new
ioctls.

> Are you referring to the IOMMU API for SVA/PASID stuff:
>   * struct iommu_domain_ops.set_dev_pasid()
>   * struct iommu_ops.remove_dev_pasid()
>   * ...

Yes
 
> If so, we are working on it separately in parallel, and will be sending out
> RFC soon.

Great

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
  2023-06-23 11:45     ` Jason Gunthorpe
@ 2023-06-23 22:05       ` Suthikulpanit, Suravee
  2023-06-23 22:56         ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Suthikulpanit, Suravee @ 2023-06-23 22:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, iommu, kvm, joro, robin.murphy, yi.l.liu,
	alex.williamson, nicolinc, baolu.lu, eric.auger, pandoh,
	kumaranand, jon.grimm, santosh.shukla, vasant.hegde, jay.chen,
	joseph.chung

Jason,

On 6/23/2023 4:45 AM, Jason Gunthorpe wrote:
> On Thu, Jun 22, 2023 at 06:15:17PM -0700, Suthikulpanit, Suravee wrote:
>> Jason,
>>
>> On 6/22/2023 6:46 AM, Jason Gunthorpe wrote:
>>> On Wed, Jun 21, 2023 at 06:54:47PM -0500, Suravee Suthikulpanit wrote:
>>>
>>>> Since the IOMMU hardware virtualizes the guest command buffer, this allows
>>>> IOMMU operations to be accelerated such as invalidation of guest pages
>>>> (i.e. stage1) when the command is issued by the guest kernel without
>>>> intervention from the hypervisor.
>>>
>>> This is similar to what we are doing on ARM as well.
>>
>> Ok
>>
>>>> This series is implemented on top of the IOMMUFD framework. It leverages
>>>> the existing APIs and ioctls for providing guest iommu information
>>>> (i.e. struct iommu_hw_info_amd), and allowing guest to provide guest page
>>>> table information (i.e. struct iommu_hwpt_amd_v2) for setting up user
>>>> domain.
>>>>
>>>> Please see the [4],[5], and [6] for more detail on the AMD HW-vIOMMU.
>>>>
>>>> NOTES
>>>> -----
>>>> This series is organized into two parts:
>>>>     * Part1: Preparing IOMMU driver for HW-vIOMMU support (Patch 1-8).
>>>>
>>>>     * Part2: Introducing HW-vIOMMU support (Patch 9-21).
>>>>
>>>>     * Patches 12 and 21 extend the existing IOMMUFD ioctls to support
>>>>       additional operations, which can be categorized into:
>>>>       - Ioctls to init/destroy AMD HW-vIOMMU instance
>>>>       - Ioctls to attach/detach guest devices to the AMD HW-vIOMMU instance.
>>>>       - Ioctls to attach/detach guest domains to the AMD HW-vIOMMU instance.

I'm re-looking into these three a bit and will get back.

>>>>       - Ioctls to trap certain AMD HW-vIOMMU MMIO register accesses.

To describe the need for this ioctl, the AMD IOMMU has two sets of MMIO 
registers:
   1. Control MMIO
   2. Data MMIO

For AMD HW-vIOMMU, the hardware defines a private memory address space 
(PAS) containing VF Control MMIO and VF MMIO registers for each guest 
IOMMU instance, which represent the guest view of the AMD IOMMU MMIO 
registers. This memory is also accessed by the IOMMU hardware to 
virtualize the guest MMIO registers.

When the guest IOMMU driver writes to a guest control MMIO register of 
the QEMU AMD HW-vIOMMU device model, it traps into QEMU. QEMU reads the 
value and calls VIOMMU_MMIO_ACCESS to tell the AMD IOMMU driver in the 
host to program the VFCtrlMMIO or VFMMIO register for this guest.

Similarly, for a read of a guest control MMIO register, QEMU calls the 
ioctl to get the value from the AMD IOMMU driver, which reads the guest 
VFCtrlMMIO or VFMMIO register and provides it back to the guest.

>>>>       - Ioctls to trap AMD HW-vIOMMU command buffer initialization.

For this ioctl, the IOMMU hardware defines an IOMMU PAS containing a 
command buffer for each guest IOMMU instance. This memory is also 
accessed by the IOMMU hardware to virtualize the guest command buffer.

When the guest IOMMU driver writes to the guest Command Buffer Base 
Address MMIO register of the QEMU AMD HW-vIOMMU device model, it traps 
into QEMU. QEMU reads the value, parses the GPA, and translates it to an 
HVA. Then it calls VIOMMU_CMDBUF_UPDATE to communicate the HVA to the 
IOMMU driver, which maps it in the IOMMU PAS so that the hardware can use 
this memory to virtualize the guest command buffer.

>>>
>>> No one else seems to need this kind of stuff, why is AMD different?
>>>
>>> Emulation and mediation to create the vIOMMU is supposed to be in the
>>> VMM side, not in the kernel. I don't want to see different models by
>>> vendor.
>>
>> These ioctls are not necessary for emulation, which I would agree that it
>> should be done on the VMM side (e.g. QEMU). These ioctls provide necessary
>> information for programming the AMD IOMMU hardware to provide
>> hardware-assisted virtualized IOMMU.
> 
> You have one called 'trap', it shouldn't be like this. It seems like
> this is trying to parse the command buffer in the kernel, it should be
> done in the VMM.

Please see the more detailed description above. Basically, all parsing is 
done in the VMM, and it uses the ioctls to tell the IOMMU driver to 
program the VFCtrlMMIO/VFMMIO registers or the IOMMU PAS for the hardware 
to access.

>> In this series, AMD IOMMU GCR3 table is actually setup when the
>> IOMMUFD_CMD_HWPT_ALLOC is called, which the driver provides a hook to struct
>> iommu_ops.domain_alloc_user().
> 
> That isn't entirely right either, the GCR3 should be programmed into
> HW during iommu_domain attach.
> 
>> The AMD-specific information is communicated from QEMU via
>> iommu_domain_user_data.iommu_hwpt_amd_v2. This is similar to INTEL
>> and ARM.
> 
> This is only for requesting the iommu_domain and supplying the gcr3 VA
> for later use.

Ah, ok. Lemme look into this again and get back to you.

>.... 
>
>> There are still work to be done in this to fully support PASID. I'll
>> take a look at this next.
> 
> I would expect PASID work is only about invalidation?

Actually, I am referring to supporting non-zero PASID, which requires 
walking the guest IOMMU gCR3 table and communicating this to the hypervisor.

>>> To start focus only on user space page tables and kernel mediated
>>> invalidation and fit into the same model as everyone else. This is
>>> approx the same patches and uAPI you see for ARM and Intel. AFAICT
>>> AMD's HW is very similar to ARM's, so you should be aligning to the
>>> ARM design.
>>
>> I think the user space page table is covered as described above.
> 
> I'm not sure, it doesn't look like it is what I would expect.

Lemme clean up this part and get back in next RFC.

>> It seems that user-space is supposed to call the ioctl
>> IOMMUFD_CMD_HWPT_INVALIDATE for both INTEL and ARM to issue invalidation for
>> stage 1 page table. Please lemme know if I misunderstand the purpose of this
>> ioctl.
> 
> Yes, the VMM traps the invalidation and issues it like this.
>   
>> However, for AMD since the HW-vIOMMU virtualizes the guest command buffer,
>> and when it sees the page table invalidation command in the guest command
>> buffer, it takes care of the invalidation using information in the DomIDMap,
>> which maps guest domain ID (gDomID) of a particular guest to the
>> corresponding host domain ID (hDomID) of the device and invalidate the
>> nested translation according to the specified PASID, DomID, and GVA.
> 
> The VMM should do all of this stuff. The VMM parses the command buffer
> and the VMM converts the commands to invalidation ioctls.
>
> I'm unclear if AMD supports a mode where the HW can directly operate
> a command/invalidation queue in the VM without virtualization. Eg DMA
> from guest memory and deliver directly to the guest completion
> interrupts.

Correct, the VMM does not need to parse the command buffer. The hardware 
takes care of virtualizing the invalidation commands in the guest 
command buffer directly, without the VMM's help to do the invalidation 
from the host side.

For AMD IOMMU, the invalidation command is normally followed by the 
COMPLETION_WAIT command on a memory semaphore, which the hardware 
updates after all the prior commands are completed.

For Linux, we are not using the Completion Wait interrupt. The iommu 
driver polls on the memory semaphore in a loop.

> If it always needs SW then the SW part should be in the VMM, not the
> kernel. Then you don't need to load all these tables into the kernel.
> 

As described, the IOMMU driver needs to program the IOMMU PAS. The IOMMU 
hardware uses its own IOMMU page table to access the PAS.

For example, an AMD IOMMU instance is normally listed as a PCI device 
(e.g. PCI ID 00:00.2). To set up the IOMMU PAS for this IOMMU instance, 
the IOMMU driver allocates an IOMMU v1 page table for this device, which 
contains the PAS mapping.

The IOMMU hardware uses the PAS for storing guest IOMMU information such 
as guest MMIOs, the DevID Mapping Table, the DomID Mapping Table, and the 
guest Command/Event/PPR logs.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
  2023-06-23 22:05       ` Suthikulpanit, Suravee
@ 2023-06-23 22:56         ` Jason Gunthorpe
  2023-06-24  2:08           ` Suthikulpanit, Suravee
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 22:56 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: linux-kernel, iommu, kvm, joro, robin.murphy, yi.l.liu,
	alex.williamson, nicolinc, baolu.lu, eric.auger, pandoh,
	kumaranand, jon.grimm, santosh.shukla, vasant.hegde, jay.chen,
	joseph.chung

On Fri, Jun 23, 2023 at 03:05:06PM -0700, Suthikulpanit, Suravee wrote:

> For example, an AMD IOMMU hardware is normally listed as a PCI device (e.g.
> PCI ID 00:00.2). To setup IOMMU PAS for this IOMMU instance, the IOMMU
> driver allocates an IOMMU v1 page table for this device, which contains PAS
> mapping.

So it is just system dram?
 
> The IOMMU hardware uses the PAS for storing Guest IOMMU information such as
> Guest MMIOs, DevID Mapping Table, DomID Mapping Table, and Guest
> Command/Event/PPR logs.

Why does it have to be in kernel memory?

Why not store the whole thing in user mapped memory and have the VMM
manipulate it directly?

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
  2023-06-23 22:56         ` Jason Gunthorpe
@ 2023-06-24  2:08           ` Suthikulpanit, Suravee
  2023-06-26 13:20             ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Suthikulpanit, Suravee @ 2023-06-24  2:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, iommu, kvm, joro, robin.murphy, yi.l.liu,
	alex.williamson, nicolinc, baolu.lu, eric.auger, pandoh,
	kumaranand, jon.grimm, santosh.shukla, vasant.hegde, jay.chen,
	joseph.chung



On 6/23/2023 3:56 PM, Jason Gunthorpe wrote:
> On Fri, Jun 23, 2023 at 03:05:06PM -0700, Suthikulpanit, Suravee wrote:
> 
>> For example, an AMD IOMMU hardware is normally listed as a PCI device (e.g.
>> PCI ID 00:00.2). To setup IOMMU PAS for this IOMMU instance, the IOMMU
>> driver allocates an IOMMU v1 page table for this device, which contains PAS
>> mapping.
> 
> So it is just system dram?

Yes, this is no different than the IOMMU page table for a particular 
device, containing mappings from IOMMU Private Address (IPA) to SPA. The 
IPA is defined in the IOMMU spec. Please see Figures 79 and 80 of this 
documentation for the IPA mapping used by the hardware.

https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf

>> The IOMMU hardware uses the PAS for storing Guest IOMMU information such as
>> Guest MMIOs, DevID Mapping Table, DomID Mapping Table, and Guest
>> Command/Event/PPR logs.
> 
> Why does it have to be in kernel memory?
> 
> Why not store the whole thing in user mapped memory and have the VMM
> manipulate it directly?

The Guest MMIO and CmdBuf Dirty Status are allocated per IOMMU instance, 
so these data structures cannot be allocated by the VMM. In this case, 
the IOMMUFD_CMD_MMIO_ACCESS might still be needed.

The DomID and DevID mapping tables are allocated per-VM:
   * DomID Mapping Table (512 KB contiguous memory)
   * DevID Mapping Table (1 MB contiguous memory)

Let's say we can use IOMMU_SET_DEV_DATA to communicate the memory 
addresses of the Dom/DevID Mapping tables to the IOMMU driver to pin and 
map in the PAS IOMMU page table. Then, this might work. Does that go 
along the lines of what you are thinking (mainly to avoid introducing 
additional ioctls)?

By the way, I think I can try getting rid of IOMMUFD_CMD_CMDBUF_UPDATE. 
Lemme do that in the next RFC.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table
  2023-06-24  2:08           ` Suthikulpanit, Suravee
@ 2023-06-26 13:20             ` Jason Gunthorpe
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2023-06-26 13:20 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: linux-kernel, iommu, kvm, joro, robin.murphy, yi.l.liu,
	alex.williamson, nicolinc, baolu.lu, eric.auger, pandoh,
	kumaranand, jon.grimm, santosh.shukla, vasant.hegde, jay.chen,
	joseph.chung

On Fri, Jun 23, 2023 at 07:08:54PM -0700, Suthikulpanit, Suravee wrote:
> > > The IOMMU hardware uses the PAS for storing Guest IOMMU information such as
> > > Guest MMIOs, DevID Mapping Table, DomID Mapping Table, and Guest
> > > Command/Event/PPR logs.
> > 
> > Why does it have to be in kernel memory?
> > 
> > Why not store the whole thing in user mapped memory and have the VMM
> > manipulate it directly?
> 
> The Guest MMIO and CmdBuf Dirty Status are allocated per IOMMU instance. So,
> these data structures cannot be allocated by the VMM. 

Yes, it is unfortunate that so much of this wasn't 4k aligned so it
could be mapped sensibly. It doesn't really make sense to have a
giant repeated register map that still has to be trapped by the
hypervisor; a command queue would have been more logical :(

> In this case, the IOMMUFD_CMD_MMIO_ACCESS might still be needed.

It seems this is unavoidable, but it needs a clearer name and purpose.

But more importantly we don't really have any object to hang this off
of - we don't have the notion of a "VM" in iommufd right now.

We had sort of been handwaving that maybe the entire FD is a "VM" and
maybe that works for some scenarios, but I don't think it works for
what you need, especially if you consider multi-instance.

So, it is good that you brought this series right now as I think it
needs harmonizing with what ARM needs to do, and this is the more
complex version of the two.

> The DomID and DevID mapping tables are allocated per-VM:
>   * DomID Mapping Table (512 KB contiguous memory)
>   * DevID Mapping Table (1 MB contiguous memory)

But these can be mapped into that IPA space at 4k granularity?
They just need contiguous IOVA? So the VMM could provide this memory
and we don't need calls to manipulate it?

> Let's say we can use IOMMU_SET_DEV_DATA to communicate the memory addresses
> of the DomID/DevID mapping tables to the IOMMU driver, which would pin and
> map them in the PAS IOMMU page table. Then this might work. Does that go
> along the lines of what you are thinking (mainly to avoid introducing an
> additional ioctl)?

I think it makes more sense if memory that is logically part of the
VMM is mmap'd to the VMM. Since we have the general design of passing
user pointers and pinning them, it makes some sense. You could do the
same trick as your IPA space and use an IPA IOAS plus an access to set
this all up.

This has the same issue as above, it needs some formal VM object, as
fundamentally you are asking the driver to allocate a limited resource
on a specific IOMMU instance and then link that to other actions.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-06-26 13:20 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-21 23:54 [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 01/21] iommu/amd: Declare helper functions as extern Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 02/21] iommu/amd: Clean up spacing in amd_iommu_ops declaration Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 03/21] iommu/amd: Update PASID, GATS, and GLX feature related macros Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 04/21] iommu/amd: Modify domain_enable_v2() to add giov parameter Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 05/21] iommu/amd: Refactor set_dte_entry() helper function Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 06/21] iommu/amd: Modify set_dte_entry() to add gcr3 input parameter Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 07/21] iommu/amd: Modify set_dte_entry() to add user domain " Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 08/21] iommu/amd: Allow nested IOMMU page tables Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 09/21] iommu/amd: Add support for hw_info for iommu capability query Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 10/21] iommu/amd: Introduce vIOMMU-specific events and event info Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 11/21] iommu/amd: Introduce Reset vMMIO Command Suravee Suthikulpanit
2023-06-21 23:54 ` [RFC PATCH 12/21] iommu/amd: Introduce AMD vIOMMU-specific UAPI Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 13/21] iommu/amd: Introduce vIOMMU command-line option Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 14/21] iommu/amd: Initialize vIOMMU private address space regions Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 15/21] iommu/amd: Introduce vIOMMU vminit and vmdestroy ioctl Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 16/21] iommu/amd: Introduce vIOMMU ioctl for updating device mapping table Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 17/21] iommu/amd: Introduce vIOMMU ioctl for updating domain mapping Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 18/21] iommu/amd: Introduce vIOMMU ioctl for handling guest MMIO accesses Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 19/21] iommu/amd: Introduce vIOMMU ioctl for handling command buffer mapping Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 20/21] iommu/amd: Introduce vIOMMU ioctl for setting up guest CR3 Suravee Suthikulpanit
2023-06-21 23:55 ` [RFC PATCH 21/21] iommufd: Introduce AMD HW-vIOMMU IOCTL Suravee Suthikulpanit
2023-06-22 13:46 ` [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table Jason Gunthorpe
2023-06-23  1:15   ` Suthikulpanit, Suravee
2023-06-23 11:45     ` Jason Gunthorpe
2023-06-23 22:05       ` Suthikulpanit, Suravee
2023-06-23 22:56         ` Jason Gunthorpe
2023-06-24  2:08           ` Suthikulpanit, Suravee
2023-06-26 13:20             ` Jason Gunthorpe
