* [PATCH v4 00/12] Initial support for SMMUv3 nested translation
@ 2024-10-31 0:20 Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU Jason Gunthorpe
` (14 more replies)
0 siblings, 15 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
[This is now based on Nicolin's iommufd patches for vIOMMU and will need
to go through the iommufd tree, please ack]
This brings support for the IOMMFD ioctls:
- IOMMU_GET_HW_INFO
- IOMMU_HWPT_ALLOC_NEST_PARENT
- IOMMU_VIOMMU_ALLOC
- IOMMU_DOMAIN_NESTED
- IOMMU_HWPT_INVALIDATE
- ops->enforce_cache_coherency()
This is quite straightforward as the nested STE can just be built in the
special NESTED domain op and fed through the generic update machinery.
The design allows the user provided STE fragment to control several
aspects of the translation, including putting the STE into a "virtual
bypass" or a aborting state. This duplicates functionality available by
other means, but it allows trivially preserving the VMID in the STE as we
eventually move towards the vIOMMU owning the VMID.
Nesting support requires the system to either support S2FWB or the
stronger CANWBS ACPI flag. This is to ensure the VM cannot bypass the
cache and view incoherent data, currently VFIO lacks any cache flushing
that would make this safe.
Yan has a series to add some of the needed infrastructure for VFIO cache
flushing here:
https://lore.kernel.org/linux-iommu/20240507061802.20184-1-yan.y.zhao@intel.com/
Which may someday allow relaxing this further.
The VIOMMU object provides the framework to allow the invalidation path to
translate the vSID to a pSID and then issue the correct physical
invalidation. This is all done in the kernel as pSID has to
limited. Future patches will extend VIOMMU to handle specific HW features
like vMPAM and NVIDIA's vCMDQ.
Remove VFIO_TYPE1_NESTING_IOMMU since it was never used and superseded by
this.
This is the first series in what will be several to complete nesting
support. At least:
- IOMMU_RESV_SW_MSI related fixups
https://lore.kernel.org/linux-iommu/cover.1722644866.git.nicolinc@nvidia.com/
- vCMDQ hypervisor support for direct invalidation queue assignment
https://lore.kernel.org/linux-iommu/cover.1712978212.git.nicolinc@nvidia.com/
- KVM pinned VMID using vIOMMU for vBTM
https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/
- Cross instance S2 sharing
- Virtual Machine Structure using vIOMMU (for vMPAM?)
- Fault forwarding support through IOMMUFD's fault fd for vSVA
The vIOMMU series is essential to allow the invalidations to be processed
for the CD as well.
It is enough to allow qemu work to progress.
This is on github: https://github.com/jgunthorpe/linux/commits/smmuv3_nesting
v4:
- Rebase on Nicolin's patches
- Add user_pasid_table=1 to support fault reporting on NESTED domains
- Reorder STRTAB constants
- Fix whitespace
- Roll in the patches Nicolin had and merge together into a logical order
Includes vIOMMU, ATS and invalidation patches
v3: https://patch.msgid.link/r/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvidia.com
- Rebase on v6.12-rc2
- Revise commit messages
- Consolidate CANWB checks into arm_smmu_master_canwbs()
- Add CONFIG_ARM_SMMU_V3_IOMMUFD to compile out iommufd only features
like nesting
- Shift code into arm-smmu-v3-iommufd.c
- Add missed IS_ERR check
- Add S2FWB to arm_smmu_get_ste_used()
- Fixup quirks checks
- Drop ARM_SMMU_FEAT_COHERENCY checks for S2FWB
- Limit S2FWB to S2 Nesting Parent domains "just in case"
v2: https://patch.msgid.link/r/0-v2-621370057090+91fec-smmuv3_nesting_jgg@nvidia.com
- Revise commit messages
- Guard S2FWB support with ARM_SMMU_FEAT_COHERENCY, since it doesn't make
sense to use S2FWB to enforce coherency on inherently non-coherent hardware.
- Add missing IO_PGTABLE_QUIRK_ARM_S2FWB validation
- Include formal ACPIA commit for IORT built using
generate/linux/gen-patch.sh
- Use FEAT_NESTING to block creating a NESTING_PARENT
- Use an abort STE instead of non-valid if the user requests a non-valid
vSTE
- Consistently use 'nest_parent' for naming variables
- Use the right domain for arm_smmu_remove_master_domain() when it
removes the master
- Join bitfields together
- Drop arm_smmu_cache_invalidate_user patch, invalidation will
exclusively go via viommu
v1: https://patch.msgid.link/r/0-v1-54e734311a7f+14f72-smmuv3_nesting_jgg@nvidia.com
Jason Gunthorpe (7):
vfio: Remove VFIO_TYPE1_NESTING_IOMMU
iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
iommu/arm-smmu-v3: Expose the arm_smmu_attach interface
iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
iommu/arm-smmu-v3: Use S2FWB for NESTED domains
iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
Nicolin Chen (5):
ACPICA: IORT: Update for revision E.f
ACPI/IORT: Support CANWBS memory access flag
iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct
arm_smmu_hw_info
iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object
drivers/acpi/arm64/iort.c | 13 +
drivers/iommu/Kconfig | 9 +
drivers/iommu/arm/arm-smmu-v3/Makefile | 1 +
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 393 ++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 139 +++++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 92 +++-
drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 -
drivers/iommu/io-pgtable-arm.c | 27 +-
drivers/iommu/iommu.c | 10 -
drivers/iommu/iommufd/vfio_compat.c | 7 +-
drivers/vfio/vfio_iommu_type1.c | 12 +-
include/acpi/actbl2.h | 3 +-
include/linux/io-pgtable.h | 2 +
include/linux/iommu.h | 5 +-
include/uapi/linux/iommufd.h | 83 ++++
include/uapi/linux/vfio.h | 2 +-
16 files changed, 712 insertions(+), 102 deletions(-)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
base-commit: 9ffbeb478d44c57b9b2e263750b1056e5faebc9b
--
2.43.0
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f Jason Gunthorpe
` (13 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
This control causes the ARM SMMU drivers to choose a stage 2
implementation for the IO pagetable (vs the stage 1 usual default),
however this choice has no significant visible impact to the VFIO
user. Further qemu never implemented this and no other userspace user is
known.
The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
SMMU translation services to the guest operating system" however the rest
of the API to set the guest table pointer for the stage 1 and manage
invalidation was never completed, or at least never upstreamed, rendering
this part useless dead code.
Upstream has now settled on iommufd as the uAPI for controlling nested
translation. Choosing the stage 2 implementation should be done by through
the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation.
Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
enable_nesting iommu_domain_op.
Just in-case there is some userspace using this continue to treat
requesting it as a NOP, but do not advertise support any more.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ----------------
drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 ----------------
drivers/iommu/iommu.c | 10 ----------
drivers/iommu/iommufd/vfio_compat.c | 7 +------
drivers/vfio/vfio_iommu_type1.c | 12 +-----------
include/linux/iommu.h | 3 ---
include/uapi/linux/vfio.h | 2 +-
7 files changed, 3 insertions(+), 63 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b88235510..acf250aeb18b27 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3378,21 +3378,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
}
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)
-{
- struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
- int ret = 0;
-
- mutex_lock(&smmu_domain->init_mutex);
- if (smmu_domain->smmu)
- ret = -EPERM;
- else
- smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
- mutex_unlock(&smmu_domain->init_mutex);
-
- return ret;
-}
-
static int arm_smmu_of_xlate(struct device *dev,
const struct of_phandle_args *args)
{
@@ -3514,7 +3499,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
- .enable_nesting = arm_smmu_enable_nesting,
.free = arm_smmu_domain_free_paging,
}
};
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 8321962b37148b..12b173eec4540d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1558,21 +1558,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
}
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)
-{
- struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
- int ret = 0;
-
- mutex_lock(&smmu_domain->init_mutex);
- if (smmu_domain->smmu)
- ret = -EPERM;
- else
- smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
- mutex_unlock(&smmu_domain->init_mutex);
-
- return ret;
-}
-
static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks)
{
@@ -1656,7 +1641,6 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
- .enable_nesting = arm_smmu_enable_nesting,
.set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
.free = arm_smmu_domain_free,
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 83c8e617a2c588..dbd70d5a4702cc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2723,16 +2723,6 @@ static int __init iommu_init(void)
}
core_initcall(iommu_init);
-int iommu_enable_nesting(struct iommu_domain *domain)
-{
- if (domain->type != IOMMU_DOMAIN_UNMANAGED)
- return -EINVAL;
- if (!domain->ops->enable_nesting)
- return -EINVAL;
- return domain->ops->enable_nesting(domain);
-}
-EXPORT_SYMBOL_GPL(iommu_enable_nesting);
-
int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirk)
{
diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index a3ad5f0b6c59dd..514aacd6400949 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -291,12 +291,7 @@ static int iommufd_vfio_check_extension(struct iommufd_ctx *ictx,
case VFIO_DMA_CC_IOMMU:
return iommufd_vfio_cc_iommu(ictx);
- /*
- * This is obsolete, and to be removed from VFIO. It was an incomplete
- * idea that got merged.
- * https://lore.kernel.org/kvm/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/
- */
- case VFIO_TYPE1_NESTING_IOMMU:
+ case __VFIO_RESERVED_TYPE1_NESTING_IOMMU:
return 0;
/*
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index bf391b40e576fc..50ebc9593c9d70 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,7 +72,6 @@ struct vfio_iommu {
uint64_t pgsize_bitmap;
uint64_t num_non_pinned_groups;
bool v2;
- bool nesting;
bool dirty_page_tracking;
struct list_head emulated_iommu_groups;
};
@@ -2195,12 +2194,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
goto out_free_domain;
}
- if (iommu->nesting) {
- ret = iommu_enable_nesting(domain->domain);
- if (ret)
- goto out_domain;
- }
-
ret = iommu_attach_group(domain->domain, group->iommu_group);
if (ret)
goto out_domain;
@@ -2541,9 +2534,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
switch (arg) {
case VFIO_TYPE1_IOMMU:
break;
- case VFIO_TYPE1_NESTING_IOMMU:
- iommu->nesting = true;
- fallthrough;
+ case __VFIO_RESERVED_TYPE1_NESTING_IOMMU:
case VFIO_TYPE1v2_IOMMU:
iommu->v2 = true;
break;
@@ -2638,7 +2629,6 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
switch (arg) {
case VFIO_TYPE1_IOMMU:
case VFIO_TYPE1v2_IOMMU:
- case VFIO_TYPE1_NESTING_IOMMU:
case VFIO_UNMAP_ALL:
return 1;
case VFIO_UPDATE_VADDR:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 11de66237eaa19..099d8aa292c25d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -695,7 +695,6 @@ struct iommu_ops {
* @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
* including no-snoop TLPs on PCIe or other platform
* specific mechanisms.
- * @enable_nesting: Enable nesting
* @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
* @free: Release the domain after use.
*/
@@ -723,7 +722,6 @@ struct iommu_domain_ops {
dma_addr_t iova);
bool (*enforce_cache_coherency)(struct iommu_domain *domain);
- int (*enable_nesting)(struct iommu_domain *domain);
int (*set_pgtable_quirks)(struct iommu_domain *domain,
unsigned long quirks);
@@ -904,7 +902,6 @@ extern void iommu_group_put(struct iommu_group *group);
extern int iommu_group_id(struct iommu_group *group);
extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
-int iommu_enable_nesting(struct iommu_domain *domain);
int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf1902f..c8dbf8219c4fcb 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -35,7 +35,7 @@
#define VFIO_EEH 5
/* Two-stage IOMMU */
-#define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
+#define __VFIO_RESERVED_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
#define VFIO_SPAPR_TCE_v2_IOMMU 7
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag Jason Gunthorpe
` (12 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
From: Nicolin Chen <nicolinc@nvidia.com>
ACPICA commit c4f5c083d24df9ddd71d5782c0988408cf0fc1ab
The IORT spec, Issue E.f (April 2024), adds a new CANWBS bit to the Memory
Access Flag field in the Memory Access Properties table, mainly for a PCI
Root Complex.
This CANWBS defines the coherency of memory accesses to be not marked IOWB
cacheable/shareable. Its value further implies the coherency impact from a
pair of mismatched memory attributes (e.g. in a nested translation case):
0x0: Use of mismatched memory attributes for accesses made by this
device may lead to a loss of coherency.
0x1: Coherency of accesses made by this device to locations in
Conventional memory are ensured as follows, even if the memory
attributes for the accesses presented by the device or provided by
the SMMU are different from Inner and Outer Write-back cacheable,
Shareable.
Link: https://github.com/acpica/acpica/commit/c4f5c083
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
include/acpi/actbl2.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index d3858eebc2553b..2e917a8f8bca82 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -453,7 +453,7 @@ struct acpi_table_ccel {
* IORT - IO Remapping Table
*
* Conforms to "IO Remapping Table System Software on ARM Platforms",
- * Document number: ARM DEN 0049E.e, Sep 2022
+ * Document number: ARM DEN 0049E.f, Apr 2024
*
******************************************************************************/
@@ -524,6 +524,7 @@ struct acpi_iort_memory_access {
#define ACPI_IORT_MF_COHERENCY (1)
#define ACPI_IORT_MF_ATTRIBUTES (1<<1)
+#define ACPI_IORT_MF_CANWBS (1<<2)
/*
* IORT node specific subtables
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS Jason Gunthorpe
` (11 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
From: Nicolin Chen <nicolinc@nvidia.com>
The IORT spec, Issue E.f (April 2024), adds a new CANWBS bit to the Memory
Access Flag field in the Memory Access Properties table, mainly for a PCI
Root Complex.
This CANWBS defines the coherency of memory accesses to be not marked IOWB
cacheable/shareable. Its value further implies the coherency impact from a
pair of mismatched memory attributes (e.g. in a nested translation case):
0x0: Use of mismatched memory attributes for accesses made by this
device may lead to a loss of coherency.
0x1: Coherency of accesses made by this device to locations in
Conventional memory are ensured as follows, even if the memory
attributes for the accesses presented by the device or provided by
the SMMU are different from Inner and Outer Write-back cacheable,
Shareable.
Note that the loss of coherency on a CANWBS-unsupported HW typically could
occur to an SMMU that doesn't implement the S2FWB feature where additional
cache flush operations would be required to prevent that from happening.
Add a new ACPI_IORT_MF_CANWBS flag and set IOMMU_FWSPEC_PCI_RC_CANWBS upon
the presence of this new flag.
CANWBS and S2FWB are similar features, in that they both guarantee the VM
can not violate coherency, however S2FWB can be bypassed by PCI No Snoop
TLPs, while CANWBS cannot. Thus CANWBS meets the requirements to set
IOMMU_CAP_ENFORCE_CACHE_COHERENCY.
Architecturally ARM has expected that VFIO would disable No Snoop through
PCI Config space, if this is done then the two would have the same
protections.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/acpi/arm64/iort.c | 13 +++++++++++++
include/linux/iommu.h | 2 ++
2 files changed, 15 insertions(+)
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 4c745a26226b27..1f7e4c691d9ee3 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1218,6 +1218,17 @@ static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
}
+static bool iort_pci_rc_supports_canwbs(struct acpi_iort_node *node)
+{
+ struct acpi_iort_memory_access *memory_access;
+ struct acpi_iort_root_complex *pci_rc;
+
+ pci_rc = (struct acpi_iort_root_complex *)node->node_data;
+ memory_access =
+ (struct acpi_iort_memory_access *)&pci_rc->memory_properties;
+ return memory_access->memory_flags & ACPI_IORT_MF_CANWBS;
+}
+
static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
u32 streamid)
{
@@ -1335,6 +1346,8 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
fwspec = dev_iommu_fwspec_get(dev);
if (fwspec && iort_pci_rc_supports_ats(node))
fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
+ if (fwspec && iort_pci_rc_supports_canwbs(node))
+ fwspec->flags |= IOMMU_FWSPEC_PCI_RC_CANWBS;
} else {
node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
iort_match_node_callback, dev);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 099d8aa292c25d..8b02adbd14f74c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1051,6 +1051,8 @@ struct iommu_fwspec {
/* ATS is supported */
#define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)
+/* CANWBS is supported */
+#define IOMMU_FWSPEC_PCI_RC_CANWBS (1 << 1)
/*
* An iommu attach handle represents a relationship between an iommu domain
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (2 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info Jason Gunthorpe
` (10 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
HW with CANWBS is always cache coherent and ignores PCI No Snoop requests
as well. This meets the requirement for IOMMU_CAP_ENFORCE_CACHE_COHERENCY,
so let's return it.
Implement the enforce_cache_coherency() op to reject attaching devices
that don't have CANWBS.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 31 +++++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +++++
2 files changed, 38 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index acf250aeb18b27..38725810c14eeb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2293,6 +2293,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
case IOMMU_CAP_CACHE_COHERENCY:
/* Assume that a coherent TCU implies coherent TBUs */
return master->smmu->features & ARM_SMMU_FEAT_COHERENCY;
+ case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
+ return arm_smmu_master_canwbs(master);
case IOMMU_CAP_NOEXEC:
case IOMMU_CAP_DEFERRED_FLUSH:
return true;
@@ -2303,6 +2305,26 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
}
}
+static bool arm_smmu_enforce_cache_coherency(struct iommu_domain *domain)
+{
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct arm_smmu_master_domain *master_domain;
+ unsigned long flags;
+ bool ret = true;
+
+ spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+ list_for_each_entry(master_domain, &smmu_domain->devices,
+ devices_elm) {
+ if (!arm_smmu_master_canwbs(master_domain->master)) {
+ ret = false;
+ break;
+ }
+ }
+ smmu_domain->enforce_cache_coherency = ret;
+ spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+ return ret;
+}
+
struct arm_smmu_domain *arm_smmu_domain_alloc(void)
{
struct arm_smmu_domain *smmu_domain;
@@ -2731,6 +2753,14 @@ static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
* one of them.
*/
spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+ if (smmu_domain->enforce_cache_coherency &&
+ !arm_smmu_master_canwbs(master)) {
+ spin_unlock_irqrestore(&smmu_domain->devices_lock,
+ flags);
+ kfree(master_domain);
+ return -EINVAL;
+ }
+
if (state->ats_enabled)
atomic_inc(&smmu_domain->nr_ats_masters);
list_add(&master_domain->devices_elm, &smmu_domain->devices);
@@ -3493,6 +3523,7 @@ static struct iommu_ops arm_smmu_ops = {
.owner = THIS_MODULE,
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = arm_smmu_attach_dev,
+ .enforce_cache_coherency = arm_smmu_enforce_cache_coherency,
.set_dev_pasid = arm_smmu_s1_set_dev_pasid,
.map_pages = arm_smmu_map_pages,
.unmap_pages = arm_smmu_unmap_pages,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1e9952ca989f87..06e3d88932df12 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -811,6 +811,7 @@ struct arm_smmu_domain {
/* List of struct arm_smmu_master_domain */
struct list_head devices;
spinlock_t devices_lock;
+ bool enforce_cache_coherency : 1;
struct mmu_notifier mmu_notifier;
};
@@ -893,6 +894,12 @@ int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq *cmdq);
+static inline bool arm_smmu_master_canwbs(struct arm_smmu_master *master)
+{
+ return dev_iommu_fwspec_get(master->dev)->flags &
+ IOMMU_FWSPEC_PCI_RC_CANWBS;
+}
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (3 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-11-04 11:47 ` Will Deacon
2024-10-31 0:20 ` [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT Jason Gunthorpe
` (9 subsequent siblings)
14 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
From: Nicolin Chen <nicolinc@nvidia.com>
For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU
instance need to be available to the VMM so it can construct an
appropriate vSMMUv3 that reflects the correct HW capabilities.
For userspace page tables these values are required to constrain the valid
values within the CD table and the IOPTEs.
The kernel does not sanitize these values. If building a VMM then
userspace is required to only forward bits into a VM that it knows it can
implement. Some bits will also require a VMM to detect if appropriate
kernel support is available such as for ATS and BTM.
Start a new file and kconfig for the advanced iommufd support. This lets
it be compiled out for kernels that are not intended to support
virtualization, and allows distros to leave it disabled until they are
shipping a matching qemu too.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/Kconfig | 9 +++++
drivers/iommu/arm/arm-smmu-v3/Makefile | 1 +
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 31 ++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +++++
include/uapi/linux/iommufd.h | 35 +++++++++++++++++++
6 files changed, 86 insertions(+)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index b3aa1f5d53218b..0c9bceb1653d5f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -415,6 +415,15 @@ config ARM_SMMU_V3_SVA
Say Y here if your system supports SVA extensions such as PCIe PASID
and PRI.
+config ARM_SMMU_V3_IOMMUFD
+ bool "Enable IOMMUFD features for ARM SMMUv3 (EXPERIMENTAL)"
+ depends on IOMMUFD
+ help
+ Support for IOMMUFD features intended to support virtual machines
+ with accelerated virtual IOMMUs.
+
+ Say Y here if you are doing development and testing on this feature.
+
config ARM_SMMU_V3_KUNIT_TEST
tristate "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS
depends on KUNIT
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index dc98c88b48c827..493a659cc66bb2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
arm_smmu_v3-y := arm-smmu-v3.o
+arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_IOMMUFD) += arm-smmu-v3-iommufd.o
arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
arm_smmu_v3-$(CONFIG_TEGRA241_CMDQV) += tegra241-cmdqv.o
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
new file mode 100644
index 00000000000000..3d2671031c9bb5
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+
+#include <uapi/linux/iommufd.h>
+
+#include "arm-smmu-v3.h"
+
+void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type)
+{
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct iommu_hw_info_arm_smmuv3 *info;
+ u32 __iomem *base_idr;
+ unsigned int i;
+
+ info = kzalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return ERR_PTR(-ENOMEM);
+
+ base_idr = master->smmu->base + ARM_SMMU_IDR0;
+ for (i = 0; i <= 5; i++)
+ info->idr[i] = readl_relaxed(base_idr + i);
+ info->iidr = readl_relaxed(master->smmu->base + ARM_SMMU_IIDR);
+ info->aidr = readl_relaxed(master->smmu->base + ARM_SMMU_AIDR);
+
+ *length = sizeof(*info);
+ *type = IOMMU_HW_INFO_TYPE_ARM_SMMUV3;
+
+ return info;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 38725810c14eeb..996774d461aea2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3506,6 +3506,7 @@ static struct iommu_ops arm_smmu_ops = {
.identity_domain = &arm_smmu_identity_domain,
.blocked_domain = &arm_smmu_blocked_domain,
.capable = arm_smmu_capable,
+ .hw_info = arm_smmu_hw_info,
.domain_alloc_paging = arm_smmu_domain_alloc_paging,
.domain_alloc_sva = arm_smmu_sva_domain_alloc,
.domain_alloc_user = arm_smmu_domain_alloc_user,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 06e3d88932df12..66261fd5bfb2d2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -81,6 +81,8 @@ struct arm_smmu_device;
#define IIDR_REVISION GENMASK(15, 12)
#define IIDR_IMPLEMENTER GENMASK(11, 0)
+#define ARM_SMMU_AIDR 0x1C
+
#define ARM_SMMU_CR0 0x20
#define CR0_ATSCHK (1 << 4)
#define CR0_CMDQEN (1 << 3)
@@ -956,4 +958,11 @@ tegra241_cmdqv_probe(struct arm_smmu_device *smmu)
return ERR_PTR(-ENODEV);
}
#endif /* CONFIG_TEGRA241_CMDQV */
+
+#if IS_ENABLED(CONFIG_ARM_SMMU_V3_IOMMUFD)
+void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type);
+#else
+#define arm_smmu_hw_info NULL
+#endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
+
#endif /* _ARM_SMMU_V3_H */
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e266dfa6a38d9d..b227ac16333fe1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -488,15 +488,50 @@ struct iommu_hw_info_vtd {
__aligned_u64 ecap_reg;
};
+/**
+ * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
+ * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ * @idr: Implemented features for ARM SMMU Non-secure programming interface
+ * @iidr: Information about the implementation and implementer of ARM SMMU,
+ * and architecture version supported
+ * @aidr: ARM SMMU architecture version
+ *
+ * For the details of @idr, @iidr and @aidr, please refer to the chapters
+ * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
+ *
+ * User space should read the underlying ARM SMMUv3 hardware information for
+ * the list of supported features.
+ *
+ * Note that these values reflect the raw HW capability, without any insight if
+ * any required kernel driver support is present. Bits may be set indicating the
+ * HW has functionality that is lacking kernel software support, such as BTM. If
+ * a VMM is using this information to construct emulated copies of these
+ * registers it should only forward bits that it knows it can support.
+ *
+ * In future, presence of required kernel support will be indicated in flags.
+ */
+struct iommu_hw_info_arm_smmuv3 {
+ __u32 flags;
+ __u32 __reserved;
+ __u32 idr[6];
+ __u32 iidr;
+ __u32 aidr;
+};
+
/**
* enum iommu_hw_info_type - IOMMU Hardware Info Types
* @IOMMU_HW_INFO_TYPE_NONE: Used by the drivers that do not report hardware
* info
* @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
+ * @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type
*/
enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_NONE = 0,
IOMMU_HW_INFO_TYPE_INTEL_VTD = 1,
+ IOMMU_HW_INFO_TYPE_ARM_SMMUV3 = 2,
};
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (4 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface Jason Gunthorpe
` (8 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
For SMMUv3 the parent must be a S2 domain, which can be composed
into a IOMMU_DOMAIN_NESTED.
In future the S2 parent will also need a VMID linked to the VIOMMU and
even to KVM.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 996774d461aea2..80847fa386fcd2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3114,7 +3114,8 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
{
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
- const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+ const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+ IOMMU_HWPT_ALLOC_NEST_PARENT;
struct arm_smmu_domain *smmu_domain;
int ret;
@@ -3127,6 +3128,14 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
if (IS_ERR(smmu_domain))
return ERR_CAST(smmu_domain);
+ if (flags & IOMMU_HWPT_ALLOC_NEST_PARENT) {
+ if (!(master->smmu->features & ARM_SMMU_FEAT_NESTING)) {
+ ret = -EOPNOTSUPP;
+ goto err_free;
+ }
+ smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
+ }
+
smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
smmu_domain->domain.ops = arm_smmu_ops.default_domain_ops;
ret = arm_smmu_domain_finalise(smmu_domain, master->smmu, flags);
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (5 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Jason Gunthorpe
` (7 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
The arm-smmuv3-iommufd.c file will need to call these functions too.
Remove statics and put them in the header file. Remove the kunit
visibility protections from arm_smmu_make_abort_ste() and
arm_smmu_make_s2_domain_ste().
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++-------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 27 +++++++++++++++++----
2 files changed, 27 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 80847fa386fcd2..b4b03206afbf48 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1549,7 +1549,6 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
}
}
-VISIBLE_IF_KUNIT
void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
{
memset(target, 0, sizeof(*target));
@@ -1632,7 +1631,6 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
}
EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_cdtable_ste);
-VISIBLE_IF_KUNIT
void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
struct arm_smmu_master *master,
struct arm_smmu_domain *smmu_domain,
@@ -2505,8 +2503,8 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
}
}
-static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
- const struct arm_smmu_ste *target)
+void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
+ const struct arm_smmu_ste *target)
{
int i, j;
struct arm_smmu_device *smmu = master->smmu;
@@ -2671,16 +2669,6 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master,
spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
}
-struct arm_smmu_attach_state {
- /* Inputs */
- struct iommu_domain *old_domain;
- struct arm_smmu_master *master;
- bool cd_needs_ats;
- ioasid_t ssid;
- /* Resulting state */
- bool ats_enabled;
-};
-
/*
* Start the sequence to attach a domain to a master. The sequence contains three
* steps:
@@ -2701,8 +2689,8 @@ struct arm_smmu_attach_state {
* new_domain can be a non-paging domain. In this case ATS will not be enabled,
* and invalidations won't be tracked.
*/
-static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
- struct iommu_domain *new_domain)
+int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
+ struct iommu_domain *new_domain)
{
struct arm_smmu_master *master = state->master;
struct arm_smmu_master_domain *master_domain;
@@ -2784,7 +2772,7 @@ static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
* completes synchronizing the PCI device's ATC and finishes manipulating the
* smmu_domain->devices list.
*/
-static void arm_smmu_attach_commit(struct arm_smmu_attach_state *state)
+void arm_smmu_attach_commit(struct arm_smmu_attach_state *state)
{
struct arm_smmu_master *master = state->master;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 66261fd5bfb2d2..c9e5290e995a64 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -830,21 +830,22 @@ struct arm_smmu_entry_writer_ops {
void (*sync)(struct arm_smmu_entry_writer *writer);
};
+void arm_smmu_make_abort_ste(struct arm_smmu_ste *target);
+void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
+ struct arm_smmu_master *master,
+ struct arm_smmu_domain *smmu_domain,
+ bool ats_enabled);
+
#if IS_ENABLED(CONFIG_KUNIT)
void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits);
void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur,
const __le64 *target);
void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits);
-void arm_smmu_make_abort_ste(struct arm_smmu_ste *target);
void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
struct arm_smmu_ste *target);
void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
struct arm_smmu_master *master, bool ats_enabled,
unsigned int s1dss);
-void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
- struct arm_smmu_master *master,
- struct arm_smmu_domain *smmu_domain,
- bool ats_enabled);
void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
struct arm_smmu_master *master, struct mm_struct *mm,
u16 asid);
@@ -902,6 +903,22 @@ static inline bool arm_smmu_master_canwbs(struct arm_smmu_master *master)
IOMMU_FWSPEC_PCI_RC_CANWBS;
}
+struct arm_smmu_attach_state {
+ /* Inputs */
+ struct iommu_domain *old_domain;
+ struct arm_smmu_master *master;
+ bool cd_needs_ats;
+ ioasid_t ssid;
+ /* Resulting state */
+ bool ats_enabled;
+};
+
+int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
+ struct iommu_domain *new_domain);
+void arm_smmu_attach_commit(struct arm_smmu_attach_state *state);
+void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
+ const struct arm_smmu_ste *target);
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (6 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-11-29 14:38 ` Shameerali Kolothum Thodi
2024-10-31 0:20 ` [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED Jason Gunthorpe
` (6 subsequent siblings)
14 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
From: Nicolin Chen <nicolinc@nvidia.com>
Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. Implement
an arm_vsmmu_alloc().
As an initial step, copy the VMID from s2_parent. A followup series is
required to give the VIOMMU object it's own VMID that will be used in all
nesting configurations.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 45 +++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 13 ++++++
include/uapi/linux/iommufd.h | 4 ++
4 files changed, 63 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 3d2671031c9bb5..60dd9e90759571 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -29,3 +29,48 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type)
return info;
}
+
+static const struct iommufd_viommu_ops arm_vsmmu_ops = {
+};
+
+struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
+ struct iommu_domain *parent,
+ struct iommufd_ctx *ictx,
+ unsigned int viommu_type)
+{
+ struct arm_smmu_device *smmu =
+ iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu);
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct arm_smmu_domain *s2_parent = to_smmu_domain(parent);
+ struct arm_vsmmu *vsmmu;
+
+ if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
+ return ERR_PTR(-EOPNOTSUPP);
+
+ if (s2_parent->smmu != master->smmu)
+ return ERR_PTR(-EINVAL);
+
+ /*
+ * Must support some way to prevent the VM from bypassing the cache
+ * because VFIO currently does not do any cache maintenance. canwbs
+ * indicates the device is fully coherent and no cache maintenance is
+ * ever required, even for PCI No-Snoop.
+ */
+ if (!arm_smmu_master_canwbs(master))
+ return ERR_PTR(-EOPNOTSUPP);
+
+ vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
+ &arm_vsmmu_ops);
+ if (IS_ERR(vsmmu))
+ return ERR_CAST(vsmmu);
+
+ vsmmu->smmu = smmu;
+ vsmmu->s2_parent = s2_parent;
+ /* FIXME Move VMID allocation from the S2 domain allocation to here */
+ vsmmu->vmid = s2_parent->s2_cfg.vmid;
+
+ return &vsmmu->core;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b4b03206afbf48..c425fb923eb3de 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3517,6 +3517,7 @@ static struct iommu_ops arm_smmu_ops = {
.dev_disable_feat = arm_smmu_dev_disable_feature,
.page_response = arm_smmu_page_response,
.def_domain_type = arm_smmu_def_domain_type,
+ .viommu_alloc = arm_vsmmu_alloc,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
.owner = THIS_MODULE,
.default_domain_ops = &(const struct iommu_domain_ops) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index c9e5290e995a64..3b8013afcec0de 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -10,6 +10,7 @@
#include <linux/bitfield.h>
#include <linux/iommu.h>
+#include <linux/iommufd.h>
#include <linux/kernel.h>
#include <linux/mmzone.h>
#include <linux/sizes.h>
@@ -976,10 +977,22 @@ tegra241_cmdqv_probe(struct arm_smmu_device *smmu)
}
#endif /* CONFIG_TEGRA241_CMDQV */
+struct arm_vsmmu {
+ struct iommufd_viommu core;
+ struct arm_smmu_device *smmu;
+ struct arm_smmu_domain *s2_parent;
+ u16 vmid;
+};
+
#if IS_ENABLED(CONFIG_ARM_SMMU_V3_IOMMUFD)
void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type);
+struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
+ struct iommu_domain *parent,
+ struct iommufd_ctx *ictx,
+ unsigned int viommu_type);
#else
#define arm_smmu_hw_info NULL
+#define arm_vsmmu_alloc NULL
#endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
#endif /* _ARM_SMMU_V3_H */
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index b227ac16333fe1..27c5117db985b2 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -400,10 +400,12 @@ struct iommu_hwpt_vtd_s1 {
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
* @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
+ * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
*/
enum iommu_hwpt_data_type {
IOMMU_HWPT_DATA_NONE = 0,
IOMMU_HWPT_DATA_VTD_S1 = 1,
+ IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
};
/**
@@ -843,9 +845,11 @@ struct iommu_fault_alloc {
/**
* enum iommu_viommu_type - Virtual IOMMU Type
* @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use
+ * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type
*/
enum iommu_viommu_type {
IOMMU_VIOMMU_TYPE_DEFAULT = 0,
+ IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1,
};
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (7 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 6:21 ` Zhangfei Gao
2024-10-31 0:20 ` [PATCH v4 10/12] iommu/arm-smmu-v3: Use S2FWB for NESTED domains Jason Gunthorpe
` (5 subsequent siblings)
14 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
For SMMUv3 a IOMMU_DOMAIN_NESTED is composed of a S2 iommu_domain acting
as the parent and a user provided STE fragment that defines the CD table
and related data with addresses translated by the S2 iommu_domain.
The kernel only permits userspace to control certain allowed bits of the
STE that are safe for user/guest control.
IOTLB maintenance is a bit subtle here, the S1 implicitly includes the S2
translation, but there is no way of knowing which S1 entries refer to a
range of S2.
For the IOTLB we follow ARM's guidance and issue a CMDQ_OP_TLBI_NH_ALL to
flush all ASIDs from the VMID after flushing the S2 on any change to the
S2.
The IOMMU_DOMAIN_NESTED can only be created from inside a VIOMMU as the
invalidation path relies on the VIOMMU to translate virtual stream ID used
in the invalidation commands for the CD table and ATS.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 157 ++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 17 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++
include/uapi/linux/iommufd.h | 20 +++
4 files changed, 219 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 60dd9e90759571..0b9fffc5b2f09b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -30,7 +30,164 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type)
return info;
}
+static void arm_smmu_make_nested_cd_table_ste(
+ struct arm_smmu_ste *target, struct arm_smmu_master *master,
+ struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
+{
+ arm_smmu_make_s2_domain_ste(
+ target, master, nested_domain->vsmmu->s2_parent, ats_enabled);
+
+ target->data[0] = cpu_to_le64(STRTAB_STE_0_V |
+ FIELD_PREP(STRTAB_STE_0_CFG,
+ STRTAB_STE_0_CFG_NESTED));
+ target->data[0] |= nested_domain->ste[0] &
+ ~cpu_to_le64(STRTAB_STE_0_CFG);
+ target->data[1] |= nested_domain->ste[1];
+}
+
+/*
+ * Create a physical STE from the virtual STE that userspace provided when it
+ * created the nested domain. Using the vSTE userspace can request:
+ * - Non-valid STE
+ * - Abort STE
+ * - Bypass STE (install the S2, no CD table)
+ * - CD table STE (install the S2 and the userspace CD table)
+ */
+static void arm_smmu_make_nested_domain_ste(
+ struct arm_smmu_ste *target, struct arm_smmu_master *master,
+ struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
+{
+ unsigned int cfg =
+ FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0]));
+
+ /*
+ * Userspace can request a non-valid STE through the nesting interface.
+ * We relay that into an abort physical STE with the intention that
+ * C_BAD_STE for this SID can be generated to userspace.
+ */
+ if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
+ cfg = STRTAB_STE_0_CFG_ABORT;
+
+ switch (cfg) {
+ case STRTAB_STE_0_CFG_S1_TRANS:
+ arm_smmu_make_nested_cd_table_ste(target, master, nested_domain,
+ ats_enabled);
+ break;
+ case STRTAB_STE_0_CFG_BYPASS:
+ arm_smmu_make_s2_domain_ste(target, master,
+ nested_domain->vsmmu->s2_parent,
+ ats_enabled);
+ break;
+ case STRTAB_STE_0_CFG_ABORT:
+ default:
+ arm_smmu_make_abort_ste(target);
+ break;
+ }
+}
+
+static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct arm_smmu_nested_domain *nested_domain =
+ to_smmu_nested_domain(domain);
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct arm_smmu_attach_state state = {
+ .master = master,
+ .old_domain = iommu_get_domain_for_dev(dev),
+ .ssid = IOMMU_NO_PASID,
+ /* Currently invalidation of ATC is not supported */
+ .disable_ats = true,
+ };
+ struct arm_smmu_ste ste;
+ int ret;
+
+ if (nested_domain->vsmmu->smmu != master->smmu)
+ return -EINVAL;
+ if (arm_smmu_ssids_in_use(&master->cd_table))
+ return -EBUSY;
+
+ mutex_lock(&arm_smmu_asid_lock);
+ ret = arm_smmu_attach_prepare(&state, domain);
+ if (ret) {
+ mutex_unlock(&arm_smmu_asid_lock);
+ return ret;
+ }
+
+ arm_smmu_make_nested_domain_ste(&ste, master, nested_domain,
+ state.ats_enabled);
+ arm_smmu_install_ste_for_dev(master, &ste);
+ arm_smmu_attach_commit(&state);
+ mutex_unlock(&arm_smmu_asid_lock);
+ return 0;
+}
+
+static void arm_smmu_domain_nested_free(struct iommu_domain *domain)
+{
+ kfree(to_smmu_nested_domain(domain));
+}
+
+static const struct iommu_domain_ops arm_smmu_nested_ops = {
+ .attach_dev = arm_smmu_attach_dev_nested,
+ .free = arm_smmu_domain_nested_free,
+};
+
+static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
+{
+ unsigned int cfg;
+
+ if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
+ memset(arg->ste, 0, sizeof(arg->ste));
+ return 0;
+ }
+
+ /* EIO is reserved for invalid STE data. */
+ if ((arg->ste[0] & ~STRTAB_STE_0_NESTING_ALLOWED) ||
+ (arg->ste[1] & ~STRTAB_STE_1_NESTING_ALLOWED))
+ return -EIO;
+
+ cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(arg->ste[0]));
+ if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS &&
+ cfg != STRTAB_STE_0_CFG_S1_TRANS)
+ return -EIO;
+ return 0;
+}
+
+static struct iommu_domain *
+arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+ struct arm_smmu_nested_domain *nested_domain;
+ struct iommu_hwpt_arm_smmuv3 arg;
+ int ret;
+
+ if (flags)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ ret = iommu_copy_struct_from_user(&arg, user_data,
+ IOMMU_HWPT_DATA_ARM_SMMUV3, ste);
+ if (ret)
+ return ERR_PTR(ret);
+
+ ret = arm_smmu_validate_vste(&arg);
+ if (ret)
+ return ERR_PTR(ret);
+
+ nested_domain = kzalloc(sizeof(*nested_domain), GFP_KERNEL_ACCOUNT);
+ if (!nested_domain)
+ return ERR_PTR(-ENOMEM);
+
+ nested_domain->domain.type = IOMMU_DOMAIN_NESTED;
+ nested_domain->domain.ops = &arm_smmu_nested_ops;
+ nested_domain->vsmmu = vsmmu;
+ nested_domain->ste[0] = arg.ste[0];
+ nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);
+
+ return &nested_domain->domain;
+}
+
static const struct iommufd_viommu_ops arm_vsmmu_ops = {
+ .alloc_domain_nested = arm_vsmmu_alloc_domain_nested,
};
struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c425fb923eb3de..53f12b9d78ab21 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -295,6 +295,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
case CMDQ_OP_TLBI_NH_ASID:
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
fallthrough;
+ case CMDQ_OP_TLBI_NH_ALL:
case CMDQ_OP_TLBI_S12_VMALL:
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
break;
@@ -2230,6 +2231,15 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
}
__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
+ if (smmu_domain->nest_parent) {
+ /*
+ * When the S2 domain changes all the nested S1 ASIDs have to be
+ * flushed too.
+ */
+ cmd.opcode = CMDQ_OP_TLBI_NH_ALL;
+ arm_smmu_cmdq_issue_cmd_with_sync(smmu_domain->smmu, &cmd);
+ }
+
/*
* Unfortunately, this can't be leaf-only since we may have
* zapped an entire table.
@@ -2644,6 +2654,8 @@ to_smmu_domain_devices(struct iommu_domain *domain)
if ((domain->type & __IOMMU_DOMAIN_PAGING) ||
domain->type == IOMMU_DOMAIN_SVA)
return to_smmu_domain(domain);
+ if (domain->type == IOMMU_DOMAIN_NESTED)
+ return to_smmu_nested_domain(domain)->vsmmu->s2_parent;
return NULL;
}
@@ -2716,7 +2728,8 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
* enabled if we have arm_smmu_domain, those always have page
* tables.
*/
- state->ats_enabled = arm_smmu_ats_supported(master);
+ state->ats_enabled = !state->disable_ats &&
+ arm_smmu_ats_supported(master);
}
if (smmu_domain) {
@@ -3122,6 +3135,7 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
goto err_free;
}
smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
+ smmu_domain->nest_parent = true;
}
smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
@@ -3518,6 +3532,7 @@ static struct iommu_ops arm_smmu_ops = {
.page_response = arm_smmu_page_response,
.def_domain_type = arm_smmu_def_domain_type,
.viommu_alloc = arm_vsmmu_alloc,
+ .user_pasid_table = 1,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
.owner = THIS_MODULE,
.default_domain_ops = &(const struct iommu_domain_ops) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3b8013afcec0de..3fabe187ea7815 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -244,6 +244,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
#define STRTAB_STE_0_CFG_BYPASS 4
#define STRTAB_STE_0_CFG_S1_TRANS 5
#define STRTAB_STE_0_CFG_S2_TRANS 6
+#define STRTAB_STE_0_CFG_NESTED 7
#define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
#define STRTAB_STE_0_S1FMT_LINEAR 0
@@ -295,6 +296,15 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(51, 4)
+/* These bits can be controlled by userspace for STRTAB_STE_0_CFG_NESTED */
+#define STRTAB_STE_0_NESTING_ALLOWED \
+ cpu_to_le64(STRTAB_STE_0_V | STRTAB_STE_0_CFG | STRTAB_STE_0_S1FMT | \
+ STRTAB_STE_0_S1CTXPTR_MASK | STRTAB_STE_0_S1CDMAX)
+#define STRTAB_STE_1_NESTING_ALLOWED \
+ cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \
+ STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \
+ STRTAB_STE_1_S1STALLD)
+
/*
* Context descriptors.
*
@@ -514,6 +524,7 @@ struct arm_smmu_cmdq_ent {
};
} cfgi;
+ #define CMDQ_OP_TLBI_NH_ALL 0x10
#define CMDQ_OP_TLBI_NH_ASID 0x11
#define CMDQ_OP_TLBI_NH_VA 0x12
#define CMDQ_OP_TLBI_EL2_ALL 0x20
@@ -815,10 +826,18 @@ struct arm_smmu_domain {
struct list_head devices;
spinlock_t devices_lock;
bool enforce_cache_coherency : 1;
+ bool nest_parent : 1;
struct mmu_notifier mmu_notifier;
};
+struct arm_smmu_nested_domain {
+ struct iommu_domain domain;
+ struct arm_vsmmu *vsmmu;
+
+ __le64 ste[2];
+};
+
/* The following are exposed for testing purposes. */
struct arm_smmu_entry_writer_ops;
struct arm_smmu_entry_writer {
@@ -863,6 +882,12 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
return container_of(dom, struct arm_smmu_domain, domain);
}
+static inline struct arm_smmu_nested_domain *
+to_smmu_nested_domain(struct iommu_domain *dom)
+{
+ return container_of(dom, struct arm_smmu_nested_domain, domain);
+}
+
extern struct xarray arm_smmu_asid_xa;
extern struct mutex arm_smmu_asid_lock;
@@ -909,6 +934,7 @@ struct arm_smmu_attach_state {
struct iommu_domain *old_domain;
struct arm_smmu_master *master;
bool cd_needs_ats;
+ bool disable_ats;
ioasid_t ssid;
/* Resulting state */
bool ats_enabled;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 27c5117db985b2..47ee35ce050b63 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -396,6 +396,26 @@ struct iommu_hwpt_vtd_s1 {
__u32 __reserved;
};
+/**
+ * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 nested STE
+ * (IOMMU_HWPT_DATA_ARM_SMMUV3)
+ *
+ * @ste: The first two double words of the user space Stream Table Entry for
+ * the translation. Must be little-endian.
+ * Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec)
+ * - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax
+ * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
+ *
+ * -EIO will be returned if @ste is not legal or contains any non-allowed field.
+ * Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass
+ * nested domain will translate the same as the nesting parent. The S1 will
+ * install a Context Descriptor Table pointing at userspace memory translated
+ * by the nesting parent.
+ */
+struct iommu_hwpt_arm_smmuv3 {
+ __aligned_le64 ste[2];
+};
+
/**
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 10/12] iommu/arm-smmu-v3: Use S2FWB for NESTED domains
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (8 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 11/12] iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED Jason Gunthorpe
` (4 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field
works. When S2FWB is supported and enabled the IOPTE will force cachable
access to IOMMU_CACHE memory when nesting with a S1 and deny cachable
access when !IOMMU_CACHE.
When using a single stage of translation, a simple S2 domain, it doesn't
change things for PCI devices as it is just a different encoding for the
existing mapping of the IOMMU protection flags to cachability attributes.
For non-PCI it also changes the combining rules when incoming transactions
have inconsistent attributes.
However, when used with a nested S1, FWB has the effect of preventing the
guest from choosing a MemAttr in it's S1 that would cause ordinary DMA to
bypass the cache. Consistent with KVM we wish to deny the guest the
ability to become incoherent with cached memory the hypervisor believes is
cachable so we don't have to flush it.
Allow NESTED domains to be created if the SMMU has S2FWB support and use
S2FWB for NESTING_PARENTS. This is an additional option to CANWBS.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 7 +++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +++
drivers/iommu/io-pgtable-arm.c | 27 ++++++++++++++-----
include/linux/io-pgtable.h | 2 ++
5 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 0b9fffc5b2f09b..b835ecce7f611d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -214,9 +214,12 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
* Must support some way to prevent the VM from bypassing the cache
* because VFIO currently does not do any cache maintenance. canwbs
* indicates the device is fully coherent and no cache maintenance is
- * ever required, even for PCI No-Snoop.
+ * ever required, even for PCI No-Snoop. S2FWB means the S1 can't make
+ * things non-coherent using the memattr, but No-Snoop behavior is not
+ * effected.
*/
- if (!arm_smmu_master_canwbs(master))
+ if (!arm_smmu_master_canwbs(master) &&
+ !(smmu->features & ARM_SMMU_FEAT_S2FWB))
return ERR_PTR(-EOPNOTSUPP);
vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 53f12b9d78ab21..de598d66b5c272 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1046,7 +1046,8 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
/* S2 translates */
if (cfg & BIT(1)) {
used_bits[1] |=
- cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG);
+ cpu_to_le64(STRTAB_STE_1_S2FWB | STRTAB_STE_1_EATS |
+ STRTAB_STE_1_SHCFG);
used_bits[2] |=
cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
@@ -1654,6 +1655,8 @@ void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
FIELD_PREP(STRTAB_STE_1_EATS,
ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+ if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_S2FWB)
+ target->data[1] |= cpu_to_le64(STRTAB_STE_1_S2FWB);
if (smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR)
target->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
STRTAB_STE_1_SHCFG_INCOMING));
@@ -2472,6 +2475,9 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
pgtbl_cfg.oas = smmu->oas;
fmt = ARM_64_LPAE_S2;
finalise_stage_fn = arm_smmu_domain_finalise_s2;
+ if ((smmu->features & ARM_SMMU_FEAT_S2FWB) &&
+ (flags & IOMMU_HWPT_ALLOC_NEST_PARENT))
+ pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_S2FWB;
break;
default:
return -EINVAL;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3fabe187ea7815..5a025d310dbeb5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -58,6 +58,7 @@ struct arm_smmu_device;
#define IDR1_SIDSIZE GENMASK(5, 0)
#define ARM_SMMU_IDR3 0xc
+#define IDR3_FWB (1 << 8)
#define IDR3_RIL (1 << 10)
#define ARM_SMMU_IDR5 0x14
@@ -265,6 +266,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
#define STRTAB_STE_1_S1COR GENMASK_ULL(5, 4)
#define STRTAB_STE_1_S1CSH GENMASK_ULL(7, 6)
+#define STRTAB_STE_1_S2FWB (1UL << 25)
#define STRTAB_STE_1_S1STALLD (1UL << 27)
#define STRTAB_STE_1_EATS GENMASK_ULL(29, 28)
@@ -740,6 +742,7 @@ struct arm_smmu_device {
#define ARM_SMMU_FEAT_ATTR_TYPES_OVR (1 << 20)
#define ARM_SMMU_FEAT_HA (1 << 21)
#define ARM_SMMU_FEAT_HD (1 << 22)
+#define ARM_SMMU_FEAT_S2FWB (1 << 23)
u32 features;
#define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 0e67f1721a3d98..74f58c6ac30cbd 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -106,6 +106,18 @@
#define ARM_LPAE_PTE_HAP_FAULT (((arm_lpae_iopte)0) << 6)
#define ARM_LPAE_PTE_HAP_READ (((arm_lpae_iopte)1) << 6)
#define ARM_LPAE_PTE_HAP_WRITE (((arm_lpae_iopte)2) << 6)
+/*
+ * For !FWB these code to:
+ * 1111 = Normal outer write back cachable / Inner Write Back Cachable
+ * Permit S1 to override
+ * 0101 = Normal Non-cachable / Inner Non-cachable
+ * 0001 = Device / Device-nGnRE
+ * For S2FWB these code:
+ * 0110 Force Normal Write Back
+ * 0101 Normal* is forced Normal-NC, Device unchanged
+ * 0001 Force Device-nGnRE
+ */
+#define ARM_LPAE_PTE_MEMATTR_FWB_WB (((arm_lpae_iopte)0x6) << 2)
#define ARM_LPAE_PTE_MEMATTR_OIWB (((arm_lpae_iopte)0xf) << 2)
#define ARM_LPAE_PTE_MEMATTR_NC (((arm_lpae_iopte)0x5) << 2)
#define ARM_LPAE_PTE_MEMATTR_DEV (((arm_lpae_iopte)0x1) << 2)
@@ -458,12 +470,16 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
*/
if (data->iop.fmt == ARM_64_LPAE_S2 ||
data->iop.fmt == ARM_32_LPAE_S2) {
- if (prot & IOMMU_MMIO)
+ if (prot & IOMMU_MMIO) {
pte |= ARM_LPAE_PTE_MEMATTR_DEV;
- else if (prot & IOMMU_CACHE)
- pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
- else
+ } else if (prot & IOMMU_CACHE) {
+ if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_S2FWB)
+ pte |= ARM_LPAE_PTE_MEMATTR_FWB_WB;
+ else
+ pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
+ } else {
pte |= ARM_LPAE_PTE_MEMATTR_NC;
+ }
} else {
if (prot & IOMMU_MMIO)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
@@ -1035,8 +1051,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
struct arm_lpae_io_pgtable *data;
typeof(&cfg->arm_lpae_s2_cfg.vtcr) vtcr = &cfg->arm_lpae_s2_cfg.vtcr;
- /* The NS quirk doesn't apply at stage 2 */
- if (cfg->quirks)
+ if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_S2FWB))
return NULL;
data = arm_lpae_alloc_pgtable(cfg);
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index b1ecfc3cd5bcc0..ce86b09ae80f18 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -87,6 +87,7 @@ struct io_pgtable_cfg {
* attributes set in the TCR for a non-coherent page-table walker.
*
* IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking in stage 1 pagetable.
+ * IO_PGTABLE_QUIRK_ARM_S2FWB: Use the FWB format for the MemAttrs bits
*/
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS BIT(1)
@@ -95,6 +96,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
#define IO_PGTABLE_QUIRK_ARM_HD BIT(7)
+ #define IO_PGTABLE_QUIRK_ARM_S2FWB BIT(8)
unsigned long quirks;
unsigned long pgsize_bitmap;
unsigned int ias;
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 11/12] iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (9 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 10/12] iommu/arm-smmu-v3: Use S2FWB for NESTED domains Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 12/12] iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object Jason Gunthorpe
` (3 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
The EATS flag needs to flow through the vSTE and into the pSTE, and ensure
physical ATS is enabled on the PCI device.
The physical ATS state must match the VM's idea of EATS as we rely on the
VM to issue the ATS invalidation commands. Thus ATS must remain off at the
device until EATS on a nesting domain turns it on. Attaching a nesting
domain is the point where the invalidation responsibility transfers to
userspace.
Update the ATS logic to track EATS for nesting domains and flush the
ATC whenever the S2 nesting parent changes.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 31 ++++++++++++++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 +++++++++++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++-
include/uapi/linux/iommufd.h | 2 +-
4 files changed, 53 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index b835ecce7f611d..ab515706d48463 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -95,8 +95,6 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
.master = master,
.old_domain = iommu_get_domain_for_dev(dev),
.ssid = IOMMU_NO_PASID,
- /* Currently invalidation of ATC is not supported */
- .disable_ats = true,
};
struct arm_smmu_ste ste;
int ret;
@@ -107,6 +105,15 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
return -EBUSY;
mutex_lock(&arm_smmu_asid_lock);
+ /*
+ * The VM has to control the actual ATS state at the PCI device because
+ * we forward the invalidations directly from the VM. If the VM doesn't
+ * think ATS is on it will not generate ATC flushes and the ATC will
+ * become incoherent. Since we can't access the actual virtual PCI ATS
+ * config bit here base this off the EATS value in the STE. If the EATS
+ * is set then the VM must generate ATC flushes.
+ */
+ state.disable_ats = !nested_domain->enable_ats;
ret = arm_smmu_attach_prepare(&state, domain);
if (ret) {
mutex_unlock(&arm_smmu_asid_lock);
@@ -131,8 +138,10 @@ static const struct iommu_domain_ops arm_smmu_nested_ops = {
.free = arm_smmu_domain_nested_free,
};
-static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
+static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg,
+ bool *enable_ats)
{
+ unsigned int eats;
unsigned int cfg;
if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
@@ -149,6 +158,18 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS &&
cfg != STRTAB_STE_0_CFG_S1_TRANS)
return -EIO;
+
+ /*
+ * Only Full ATS or ATS UR is supported
+ * The EATS field will be set by arm_smmu_make_nested_domain_ste()
+ */
+ eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg->ste[1]));
+ arg->ste[1] &= ~cpu_to_le64(STRTAB_STE_1_EATS);
+ if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS)
+ return -EIO;
+
+ if (cfg == STRTAB_STE_0_CFG_S1_TRANS)
+ *enable_ats = (eats == STRTAB_STE_1_EATS_TRANS);
return 0;
}
@@ -159,6 +180,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
struct arm_smmu_nested_domain *nested_domain;
struct iommu_hwpt_arm_smmuv3 arg;
+ bool enable_ats = false;
int ret;
if (flags)
@@ -169,7 +191,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
if (ret)
return ERR_PTR(ret);
- ret = arm_smmu_validate_vste(&arg);
+ ret = arm_smmu_validate_vste(&arg, &enable_ats);
if (ret)
return ERR_PTR(ret);
@@ -179,6 +201,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
nested_domain->domain.type = IOMMU_DOMAIN_NESTED;
nested_domain->domain.ops = &arm_smmu_nested_ops;
+ nested_domain->enable_ats = enable_ats;
nested_domain->vsmmu = vsmmu;
nested_domain->ste[0] = arg.ste[0];
nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index de598d66b5c272..b47f80224781ba 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2107,7 +2107,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
if (!master->ats_enabled)
continue;
- arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd);
+ if (master_domain->nested_ats_flush) {
+ /*
+ * If a S2 used as a nesting parent is changed we have
+ * no option but to completely flush the ATC.
+ */
+ arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd);
+ } else {
+ arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size,
+ &cmd);
+ }
for (i = 0; i < master->num_streams; i++) {
cmd.atc.sid = master->streams[i].id;
@@ -2631,7 +2640,7 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
static struct arm_smmu_master_domain *
arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
struct arm_smmu_master *master,
- ioasid_t ssid)
+ ioasid_t ssid, bool nested_ats_flush)
{
struct arm_smmu_master_domain *master_domain;
@@ -2640,7 +2649,8 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
list_for_each_entry(master_domain, &smmu_domain->devices,
devices_elm) {
if (master_domain->master == master &&
- master_domain->ssid == ssid)
+ master_domain->ssid == ssid &&
+ master_domain->nested_ats_flush == nested_ats_flush)
return master_domain;
}
return NULL;
@@ -2671,13 +2681,18 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master,
{
struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain);
struct arm_smmu_master_domain *master_domain;
+ bool nested_ats_flush = false;
unsigned long flags;
if (!smmu_domain)
return;
+ if (domain->type == IOMMU_DOMAIN_NESTED)
+ nested_ats_flush = to_smmu_nested_domain(domain)->enable_ats;
+
spin_lock_irqsave(&smmu_domain->devices_lock, flags);
- master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid);
+ master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid,
+ nested_ats_flush);
if (master_domain) {
list_del(&master_domain->devices_elm);
kfree(master_domain);
@@ -2744,6 +2759,9 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
return -ENOMEM;
master_domain->master = master;
master_domain->ssid = state->ssid;
+ if (new_domain->type == IOMMU_DOMAIN_NESTED)
+ master_domain->nested_ats_flush =
+ to_smmu_nested_domain(new_domain)->enable_ats;
/*
* During prepare we want the current smmu_domain and new
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 5a025d310dbeb5..01c1d16dc0c81a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -305,7 +305,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
#define STRTAB_STE_1_NESTING_ALLOWED \
cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \
STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \
- STRTAB_STE_1_S1STALLD)
+ STRTAB_STE_1_S1STALLD | STRTAB_STE_1_EATS)
/*
* Context descriptors.
@@ -837,6 +837,7 @@ struct arm_smmu_domain {
struct arm_smmu_nested_domain {
struct iommu_domain domain;
struct arm_vsmmu *vsmmu;
+ bool enable_ats : 1;
__le64 ste[2];
};
@@ -878,6 +879,7 @@ struct arm_smmu_master_domain {
struct list_head devices_elm;
struct arm_smmu_master *master;
ioasid_t ssid;
+ bool nested_ats_flush : 1;
};
static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 47ee35ce050b63..125b51b78ad8f9 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -404,7 +404,7 @@ struct iommu_hwpt_vtd_s1 {
* the translation. Must be little-endian.
* Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec)
* - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax
- * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
+ * - word-1: EATS, S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
*
* -EIO will be returned if @ste is not legal or contains any non-allowed field.
* Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 12/12] iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (10 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 11/12] iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED Jason Gunthorpe
@ 2024-10-31 0:20 ` Jason Gunthorpe
2024-10-31 0:53 ` [PATCH v4 00/12] Initial support for SMMUv3 nested translation Nicolin Chen
` (2 subsequent siblings)
14 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 0:20 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
From: Nicolin Chen <nicolinc@nvidia.com>
Implement the vIOMMU's cache_invalidate op for user space to invalidate
the IOTLB entries, Device ATS and CD entries that are cached by hardware.
Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation
entries that are simply in the native format of a 128-bit TLBI
command. Scan those commands against the permitted command list and fix
their VMID/SID fields to match what is stored in the vIOMMU.
Co-developed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 134 ++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +
include/uapi/linux/iommufd.h | 24 ++++
4 files changed, 166 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index ab515706d48463..2cfa1557817bc1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -209,8 +209,134 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
return &nested_domain->domain;
}
+static int arm_vsmmu_vsid_to_sid(struct arm_vsmmu *vsmmu, u32 vsid, u32 *sid)
+{
+ struct arm_smmu_master *master;
+ struct device *dev;
+ int ret = 0;
+
+ xa_lock(&vsmmu->core.vdevs);
+ dev = iommufd_viommu_find_dev(&vsmmu->core, (unsigned long)vsid);
+ if (!dev) {
+ ret = -EIO;
+ goto unlock;
+ }
+ master = dev_iommu_priv_get(dev);
+
+ /* At this moment, iommufd only supports PCI device that has one SID */
+ if (sid)
+ *sid = master->streams[0].id;
+unlock:
+ xa_unlock(&vsmmu->core.vdevs);
+ return ret;
+}
+
+/* This is basically iommu_viommu_arm_smmuv3_invalidate in u64 for conversion */
+struct arm_vsmmu_invalidation_cmd {
+ union {
+ u64 cmd[2];
+ struct iommu_viommu_arm_smmuv3_invalidate ucmd;
+ };
+};
+
+/*
+ * Convert, in place, the raw invalidation command into an internal format that
+ * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are
+ * stored in CPU endian.
+ *
+ * Enforce the VMID or SID on the command.
+ */
+static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
+ struct arm_vsmmu_invalidation_cmd *cmd)
+{
+ /* Commands are le64 stored in u64 */
+ cmd->cmd[0] = le64_to_cpu(cmd->ucmd.cmd[0]);
+ cmd->cmd[1] = le64_to_cpu(cmd->ucmd.cmd[1]);
+
+ switch (cmd->cmd[0] & CMDQ_0_OP) {
+ case CMDQ_OP_TLBI_NSNH_ALL:
+ /* Convert to NH_ALL */
+ cmd->cmd[0] = CMDQ_OP_TLBI_NH_ALL |
+ FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+ cmd->cmd[1] = 0;
+ break;
+ case CMDQ_OP_TLBI_NH_VA:
+ case CMDQ_OP_TLBI_NH_VAA:
+ case CMDQ_OP_TLBI_NH_ALL:
+ case CMDQ_OP_TLBI_NH_ASID:
+ cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
+ cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+ break;
+ case CMDQ_OP_ATC_INV:
+ case CMDQ_OP_CFGI_CD:
+ case CMDQ_OP_CFGI_CD_ALL: {
+ u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]);
+
+ if (arm_vsmmu_vsid_to_sid(vsmmu, vsid, &sid))
+ return -EIO;
+ cmd->cmd[0] &= ~CMDQ_CFGI_0_SID;
+ cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid);
+ break;
+ }
+ default:
+ return -EIO;
+ }
+ return 0;
+}
+
+static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+ struct iommu_user_data_array *array)
+{
+ struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+ struct arm_smmu_device *smmu = vsmmu->smmu;
+ struct arm_vsmmu_invalidation_cmd *last;
+ struct arm_vsmmu_invalidation_cmd *cmds;
+ struct arm_vsmmu_invalidation_cmd *cur;
+ struct arm_vsmmu_invalidation_cmd *end;
+ int ret;
+
+ cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+ if (!cmds)
+ return -ENOMEM;
+ cur = cmds;
+ end = cmds + array->entry_num;
+
+ static_assert(sizeof(*cmds) == 2 * sizeof(u64));
+ ret = iommu_copy_struct_from_full_user_array(
+ cmds, sizeof(*cmds), array,
+ IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3);
+ if (ret)
+ goto out;
+
+ last = cmds;
+ while (cur != end) {
+ ret = arm_vsmmu_convert_user_cmd(vsmmu, cur);
+ if (ret)
+ goto out;
+
+ /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
+ cur++;
+ if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+ continue;
+
+ /* FIXME always uses the main cmdq rather than trying to group by type */
+ ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, last->cmd,
+ cur - last, true);
+ if (ret) {
+ cur--;
+ goto out;
+ }
+ last = cur;
+ }
+out:
+ array->entry_num = cur - cmds;
+ kfree(cmds);
+ return ret;
+}
+
static const struct iommufd_viommu_ops arm_vsmmu_ops = {
.alloc_domain_nested = arm_vsmmu_alloc_domain_nested,
+ .cache_invalidate = arm_vsmmu_cache_invalidate,
};
struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
@@ -233,6 +359,14 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
if (s2_parent->smmu != master->smmu)
return ERR_PTR(-EINVAL);
+ /*
+ * FORCE_SYNC is not set with FEAT_NESTING. Some study of the exact HW
+ * defect is needed to determine if arm_vsmmu_cache_invalidate() needs
+ * any change to remove this.
+ */
+ if (WARN_ON(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+ return ERR_PTR(-EOPNOTSUPP);
+
/*
* Must support some way to prevent the VM from bypassing the cache
* because VFIO currently does not do any cache maintenance. canwbs
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b47f80224781ba..2a9f2d1d3ed910 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -766,9 +766,9 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds,
* insert their own list of commands then all of the commands from one
* CPU will appear before any of the commands from the other CPU.
*/
-static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
- struct arm_smmu_cmdq *cmdq,
- u64 *cmds, int n, bool sync)
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
+ bool sync)
{
u64 cmd_sync[CMDQ_ENT_DWORDS];
u32 prod;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 01c1d16dc0c81a..af25f092303f10 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -529,6 +529,7 @@ struct arm_smmu_cmdq_ent {
#define CMDQ_OP_TLBI_NH_ALL 0x10
#define CMDQ_OP_TLBI_NH_ASID 0x11
#define CMDQ_OP_TLBI_NH_VA 0x12
+ #define CMDQ_OP_TLBI_NH_VAA 0x13
#define CMDQ_OP_TLBI_EL2_ALL 0x20
#define CMDQ_OP_TLBI_EL2_ASID 0x21
#define CMDQ_OP_TLBI_EL2_VA 0x22
@@ -951,6 +952,10 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state);
void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
const struct arm_smmu_ste *target);
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
+ bool sync);
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 125b51b78ad8f9..2a492e054fb7c9 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -688,9 +688,11 @@ struct iommu_hwpt_get_dirty_bitmap {
* enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation
* Data Type
* @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
*/
enum iommu_hwpt_invalidate_data_type {
IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 = 0,
+ IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 = 1,
};
/**
@@ -729,6 +731,28 @@ struct iommu_hwpt_vtd_s1_invalidate {
__u32 __reserved;
};
+/**
+ * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cahce invalidation
+ * (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3)
+ * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ.
+ * Must be little-endian.
+ *
+ * Supported command list only when passing in a vIOMMU via @hwpt_id:
+ * CMDQ_OP_TLBI_NSNH_ALL
+ * CMDQ_OP_TLBI_NH_VA
+ * CMDQ_OP_TLBI_NH_VAA
+ * CMDQ_OP_TLBI_NH_ALL
+ * CMDQ_OP_TLBI_NH_ASID
+ * CMDQ_OP_ATC_INV
+ * CMDQ_OP_CFGI_CD
+ * CMDQ_OP_CFGI_CD_ALL
+ *
+ * -EIO will be returned if the command is not supported.
+ */
+struct iommu_viommu_arm_smmuv3_invalidate {
+ __aligned_le64 cmd[2];
+};
+
/**
* struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
* @size: sizeof(struct iommu_hwpt_invalidate)
--
2.43.0
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (11 preceding siblings ...)
2024-10-31 0:20 ` [PATCH v4 12/12] iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object Jason Gunthorpe
@ 2024-10-31 0:53 ` Nicolin Chen
2024-11-01 12:18 ` Will Deacon
2024-11-12 18:29 ` Jason Gunthorpe
14 siblings, 0 replies; 58+ messages in thread
From: Nicolin Chen @ 2024-10-31 0:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> The vIOMMU series is essential to allow the invalidations to be processed
> for the CD as well.
>
> It is enough to allow qemu work to progress.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/smmuv3_nesting
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
With a branch adding RMR changes on top of this smmuv3_nesting:
https://github.com/nicolinc/iommufd/commits/smmuv3_nesting-with-rmr
and with an *updated* paring QEMU branch:
https://github.com/nicolinc/qemu/commits/wip/for_smmuv3_nesting-v4
For folks who are interested in testing, please use this QEMU
branch, as there are uAPI changes against previous verions.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
2024-10-31 0:20 ` [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED Jason Gunthorpe
@ 2024-10-31 6:21 ` Zhangfei Gao
2024-11-04 17:19 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Zhangfei Gao @ 2024-10-31 6:21 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Thu, 31 Oct 2024 at 08:21, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> For SMMUv3 a IOMMU_DOMAIN_NESTED is composed of a S2 iommu_domain acting
> as the parent and a user provided STE fragment that defines the CD table
> and related data with addresses translated by the S2 iommu_domain.
>
> The kernel only permits userspace to control certain allowed bits of the
> STE that are safe for user/guest control.
>
> IOTLB maintenance is a bit subtle here, the S1 implicitly includes the S2
> translation, but there is no way of knowing which S1 entries refer to a
> range of S2.
>
> For the IOTLB we follow ARM's guidance and issue a CMDQ_OP_TLBI_NH_ALL to
> flush all ASIDs from the VMID after flushing the S2 on any change to the
> S2.
>
> The IOMMU_DOMAIN_NESTED can only be created from inside a VIOMMU as the
> invalidation path relies on the VIOMMU to translate virtual stream ID used
> in the invalidation commands for the CD table and ATS.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
> Reviewed-by: Donald Dutile <ddutile@redhat.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 157 ++++++++++++++++++
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 17 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++
> include/uapi/linux/iommufd.h | 20 +++
> 4 files changed, 219 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> index 60dd9e90759571..0b9fffc5b2f09b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> @@ -30,7 +30,164 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type)
> return info;
> }
>
> +static void arm_smmu_make_nested_cd_table_ste(
> + struct arm_smmu_ste *target, struct arm_smmu_master *master,
> + struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> +{
> + arm_smmu_make_s2_domain_ste(
> + target, master, nested_domain->vsmmu->s2_parent, ats_enabled);
> +
> + target->data[0] = cpu_to_le64(STRTAB_STE_0_V |
> + FIELD_PREP(STRTAB_STE_0_CFG,
> + STRTAB_STE_0_CFG_NESTED));
> + target->data[0] |= nested_domain->ste[0] &
> + ~cpu_to_le64(STRTAB_STE_0_CFG);
> + target->data[1] |= nested_domain->ste[1];
> +}
> +
> +/*
> + * Create a physical STE from the virtual STE that userspace provided when it
> + * created the nested domain. Using the vSTE userspace can request:
> + * - Non-valid STE
> + * - Abort STE
> + * - Bypass STE (install the S2, no CD table)
> + * - CD table STE (install the S2 and the userspace CD table)
> + */
> +static void arm_smmu_make_nested_domain_ste(
> + struct arm_smmu_ste *target, struct arm_smmu_master *master,
> + struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> +{
> + unsigned int cfg =
> + FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0]));
> +
> + /*
> + * Userspace can request a non-valid STE through the nesting interface.
> + * We relay that into an abort physical STE with the intention that
> + * C_BAD_STE for this SID can be generated to userspace.
> + */
> + if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> + cfg = STRTAB_STE_0_CFG_ABORT;
> +
> + switch (cfg) {
> + case STRTAB_STE_0_CFG_S1_TRANS:
> + arm_smmu_make_nested_cd_table_ste(target, master, nested_domain,
> + ats_enabled);
> + break;
> + case STRTAB_STE_0_CFG_BYPASS:
> + arm_smmu_make_s2_domain_ste(target, master,
> + nested_domain->vsmmu->s2_parent,
> + ats_enabled);
> + break;
> + case STRTAB_STE_0_CFG_ABORT:
> + default:
> + arm_smmu_make_abort_ste(target);
> + break;
> + }
> +}
> +
> +static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
> + struct device *dev)
> +{
> + struct arm_smmu_nested_domain *nested_domain =
> + to_smmu_nested_domain(domain);
> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> + struct arm_smmu_attach_state state = {
> + .master = master,
> + .old_domain = iommu_get_domain_for_dev(dev),
> + .ssid = IOMMU_NO_PASID,
> + /* Currently invalidation of ATC is not supported */
> + .disable_ats = true,
> + };
> + struct arm_smmu_ste ste;
> + int ret;
> +
> + if (nested_domain->vsmmu->smmu != master->smmu)
> + return -EINVAL;
> + if (arm_smmu_ssids_in_use(&master->cd_table))
> + return -EBUSY;
> +
> + mutex_lock(&arm_smmu_asid_lock);
> + ret = arm_smmu_attach_prepare(&state, domain);
> + if (ret) {
> + mutex_unlock(&arm_smmu_asid_lock);
> + return ret;
> + }
> +
> + arm_smmu_make_nested_domain_ste(&ste, master, nested_domain,
> + state.ats_enabled);
> + arm_smmu_install_ste_for_dev(master, &ste);
> + arm_smmu_attach_commit(&state);
> + mutex_unlock(&arm_smmu_asid_lock);
> + return 0;
> +}
> +
> +static void arm_smmu_domain_nested_free(struct iommu_domain *domain)
> +{
> + kfree(to_smmu_nested_domain(domain));
> +}
> +
> +static const struct iommu_domain_ops arm_smmu_nested_ops = {
> + .attach_dev = arm_smmu_attach_dev_nested,
> + .free = arm_smmu_domain_nested_free,
> +};
> +
> +static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
> +{
> + unsigned int cfg;
> +
> + if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
> + memset(arg->ste, 0, sizeof(arg->ste));
> + return 0;
> + }
> +
> + /* EIO is reserved for invalid STE data. */
> + if ((arg->ste[0] & ~STRTAB_STE_0_NESTING_ALLOWED) ||
> + (arg->ste[1] & ~STRTAB_STE_1_NESTING_ALLOWED))
> + return -EIO;
> +
> + cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(arg->ste[0]));
> + if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS &&
> + cfg != STRTAB_STE_0_CFG_S1_TRANS)
> + return -EIO;
> + return 0;
> +}
> +
> +static struct iommu_domain *
> +arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> + const struct iommu_user_data *user_data)
> +{
> + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
> + struct arm_smmu_nested_domain *nested_domain;
> + struct iommu_hwpt_arm_smmuv3 arg;
> + int ret;
> +
> + if (flags)
> + return ERR_PTR(-EOPNOTSUPP);
This check fails when using user page fault, with flags =
IOMMU_HWPT_FAULT_ID_VALID (4)
Strange, the check is not exist in last version?
iommufd_viommu_alloc_hwpt_nested ->
viommu->ops->alloc_domain_nested(viommu, flags, user_data) ->
arm_vsmmu_alloc_domain_nested
Thanks
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (12 preceding siblings ...)
2024-10-31 0:53 ` [PATCH v4 00/12] Initial support for SMMUv3 nested translation Nicolin Chen
@ 2024-11-01 12:18 ` Will Deacon
2024-11-01 13:25 ` Jason Gunthorpe
2024-11-12 18:29 ` Jason Gunthorpe
14 siblings, 1 reply; 58+ messages in thread
From: Will Deacon @ 2024-11-01 12:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> [This is now based on Nicolin's iommufd patches for vIOMMU and will need
> to go through the iommufd tree, please ack]
Can't we separate out the SMMUv3 driver changes? They shouldn't depend on
Nicolin's work afaict.
Will
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-01 12:18 ` Will Deacon
@ 2024-11-01 13:25 ` Jason Gunthorpe
2024-11-05 16:48 ` Will Deacon
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-01 13:25 UTC (permalink / raw)
To: Will Deacon
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote:
> On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> > [This is now based on Nicolin's iommufd patches for vIOMMU and will need
> > to go through the iommufd tree, please ack]
>
> Can't we separate out the SMMUv3 driver changes? They shouldn't depend on
> Nicolin's work afaict.
We can put everything before "iommu/arm-smmu-v3: Support
IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree.
That patch and following all depend on Nicolin's work, as they rely on
the VIOMMU due to how different ARM is from Intel.
How about you take these patches:
[PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
[PATCH v4 02/12] ACPICA: IORT: Update for revision E.f
[PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag
[PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
[PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
[PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
[PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface
Onto a branch.
I'll take these patches after merging your branch and Nicolin's:
[PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
[PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
[PATCH v4 10/12] iommu/arm-smmu-v3: Use S2FWB for NESTED domains
[PATCH v4 11/12] iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
[PATCH v4 12/12] iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object
?
I can also probably push most of S2FWB and ATS into the first batch.
Please let me know, I would like this to be done this cycle, Nicolin's
vIOMMU series are all reviewed now.
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-10-31 0:20 ` [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info Jason Gunthorpe
@ 2024-11-04 11:47 ` Will Deacon
2024-11-04 12:41 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Will Deacon @ 2024-11-04 11:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, Oct 30, 2024 at 09:20:49PM -0300, Jason Gunthorpe wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU
> instance need to be available to the VMM so it can construct an
> appropriate vSMMUv3 that reflects the correct HW capabilities.
>
> For userspace page tables these values are required to constrain the valid
> values within the CD table and the IOPTEs.
>
> The kernel does not sanitize these values. If building a VMM then
> userspace is required to only forward bits into a VM that it knows it can
> implement. Some bits will also require a VMM to detect if appropriate
> kernel support is available such as for ATS and BTM.
>
> Start a new file and kconfig for the advanced iommufd support. This lets
> it be compiled out for kernels that are not intended to support
> virtualization, and allows distros to leave it disabled until they are
> shipping a matching qemu too.
>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
> Reviewed-by: Donald Dutile <ddutile@redhat.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
--->8
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index e266dfa6a38d9d..b227ac16333fe1 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -488,15 +488,50 @@ struct iommu_hw_info_vtd {
> __aligned_u64 ecap_reg;
> };
>
> +/**
> + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
> + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
> + *
> + * @flags: Must be set to 0
> + * @__reserved: Must be 0
> + * @idr: Implemented features for ARM SMMU Non-secure programming interface
> + * @iidr: Information about the implementation and implementer of ARM SMMU,
> + * and architecture version supported
> + * @aidr: ARM SMMU architecture version
> + *
> + * For the details of @idr, @iidr and @aidr, please refer to the chapters
> + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
> + *
> + * User space should read the underlying ARM SMMUv3 hardware information for
> + * the list of supported features.
> + *
> + * Note that these values reflect the raw HW capability, without any insight if
> + * any required kernel driver support is present. Bits may be set indicating the
> + * HW has functionality that is lacking kernel software support, such as BTM. If
> + * a VMM is using this information to construct emulated copies of these
> + * registers it should only forward bits that it knows it can support.
> + *
> + * In future, presence of required kernel support will be indicated in flags.
What about the case where we _know_ that some functionality is broken in
the hardware? For example, we nobble BTM support on MMU 700 thanks to
erratum #2812531 yet we'll still cheerfully advertise it in IDR0 here.
Similarly, HTTU can be overridden by IORT, so should we update the view
that we advertise for that as well?
Will
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-04 11:47 ` Will Deacon
@ 2024-11-04 12:41 ` Jason Gunthorpe
2024-11-06 16:37 ` Robin Murphy
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-04 12:41 UTC (permalink / raw)
To: Will Deacon
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote:
> > +/**
> > + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
> > + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
> > + *
> > + * @flags: Must be set to 0
> > + * @__reserved: Must be 0
> > + * @idr: Implemented features for ARM SMMU Non-secure programming interface
> > + * @iidr: Information about the implementation and implementer of ARM SMMU,
> > + * and architecture version supported
> > + * @aidr: ARM SMMU architecture version
> > + *
> > + * For the details of @idr, @iidr and @aidr, please refer to the chapters
> > + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
> > + *
> > + * User space should read the underlying ARM SMMUv3 hardware information for
> > + * the list of supported features.
> > + *
> > + * Note that these values reflect the raw HW capability, without any insight if
> > + * any required kernel driver support is present. Bits may be set indicating the
> > + * HW has functionality that is lacking kernel software support, such as BTM. If
> > + * a VMM is using this information to construct emulated copies of these
> > + * registers it should only forward bits that it knows it can support.
> > + *
> > + * In future, presence of required kernel support will be indicated in flags.
>
> What about the case where we _know_ that some functionality is broken in
> the hardware? For example, we nobble BTM support on MMU 700 thanks to
> erratum #2812531 yet we'll still cheerfully advertise it in IDR0 here.
> Similarly, HTTU can be overridden by IORT, so should we update the view
> that we advertise for that as well?
My knee jerk answer is no, these struct fields should just report the
raw HW register. A VMM should not copy these fields directly into a
VM. The principle purpose is to give the VMM the same details about the
HW as the kernel so it can apply erratas/etc.
For instance, if we hide these fields how will the VMM/VM know to
apply the various flushing errata? With vCMDQ/etc the VM is directly
pushing flushes to HW, it must know the errata.
For BTM/HTTU/etc - those all require kernel SW support and per-device
permission in the kernel to turn on. For instance requesting a nested
vSTE that needs BTM will fail today during attach. Turning on HTTU on
the S2 already has an API that will fail if the IORT blocks it.
Incrementally dealing with expanding the support is part of the
"required kernel support will be indicated in flags."
Basically, exposing the information as-is doesn't do any harm.
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
2024-10-31 6:21 ` Zhangfei Gao
@ 2024-11-04 17:19 ` Jason Gunthorpe
2024-11-06 5:39 ` Zhangfei Gao
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-04 17:19 UTC (permalink / raw)
To: Zhangfei Gao
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Thu, Oct 31, 2024 at 02:21:11PM +0800, Zhangfei Gao wrote:
> > +static struct iommu_domain *
> > +arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> > + const struct iommu_user_data *user_data)
> > +{
> > + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
> > + struct arm_smmu_nested_domain *nested_domain;
> > + struct iommu_hwpt_arm_smmuv3 arg;
> > + int ret;
> > +
> > + if (flags)
> > + return ERR_PTR(-EOPNOTSUPP);
>
> This check fails when using user page fault, with flags =
> IOMMU_HWPT_FAULT_ID_VALID (4)
> Strange, the check is not exist in last version?
>
> iommufd_viommu_alloc_hwpt_nested ->
> viommu->ops->alloc_domain_nested(viommu, flags, user_data) ->
> arm_vsmmu_alloc_domain_nested
It should permit IOMMU_HWPT_FAULT_ID_VALID, I'll add this hunk:
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -178,12 +178,18 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
{
struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+ const u32 SUPPORTED_FLAGS = IOMMU_HWPT_FAULT_ID_VALID;
struct arm_smmu_nested_domain *nested_domain;
struct iommu_hwpt_arm_smmuv3 arg;
bool enable_ats = false;
int ret;
- if (flags)
+ /*
+ * Faults delivered to the nested domain are faults that originated by
+ * the S1 in the domain. The core code will match all PASIDs when
+ * delivering the fault due to user_pasid_table
+ */
+ if (flags & ~SUPPORTED_FLAGS)
return ERR_PTR(-EOPNOTSUPP);
ret = iommu_copy_struct_from_user(&arg, user_data,
Thanks,
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-01 13:25 ` Jason Gunthorpe
@ 2024-11-05 16:48 ` Will Deacon
2024-11-05 17:03 ` Jason Gunthorpe
2024-11-08 14:56 ` Will Deacon
0 siblings, 2 replies; 58+ messages in thread
From: Will Deacon @ 2024-11-05 16:48 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
Hi Jason,
On Fri, Nov 01, 2024 at 10:25:03AM -0300, Jason Gunthorpe wrote:
> On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote:
> > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> > > [This is now based on Nicolin's iommufd patches for vIOMMU and will need
> > > to go through the iommufd tree, please ack]
> >
> > Can't we separate out the SMMUv3 driver changes? They shouldn't depend on
> > Nicolin's work afaict.
>
> We can put everything before "iommu/arm-smmu-v3: Support
> IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree.
>
> That patch and following all depend on Nicolin's work, as they rely on
> the VIOMMU due to how different ARM is from Intel.
>
> How about you take these patches:
>
> [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
> [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f
> [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag
> [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
> [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
> [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
> [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface
>
> Onto a branch.
I've pushed these onto a new branch in the IOMMU tree:
iommufd/arm-smmuv3-nested
However, please can you give it a day or two to get some exposure in
-next before you merge that into iommufd? I'll ping back here later in
the week.
Cheers,
Will
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-05 16:48 ` Will Deacon
@ 2024-11-05 17:03 ` Jason Gunthorpe
2024-11-08 14:56 ` Will Deacon
1 sibling, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-05 17:03 UTC (permalink / raw)
To: Will Deacon
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Tue, Nov 05, 2024 at 04:48:29PM +0000, Will Deacon wrote:
> Hi Jason,
>
> On Fri, Nov 01, 2024 at 10:25:03AM -0300, Jason Gunthorpe wrote:
> > On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote:
> > > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> > > > [This is now based on Nicolin's iommufd patches for vIOMMU and will need
> > > > to go through the iommufd tree, please ack]
> > >
> > > Can't we separate out the SMMUv3 driver changes? They shouldn't depend on
> > > Nicolin's work afaict.
> >
> > We can put everything before "iommu/arm-smmu-v3: Support
> > IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree.
> >
> > That patch and following all depend on Nicolin's work, as they rely on
> > the VIOMMU due to how different ARM is from Intel.
> >
> > How about you take these patches:
> >
> > [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
> > [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f
> > [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag
> > [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
> > [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
> > [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
> > [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface
> >
> > Onto a branch.
>
> I've pushed these onto a new branch in the IOMMU tree:
>
> iommufd/arm-smmuv3-nested
>
> However, please can you give it a day or two to get some exposure in
> -next before you merge that into iommufd? I'll ping back here later in
> the week.
Thanks, I will hope to put the other parts together on Friday then so
it can also get its time in -next, as we are running out of days
before the merge window
0-day is still complaining about some kconfig in Nicolin's patches so
we need to settle that anyhow.
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
2024-11-04 17:19 ` Jason Gunthorpe
@ 2024-11-06 5:39 ` Zhangfei Gao
0 siblings, 0 replies; 58+ messages in thread
From: Zhangfei Gao @ 2024-11-06 5:39 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Mon, 4 Nov 2024 at 17:19, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Oct 31, 2024 at 02:21:11PM +0800, Zhangfei Gao wrote:
>
> > > +static struct iommu_domain *
> > > +arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> > > + const struct iommu_user_data *user_data)
> > > +{
> > > + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
> > > + struct arm_smmu_nested_domain *nested_domain;
> > > + struct iommu_hwpt_arm_smmuv3 arg;
> > > + int ret;
> > > +
> > > + if (flags)
> > > + return ERR_PTR(-EOPNOTSUPP);
> >
> > This check fails when using user page fault, with flags =
> > IOMMU_HWPT_FAULT_ID_VALID (4)
> > Strange, the check is not exist in last version?
> >
> > iommufd_viommu_alloc_hwpt_nested ->
> > viommu->ops->alloc_domain_nested(viommu, flags, user_data) ->
> > arm_vsmmu_alloc_domain_nested
>
> It should permit IOMMU_HWPT_FAULT_ID_VALID, I'll add this hunk:
>
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> @@ -178,12 +178,18 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> const struct iommu_user_data *user_data)
> {
> struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
> + const u32 SUPPORTED_FLAGS = IOMMU_HWPT_FAULT_ID_VALID;
> struct arm_smmu_nested_domain *nested_domain;
> struct iommu_hwpt_arm_smmuv3 arg;
> bool enable_ats = false;
> int ret;
>
> - if (flags)
> + /*
> + * Faults delivered to the nested domain are faults that originated by
> + * the S1 in the domain. The core code will match all PASIDs when
> + * delivering the fault due to user_pasid_table
> + */
> + if (flags & ~SUPPORTED_FLAGS)
> return ERR_PTR(-EOPNOTSUPP);
Thanks Jason, this works
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-04 12:41 ` Jason Gunthorpe
@ 2024-11-06 16:37 ` Robin Murphy
2024-11-06 18:05 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Robin Murphy @ 2024-11-06 16:37 UTC (permalink / raw)
To: Jason Gunthorpe, Will Deacon
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Sudeep Holla, Alex Williamson,
Donald Dutile, Eric Auger, Hanjun Guo, Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches, Rafael J. Wysocki, Shameerali Kolothum Thodi,
Mostafa Saleh
On 2024-11-04 12:41 pm, Jason Gunthorpe wrote:
> On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote:
>>> +/**
>>> + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
>>> + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
>>> + *
>>> + * @flags: Must be set to 0
>>> + * @__reserved: Must be 0
>>> + * @idr: Implemented features for ARM SMMU Non-secure programming interface
>>> + * @iidr: Information about the implementation and implementer of ARM SMMU,
>>> + * and architecture version supported
>>> + * @aidr: ARM SMMU architecture version
>>> + *
>>> + * For the details of @idr, @iidr and @aidr, please refer to the chapters
>>> + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
>>> + *
>>> + * User space should read the underlying ARM SMMUv3 hardware information for
>>> + * the list of supported features.
>>> + *
>>> + * Note that these values reflect the raw HW capability, without any insight if
>>> + * any required kernel driver support is present. Bits may be set indicating the
>>> + * HW has functionality that is lacking kernel software support, such as BTM. If
>>> + * a VMM is using this information to construct emulated copies of these
>>> + * registers it should only forward bits that it knows it can support.
But how *is* a VMM supposed to know what it can support? Are they all
expected to grovel the host devicetree/ACPI tables and maintain their
own knowledge of implementation errata to understand what's actually usable?
>>> + *
>>> + * In future, presence of required kernel support will be indicated in flags.
>>
>> What about the case where we _know_ that some functionality is broken in
>> the hardware? For example, we nobble BTM support on MMU 700 thanks to
>> erratum #2812531 yet we'll still cheerfully advertise it in IDR0 here.
>> Similarly, HTTU can be overridden by IORT, so should we update the view
>> that we advertise for that as well?
>
> My knee jerk answer is no, these struct fields should just report the
> raw HW register. A VMM should not copy these fields directly into a
> VM. The principle purpose is to give the VMM the same details about the
> HW as the kernel so it can apply erratas/etc.
>
> For instance, if we hide these fields how will the VMM/VM know to
> apply the various flushing errata? With vCMDQ/etc the VM is directly
> pushing flushes to HW, it must know the errata.
That doesn't seem like a valid argument. We obviously can't abstract
SMMU_IIDR, that would indeed be an invitation for trouble, but
otherwise, if an erratum affects S1 operation under conditions dependent
on an optional feature, then not advertising that feature would make the
workaround irrelevant anyway, since as far as the VM is concerned it
would be wrong to expect a non-existent feature to work in the first place.
> For BTM/HTTU/etc - those all require kernel SW support and per-device
> permission in the kernel to turn on. For instance requesting a nested
> vSTE that needs BTM will fail today during attach. Turning on HTTU on
> the S2 already has an API that will fail if the IORT blocks it.
What does S2 HTTU have to do with the VM? How the host wants to maintain
its S2 tables it its own business. AFAICS, unless the VMM wants to do
some fiddly CD shadowing, it's going to be kinda hard to prevent the
SMMU seeing a guest CD with CD.HA and/or CD.HD set if the guest expects
S1 HTTU to work.
I'm not sure what "vSTE that needs BTM" means. Even if the system does
support BTM, the only control is the global SMMU_CR2.PTM, and a vSMMU
can't usefully emulate changing that either way. Either the host set
PTM=0 before enabling the SMMU, so BTM can be advertised and expected to
work, or it didn't, in which case there can be no BTM, full stop.
> Incrementally dealing with expanding the support is part of the
> "required kernel support will be indicated in flags."
>
> Basically, exposing the information as-is doesn't do any harm.
I would say it does. Advertising a feature when we already know it's not
usable at all puts a non-trivial and unnecessary burden on the VMM and
VM to then have to somehow derive that information from other sources,
at the risk of being confused by unexpected behaviour if they don't.
We sanitise CPU ID registers for userspace and KVM, so I see no
compelling reason for SMMU ID registers to be different.
Thanks,
Robin.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-06 16:37 ` Robin Murphy
@ 2024-11-06 18:05 ` Jason Gunthorpe
2024-11-06 21:05 ` Robin Murphy
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-06 18:05 UTC (permalink / raw)
To: Robin Murphy
Cc: Will Deacon, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Sudeep Holla, Alex Williamson,
Donald Dutile, Eric Auger, Hanjun Guo, Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches, Rafael J. Wysocki, Shameerali Kolothum Thodi,
Mostafa Saleh
On Wed, Nov 06, 2024 at 04:37:53PM +0000, Robin Murphy wrote:
> On 2024-11-04 12:41 pm, Jason Gunthorpe wrote:
> > On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote:
> > > > +/**
> > > > + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
> > > > + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
> > > > + *
> > > > + * @flags: Must be set to 0
> > > > + * @__reserved: Must be 0
> > > > + * @idr: Implemented features for ARM SMMU Non-secure programming interface
> > > > + * @iidr: Information about the implementation and implementer of ARM SMMU,
> > > > + * and architecture version supported
> > > > + * @aidr: ARM SMMU architecture version
> > > > + *
> > > > + * For the details of @idr, @iidr and @aidr, please refer to the chapters
> > > > + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
> > > > + *
> > > > + * User space should read the underlying ARM SMMUv3 hardware information for
> > > > + * the list of supported features.
> > > > + *
> > > > + * Note that these values reflect the raw HW capability, without any insight if
> > > > + * any required kernel driver support is present. Bits may be set indicating the
> > > > + * HW has functionality that is lacking kernel software support, such as BTM. If
> > > > + * a VMM is using this information to construct emulated copies of these
> > > > + * registers it should only forward bits that it knows it can support.
>
> But how *is* a VMM supposed to know what it can support?
I answered a related question to Mostafa with an example:
https://lore.kernel.org/linux-iommu/20240903235532.GJ3773488@nvidia.com/
"global" capabilities that are enabled directly from the CD entry
would follow the pattern.
> Are they all expected to grovel the host devicetree/ACPI tables and
> maintain their own knowledge of implementation errata to understand
> what's actually usable?
No, VMMs are expected to only implement base line features we have
working today and not blindly add new features based only HW registers
reported here.
Each future capability we want to enable at the VMM needs an analysis:
1) Does it require kernel SW changes, ie like BTM? Then it needs a
kernel_capabilities bit to say the kernel SW exists
2) Does it require data from ACPI/DT/etc? Then it needs a
kernel_capabilities bit
3) Does it need to be "turned on" per VM, ie with a VMS enablement?
Then it needs a new request flag in ALLOC_VIOMMU
4) Otherwise it can be read directly from the idr[] array
This is why the comment above is so stern that the VMM "should only
forward bits that it knows it can support".
> S2 tables it its own business. AFAICS, unless the VMM wants to do some
> fiddly CD shadowing, it's going to be kinda hard to prevent the SMMU seeing
> a guest CD with CD.HA and/or CD.HD set if the guest expects S1 HTTU to work.
If the VMM wrongly indicates HTTU support to the VM, because it
wrongly inspected those bits in the idr report, then it is just
broken.
> I would say it does. Advertising a feature when we already know it's not
> usable at all puts a non-trivial and unnecessary burden on the VMM and VM to
> then have to somehow derive that information from other sources, at the risk
> of being confused by unexpected behaviour if they don't.
That is not the purpose here, the register report is not to be used as
"advertising features". It describes details of the raw HW that the
VMM may need to use *some* of the fields.
There are quite a few fields that fit #4 today: OAS, VAX, GRAN, BBML,
CD2L, etc.
Basically we will pass most of the bits and mask a few. If we get the
masking wrong and pass something we shouldn't, then we've improved
nothing compared to this proposal. I think we are likely to get the
masking wrong :)
> We sanitise CPU ID registers for userspace and KVM, so I see no compelling
> reason for SMMU ID registers to be different.
We discussed this already:
https://lore.kernel.org/linux-iommu/20240904120103.GB3915968@nvidia.com
It is a false comparison, for KVM the kernel is responsible to control
the CPU ID registers. Reporting the registers the VM sees to the VMM
makes alot of sense. For SMMU the VMM exclusively controls the VM's ID
registers.
If you still feel strongly about this please let me know by Friday and
I will drop the idr[] array from this cycle. We can continue to
discuss a solution for the next cycle.
Regards,
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-06 18:05 ` Jason Gunthorpe
@ 2024-11-06 21:05 ` Robin Murphy
2024-11-06 21:53 ` Nicolin Chen
2024-11-07 2:35 ` Jason Gunthorpe
0 siblings, 2 replies; 58+ messages in thread
From: Robin Murphy @ 2024-11-06 21:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Will Deacon, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Sudeep Holla, Alex Williamson,
Donald Dutile, Eric Auger, Hanjun Guo, Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches, Rafael J. Wysocki, Shameerali Kolothum Thodi,
Mostafa Saleh
On 2024-11-06 6:05 pm, Jason Gunthorpe wrote:
> On Wed, Nov 06, 2024 at 04:37:53PM +0000, Robin Murphy wrote:
>> On 2024-11-04 12:41 pm, Jason Gunthorpe wrote:
>>> On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote:
>>>>> +/**
>>>>> + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
>>>>> + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
>>>>> + *
>>>>> + * @flags: Must be set to 0
>>>>> + * @__reserved: Must be 0
>>>>> + * @idr: Implemented features for ARM SMMU Non-secure programming interface
>>>>> + * @iidr: Information about the implementation and implementer of ARM SMMU,
>>>>> + * and architecture version supported
>>>>> + * @aidr: ARM SMMU architecture version
>>>>> + *
>>>>> + * For the details of @idr, @iidr and @aidr, please refer to the chapters
>>>>> + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
>>>>> + *
>>>>> + * User space should read the underlying ARM SMMUv3 hardware information for
>>>>> + * the list of supported features.
>>>>> + *
>>>>> + * Note that these values reflect the raw HW capability, without any insight if
>>>>> + * any required kernel driver support is present. Bits may be set indicating the
>>>>> + * HW has functionality that is lacking kernel software support, such as BTM. If
>>>>> + * a VMM is using this information to construct emulated copies of these
>>>>> + * registers it should only forward bits that it knows it can support.
>>
>> But how *is* a VMM supposed to know what it can support?
>
> I answered a related question to Mostafa with an example:
>
> https://lore.kernel.org/linux-iommu/20240903235532.GJ3773488@nvidia.com/
>
> "global" capabilities that are enabled directly from the CD entry
> would follow the pattern.
>
>> Are they all expected to grovel the host devicetree/ACPI tables and
>> maintain their own knowledge of implementation errata to understand
>> what's actually usable?
>
> No, VMMs are expected to only implement base line features we have
> working today and not blindly add new features based only HW registers
> reported here.
>
> Each future capability we want to enable at the VMM needs an analysis:
>
> 1) Does it require kernel SW changes, ie like BTM? Then it needs a
> kernel_capabilities bit to say the kernel SW exists
> 2) Does it require data from ACPI/DT/etc? Then it needs a
> kernel_capabilities bit
> 3) Does it need to be "turned on" per VM, ie with a VMS enablement?
> Then it needs a new request flag in ALLOC_VIOMMU
> 4) Otherwise it can be read directly from the idr[] array
>
> This is why the comment above is so stern that the VMM "should only
> forward bits that it knows it can support".
So... you're saying this patch is in fact broken, or at least uselessly
incomplete, since VMMs aren't allowed to emulate a vSMMU at all without
first consulting some other interface which does not exist? Great.
>> S2 tables it its own business. AFAICS, unless the VMM wants to do some
>> fiddly CD shadowing, it's going to be kinda hard to prevent the SMMU seeing
>> a guest CD with CD.HA and/or CD.HD set if the guest expects S1 HTTU to work.
>
> If the VMM wrongly indicates HTTU support to the VM, because it
> wrongly inspected those bits in the idr report, then it is just
> broken.
What do you mean? We could have a system right now where the hardware is
configured with SMMU_IDR0.HTTU=2, but it turned out that atomics were
broken in the interconnect so firmware sets the IORT "HTTU override"
field is set to 0. We know about that in the kernel, but all a VMM sees
is iommu_hw_info_arm_smmuv3.idr[0] indicating HTTU=2. If it is "broken"
to take the only information available at face value, assume HTTU is
available, and reflect that in a vSMMU interface, then what is the
correct thing to do, other than to not dare emulate a vSMMU at all, in
fear of a sternly worded comment?
>> I would say it does. Advertising a feature when we already know it's not
>> usable at all puts a non-trivial and unnecessary burden on the VMM and VM to
>> then have to somehow derive that information from other sources, at the risk
>> of being confused by unexpected behaviour if they don't.
>
> That is not the purpose here, the register report is not to be used as
> "advertising features". It describes details of the raw HW that the
> VMM may need to use *some* of the fields.
>
> There are quite a few fields that fit #4 today: OAS, VAX, GRAN, BBML,
> CD2L, etc.
>
> Basically we will pass most of the bits and mask a few. If we get the
> masking wrong and pass something we shouldn't, then we've improved
> nothing compared to this proposal. I think we are likely to get the
> masking wrong :)
Seriously? A simple inverse of the feature detection the kernel driver
already does for its own needs, implemented once in the same place, is hard?
Compared to maintaining the exact same information within the driver but
in some new different form, and also maintaining it in the UAPI, and
having every VMM ever all do the same work to put the two together, and
always be up to date with the right UAPI, and never ever let any field
slip through as-is, especially not all the ones which were RES0 at time
of writing, enforced by a sternly worded comment? Why yes, of course I
can see how that's trivially easy and carries no risk whatsoever.
>> We sanitise CPU ID registers for userspace and KVM, so I see no compelling
>> reason for SMMU ID registers to be different.
>
> We discussed this already:
>
> https://lore.kernel.org/linux-iommu/20240904120103.GB3915968@nvidia.com
>
> It is a false comparison, for KVM the kernel is responsible to control
> the CPU ID registers. Reporting the registers the VM sees to the VMM
> makes alot of sense. For SMMU the VMM exclusively controls the VM's ID
> registers.
Pointing out that two things are different is a false comparison because
they are different, by virtue of your choice to make them different?
Please try making sense.
Your tautology still does not offer any reasoning against doing the
logical thing and following the same basic pattern: the kernel uses the
ID register mechanism itself to advertise the set of features it's
able/willing to support, by sanitising the values it offers to the VMM,
combining the notions of hardware and kernel support where the
distinction is irrelevant anyway. The VMM is then still free to take
those values and hide more features, or potentially add any that it is
capable of emulating without the kernel's help, and advertise that final
set to the VM. Obviously there are significant *implementation*
differences, most notably that the latter VMM->VM part doesn't need to
involve IOMMUFD at all since MMIO register emulation can stay entirely
in userspace, whereas for CPU system registers the final VM-visible
values need to be plugged back in to KVM for it to handle the traps.
We are all asking you to explain why you think doing the kernel->VMM
advertisement naturally and intuitively is somehow bad, and forcing VMMs
to instead rely on a more complex, fragile, and crucially non-existent
additional interface is better. You should take "We discussed this
already" as more of a clue to yourself than to me - if 4 different
people have all said the exact same thing in so many words, perhaps
there's something in it...
And in case I need to spell it out with less sarcasm, "we'll get masking
wrong in the kernel" only implies "we'll get kernel_capabilities wrong
in the kernel (and elsewhere)", so it's clearly not a useful argument to
keep repeating. Besides, as KVM + sysfs + MRS emulation shows, we're
pretty experienced at masking ID registers in the kernel. It's not hard
to do it right in a robust manner, where particularly with the nature of
SMMU features, the only real risk might be forgetting to expose
something new once we do actually support it.
> If you still feel strongly about this please let me know by Friday and
> I will drop the idr[] array from this cycle. We can continue to
> discuss a solution for the next cycle.
It already can't work as-is, I don't see how making it even more broken
would help. IMO it doesn't seem like a good idea to be merging UAPI at
all while it's still clearly incomplete and by its own definition unusable.
Thanks,
Robin.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-06 21:05 ` Robin Murphy
@ 2024-11-06 21:53 ` Nicolin Chen
2024-11-07 2:35 ` Jason Gunthorpe
1 sibling, 0 replies; 58+ messages in thread
From: Nicolin Chen @ 2024-11-06 21:53 UTC (permalink / raw)
To: Robin Murphy
Cc: Jason Gunthorpe, Will Deacon, acpica-devel, iommu, Joerg Roedel,
Kevin Tian, kvm, Len Brown, linux-acpi, linux-arm-kernel,
Lorenzo Pieralisi, Rafael J. Wysocki, Robert Moore, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, Nov 06, 2024 at 09:05:26PM +0000, Robin Murphy wrote:
> On 2024-11-06 6:05 pm, Jason Gunthorpe wrote:
> > If you still feel strongly about this please let me know by Friday and
> > I will drop the idr[] array from this cycle. We can continue to
> > discuss a solution for the next cycle.
>
> It already can't work as-is, I don't see how making it even more broken
> would help. IMO it doesn't seem like a good idea to be merging UAPI at
> all while it's still clearly incomplete and by its own definition unusable.
Robin, would you please give a clear suggestion for the hw_info?
My takeaway is that you would want the unsupported features (per
firmware overrides and errata) to be stripped from the reporting
IDR array. Alternatively, we could start with some basic nesting
features less those advanced ones (HTTU/PRI or so), and then add
then later once we're comfortable to advertise.
Does this sound okay to you?
Thanks
Nicolin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-06 21:05 ` Robin Murphy
2024-11-06 21:53 ` Nicolin Chen
@ 2024-11-07 2:35 ` Jason Gunthorpe
2024-11-07 15:51 ` Jason Gunthorpe
2024-11-08 14:53 ` Will Deacon
1 sibling, 2 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-07 2:35 UTC (permalink / raw)
To: Robin Murphy
Cc: Will Deacon, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Sudeep Holla, Alex Williamson,
Donald Dutile, Eric Auger, Hanjun Guo, Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches, Rafael J. Wysocki, Shameerali Kolothum Thodi,
Mostafa Saleh
On Wed, Nov 06, 2024 at 09:05:26PM +0000, Robin Murphy wrote:
> > Each future capability we want to enable at the VMM needs an analysis:
> >
> > 1) Does it require kernel SW changes, ie like BTM? Then it needs a
> > kernel_capabilities bit to say the kernel SW exists
> > 2) Does it require data from ACPI/DT/etc? Then it needs a
> > kernel_capabilities bit
> > 3) Does it need to be "turned on" per VM, ie with a VMS enablement?
> > Then it needs a new request flag in ALLOC_VIOMMU
> > 4) Otherwise it can be read directly from the idr[] array
> >
> > This is why the comment above is so stern that the VMM "should only
> > forward bits that it knows it can support".
>
> So... you're saying this patch is in fact broken, or at least uselessly
> incomplete, since VMMs aren't allowed to emulate a vSMMU at all without
> first consulting some other interface which does not exist? Great.
That is too far, there is nothing fundamentally broken here.
This does not enable every SMMU feature in the architecture. It
implements a basic baseline and no more. The VMMs are only permitted
to read bits in that baseline. VMMs that want to go beyond that have
to go through the 4 steps above.
Fundamentally all we are arguing about here is if bits the VMM
shouldn't even read should be forced to zero by the kernel.
If the bits are forced to zero then perhaps they could be used as a
future SW feature negotiation, and maybe that would be better, or
maybe not. See below about BTM for a counter example.
I agree the kdoc does not describe what the baseline actually is.
> We know about that in the kernel, but all a VMM sees is
> iommu_hw_info_arm_smmuv3.idr[0] indicating HTTU=2. If it is "broken" to take
> the only information available at face value, assume HTTU is available, and
> reflect that in a vSMMU interface, then what is the correct thing to do,
> other than to not dare emulate a vSMMU at all, in fear of a sternly worded
> comment?
These patches support a baseline feature set. They do not support the
entire SMMU architecture. They do not support vHTTU.
Things are going step by step because this is a huge project. HTTU is
not in the baseline, so it is definitely wrong for any VMM working on
these patches to advertise support to the VM! The VMM *MUST* ignore
such bits in the IDR report.
The correct thing is the 4 steps I outlined above. When a VMM author
wants to go beyond the baseline they have to determine what kernel and
VMM software is required and implement the missing parts.
> Your tautology still does not offer any reasoning against doing the logical
> thing and following the same basic pattern: the kernel uses the ID register
> mechanism itself to advertise the set of features it's able/willing to
> support, by sanitising the values it offers to the VMM, combining the
> notions of hardware and kernel support where the distinction is irrelevant
> anyway.
Even with your plan the VMM can not pass the kernel's IDR register
into the VM unprotected. It must always sanitize it against the
features that the VMM supports.
Let's consider vBTM support. This requires kernel support to enable
the global BTM register *AND* a new VMM ioctl flow with iommufd to
link to KVM's VMID.
You are suggesting that the old kernel should wire IDR BTM to 0 and
the new kernel will set it to 1. However the VMM cannot just forward
this bit to the VM! A old VMM that does not do the KVM flow does NOT
support BTM.
All VMM's must mask the BTM bit from day 0, or we can never set it to
1 in a future kernel. Which is exactly the same plan as this patch!
> You should take "We discussed this already"
> as more of a clue to yourself than to me - if 4 different people have all
> said the exact same thing in so many words, perhaps there's something in
> it...
And all seemed to agree it was not a big deal after the discussion.
I think Mostafa was driving in a direction that we break up the IDR
into explicit fields and thus be explicit about what information the
VMM is able to access. This would effectively document and enforce
what the baseline is.
> And in case I need to spell it out with less sarcasm, "we'll get masking
> wrong in the kernel" only implies "we'll get kernel_capabilities wrong in
> the kernel (and elsewhere)",
Less sarcasm and hyperbole would be nice.
> > If you still feel strongly about this please let me know by Friday and
> > I will drop the idr[] array from this cycle. We can continue to
> > discuss a solution for the next cycle.
>
> It already can't work as-is,
As above, this is not true.
> I don't see how making it even more broken
> would help. IMO it doesn't seem like a good idea to be merging UAPI at all
> while it's still clearly incomplete and by its own definition unusable.
I agree that removing the idr would make it definitely unusable. There
is quite a bit of essential information in there the VMM
needs. However, the uAPI is extensible so adding a new field in the
next cycle is not breaking.
I'm trying to offer you an olive branch so we can have this discussion
with time instead of urgently at the 11th hour and derailing the
series. This exact topic has been discussed already two months ago,
and is, IMHO, mostly a style choice not an urgent technical matter.
If you have another option we can conclude in the next few days then
I'm happy to hear it.
If we focus only on the baseline functionality Nicolin can come up
with a list of fields and we can write it very explicitly in the kdoc
that the VMM can only read those fields.
I could also reluctantly agree to permanent masking to report only the
above kdoc bits in the kernel. Noting if you want this because you
don't trust the VMM authors to follow an explicit kdoc, then we likely
also cannot ever set some bits to 1. Eg BTM would be permanently 0.
Regards,
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-07 2:35 ` Jason Gunthorpe
@ 2024-11-07 15:51 ` Jason Gunthorpe
2024-11-08 14:53 ` Will Deacon
1 sibling, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-07 15:51 UTC (permalink / raw)
To: Robin Murphy
Cc: Will Deacon, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Sudeep Holla, Alex Williamson,
Donald Dutile, Eric Auger, Hanjun Guo, Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches, Rafael J. Wysocki, Shameerali Kolothum Thodi,
Mostafa Saleh
On Wed, Nov 06, 2024 at 10:35:07PM -0400, Jason Gunthorpe wrote:
> I agree the kdoc does not describe what the baseline actually is.
Nicolin worked on this, here is a more detailed kdoc:
/**
* struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
* (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
*
* @flags: Must be set to 0
* @__reserved: Must be 0
* @idr: Implemented features for ARM SMMU Non-secure programming interface
* @iidr: Information about the implementation and implementer of ARM SMMU,
* and architecture version supported
* @aidr: ARM SMMU architecture version
*
* For the details of @idr, @iidr and @aidr, please refer to the chapters
* from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
*
* This reports the raw HW capability, and not all bits are meaningful to be
* read by userspace. Only the following fields should be used:
*
* idr[0]: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN , CD2L, ASID16, TTF
* idr[1]: SIDSIZE, SSIDSIZE
* idr[3]: BBML, RIL
* idr[5]: VAX, GRAN64K, GRAN16K, GRAN4K
*
* - S1P should be assumed to be true if a NESTED HWPT can be created
* - VFIO/iommufd only support platforms with COHACC, it should be assumed to be
* true.
* - ATS is a per-device property. If the VMM describes any devices as ATS
* capable in ACPI/DT it should set the corresponding idr.
*
* This list may expand in future (eg E0PD, AIE, PBHA, D128, DS etc). It is
* important that VMMs do not read bits outside the list to allow for
* compatibility with future kernels. Several features in the SMMUv3
* architecture are not currently supported by the kernel for nesting: HTTU,
* BTM, MPAM and others.
*/
This focuses on stuff we can actually test and gives a path to test
and confirm a no-code update to future stuff.
The future list (E0PD/etc) reflects things the current kernel doesn't
use. Our naive read of the spec suggests they are probably fine to
just read the raw HW IDR. When someone implements guest kernel
support, does a detailed spec read, and crucially tests them - then we
can update the comment and have immediate support.
HTTU, BTM, and others like that will need additional bits outside the
IDR if someone wishes to enable them.
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-07 2:35 ` Jason Gunthorpe
2024-11-07 15:51 ` Jason Gunthorpe
@ 2024-11-08 14:53 ` Will Deacon
2024-11-08 15:42 ` Jason Gunthorpe
1 sibling, 1 reply; 58+ messages in thread
From: Will Deacon @ 2024-11-08 14:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Robin Murphy, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Sudeep Holla, Alex Williamson,
Donald Dutile, Eric Auger, Hanjun Guo, Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches, Rafael J. Wysocki, Shameerali Kolothum Thodi,
Mostafa Saleh
Hi guys,
On Wed, Nov 06, 2024 at 10:35:06PM -0400, Jason Gunthorpe wrote:
> On Wed, Nov 06, 2024 at 09:05:26PM +0000, Robin Murphy wrote:
> > You should take "We discussed this already"
> > as more of a clue to yourself than to me - if 4 different people have all
> > said the exact same thing in so many words, perhaps there's something in
> > it...
>
> And all seemed to agree it was not a big deal after the discussion.
>
> I think Mostafa was driving in a direction that we break up the IDR
> into explicit fields and thus be explicit about what information the
> VMM is able to access. This would effectively document and enforce
> what the baseline is.
As one of the four people mentioned above, I figured I'd chime in with
my rationale for queuing this in case it's of any help or interest.
Initially, I was reasonably sure that we should be sanitising the ID
registers and being selective about what we advertise to userspace.
However, after Jason's reply to my comments, mulling it over in my head
and having lively conversations with Mostafa at lunchtime, I've come
full circle. Is it a great interface? Not at all. It's 8 register values
copied from the hardware to userspace. But is it good enough? I think it
is. It's also extremely simple (i.e. easy to explain what it does and
trivial to implement), which I think is a huge benefit given that the
IOMMUFD work around it is still evolving.
I'm firmly of the opinion that the VMM is going to need a tonne of help
from other sources to expose a virtual IOMMU successfully. For example,
anything relating to policy or configuration choices should be driven
from userspace rather than the kernel. If we start to expose policy in
the id registers for the cases where it happens to fit nicely, I fear
that this will backfire and the VMM will end up second-guessing the
kernel in cases where it decides it knows best. I'm not sold on the
analogy with CPU ID registers as (a) we don't have big/little idiocy to
deal with in the SMMU and (b) I don't think SMMU features always need
support code in the host, which is quite unlike many CPU features that
cannot be exposed safely without hypervisor context-switching support.
On top of all that, this interface can be extended if we change our
minds. If we decide we need to mask out fields, I think we could add
that after the fact. Hell, if we all decide that it's a disaster in a
few releases time, we can try something else. Ultimately, Jason is the
one maintaining IOMMUFD and he gets to deal with the problems if his
UAPI doesn't work out :)
So, given where we are in the cycle, I think the pragmatic thing to do
is to land this change now and enable the ongoing IOMMUFD work to
continue. We still have over two months to resolve any major problems
with the interface (and even unplug it entirely from the driver if we
really get stuck) but for now I think it's "fine".
Will
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-05 16:48 ` Will Deacon
2024-11-05 17:03 ` Jason Gunthorpe
@ 2024-11-08 14:56 ` Will Deacon
1 sibling, 0 replies; 58+ messages in thread
From: Will Deacon @ 2024-11-08 14:56 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Tue, Nov 05, 2024 at 04:48:29PM +0000, Will Deacon wrote:
> On Fri, Nov 01, 2024 at 10:25:03AM -0300, Jason Gunthorpe wrote:
> > On Fri, Nov 01, 2024 at 12:18:50PM +0000, Will Deacon wrote:
> > > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> > > > [This is now based on Nicolin's iommufd patches for vIOMMU and will need
> > > > to go through the iommufd tree, please ack]
> > >
> > > Can't we separate out the SMMUv3 driver changes? They shouldn't depend on
> > > Nicolin's work afaict.
> >
> > We can put everything before "iommu/arm-smmu-v3: Support
> > IOMMU_VIOMMU_ALLOC" directly on a rc and share a branch with your tree.
> >
> > That patch and following all depend on Nicolin's work, as they rely on
> > the VIOMMU due to how different ARM is from Intel.
> >
> > How about you take these patches:
> >
> > [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
> > [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f
> > [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag
> > [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS
> > [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
> > [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
> > [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface
> >
> > Onto a branch.
>
> I've pushed these onto a new branch in the IOMMU tree:
>
> iommufd/arm-smmuv3-nested
>
> However, please can you give it a day or two to get some exposure in
> -next before you merge that into iommufd? I'll ping back here later in
> the week.
Not seen anything go bang in -next, so I think we can consider this
branch stable now.
Will
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
2024-11-08 14:53 ` Will Deacon
@ 2024-11-08 15:42 ` Jason Gunthorpe
0 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-08 15:42 UTC (permalink / raw)
To: Will Deacon
Cc: Robin Murphy, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Sudeep Holla, Alex Williamson,
Donald Dutile, Eric Auger, Hanjun Guo, Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches, Rafael J. Wysocki, Shameerali Kolothum Thodi,
Mostafa Saleh
On Fri, Nov 08, 2024 at 02:53:22PM +0000, Will Deacon wrote:
> So, given where we are in the cycle, I think the pragmatic thing to do
> is to land this change now and enable the ongoing IOMMUFD work to
> continue. We still have over two months to resolve any major problems
> with the interface (and even unplug it entirely from the driver if we
> really get stuck) but for now I think it's "fine".
Thanks Will, this all eloquently captures my line of reasoning also.
I will add that after going through the exercise with Nicolin, to
write down all the fields the VMM is allowed to touch, it seems we do
have a list of future fields that likely do not require any host
support (E0PD, AIE, PBHA, D128, DS). We also have ones that will (BTM)
and then the fwspec weirdo of HTTU.
Given those ratios I feel pretty good about showing that data to the
userspace, expecting that down the road we will "turn it on" for
E0PD/etc without any kernel change, and BTM/HTTU will not use idr for
discovery.
The biggest risk is the VMM's badly muck it up. I now think it was a
lazy shortcut to not enumerate all the fields in the kdoc. Now that
we've done that we see that the (unreviewed) RFC qemu patches did miss
some things and it is being corrected.
Nicolin and I had some discussion if we should go ahead and include
E0PD/etc in the documentation right now, but we both felt some
uncertainty that we really understood those features deeply enough to
be confident, and have no way to test.
There is also the nanny option of zeroing everything not in the
approved list, but I feel like it is already so hard to write a VMM
vSMMU3 that is too nannyish. Hoepfully time will not show that my
opinion of VMM authors is too high. :\
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
` (13 preceding siblings ...)
2024-11-01 12:18 ` Will Deacon
@ 2024-11-12 18:29 ` Jason Gunthorpe
2024-11-13 1:01 ` Zhangfei Gao
14 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-12 18:29 UTC (permalink / raw)
To: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger, Hanjun Guo,
Jean-Philippe Brucker, Jerry Snitselaar, Moritz Fischer,
Michael Shavit, Nicolin Chen, patches, Rafael J. Wysocki,
Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> Jason Gunthorpe (7):
> iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
> iommu/arm-smmu-v3: Use S2FWB for NESTED domains
> iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
>
> Nicolin Chen (5):
> iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
> iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object
Applied to iommufd for-next along with all the dependencies and the
additional hunk Zhangfei pointed out.
Thanks,
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-12 18:29 ` Jason Gunthorpe
@ 2024-11-13 1:01 ` Zhangfei Gao
2024-11-13 1:23 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Zhangfei Gao @ 2024-11-13 1:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, 13 Nov 2024 at 02:29, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> > Jason Gunthorpe (7):
> > iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
> > iommu/arm-smmu-v3: Use S2FWB for NESTED domains
> > iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
> >
> > Nicolin Chen (5):
> > iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
> > iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object
>
> Applied to iommufd for-next along with all the dependencies and the
> additional hunk Zhangfei pointed out.
Thanks Jason, I have verified on aarch64 based on your
jason/smmuv3_nesting branch
https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip
https://github.com/Linaro/qemu/tree/6.12-wip
Still need this hack
https://github.com/Linaro/linux-kernel-uadk/commit/eaa194d954112cad4da7852e29343e546baf8683
One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA,
which you have patchset before.
The other is to temporarily ignore S2FWB or CANWBS.
Thanks
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-13 1:01 ` Zhangfei Gao
@ 2024-11-13 1:23 ` Jason Gunthorpe
2024-11-13 2:55 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-13 1:23 UTC (permalink / raw)
To: Zhangfei Gao
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, Nov 13, 2024 at 09:01:41AM +0800, Zhangfei Gao wrote:
> On Wed, 13 Nov 2024 at 02:29, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Wed, Oct 30, 2024 at 09:20:44PM -0300, Jason Gunthorpe wrote:
> > > Jason Gunthorpe (7):
> > > iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
> > > iommu/arm-smmu-v3: Use S2FWB for NESTED domains
> > > iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
> > >
> > > Nicolin Chen (5):
> > > iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
> > > iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object
> >
> > Applied to iommufd for-next along with all the dependencies and the
> > additional hunk Zhangfei pointed out.
>
> Thanks Jason, I have verified on aarch64 based on your
> jason/smmuv3_nesting branch
Great, thanks
> https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip
> https://github.com/Linaro/qemu/tree/6.12-wip
>
> Still need this hack
> https://github.com/Linaro/linux-kernel-uadk/commit/eaa194d954112cad4da7852e29343e546baf8683
>
> One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA,
> which you have patchset before.
Yes, I have a more complete version of that here someplace. Need some
help on vt-d but hope to get that done next cycle.
> The other is to temporarily ignore S2FWB or CANWBS.
Yes, ideally you should do that in the device FW, or perhaps via the
Linux ACPI patching?
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-13 1:23 ` Jason Gunthorpe
@ 2024-11-13 2:55 ` Baolu Lu
2024-11-13 6:28 ` Zhangfei Gao
2024-11-13 16:43 ` Jason Gunthorpe
0 siblings, 2 replies; 58+ messages in thread
From: Baolu Lu @ 2024-11-13 2:55 UTC (permalink / raw)
To: Jason Gunthorpe, Zhangfei Gao
Cc: acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm, Len Brown,
linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On 11/13/24 09:23, Jason Gunthorpe wrote:
>> https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip
>> https://github.com/Linaro/qemu/tree/6.12-wip
>>
>> Still need this hack
>> https://github.com/Linaro/linux-kernel-uadk/commit/
>> eaa194d954112cad4da7852e29343e546baf8683
>>
>> One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA,
>> which you have patchset before.
> Yes, I have a more complete version of that here someplace. Need some
> help on vt-d but hope to get that done next cycle.
Can you please elaborate this a bit more? Are you talking about below
change
+ ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA);
+ if (ret)
+ return ret;
in iommufd_fault_iopf_enable()?
I have no idea about why SVA is affected when enabling iopf.
--
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-13 2:55 ` Baolu Lu
@ 2024-11-13 6:28 ` Zhangfei Gao
2024-11-13 16:43 ` Jason Gunthorpe
1 sibling, 0 replies; 58+ messages in thread
From: Zhangfei Gao @ 2024-11-13 6:28 UTC (permalink / raw)
To: Baolu Lu
Cc: Jason Gunthorpe, acpica-devel, iommu, Joerg Roedel, Kevin Tian,
kvm, Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
Hi, Baolu
On Wed, 13 Nov 2024 at 10:56, Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 11/13/24 09:23, Jason Gunthorpe wrote:
> >> https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip
> >> https://github.com/Linaro/qemu/tree/6.12-wip
> >>
> >> Still need this hack
> >> https://github.com/Linaro/linux-kernel-uadk/commit/
> >> eaa194d954112cad4da7852e29343e546baf8683
> >>
> >> One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA,
> >> which you have patchset before.
> > Yes, I have a more complete version of that here someplace. Need some
> > help on vt-d but hope to get that done next cycle.
>
> Can you please elaborate this a bit more? Are you talking about below
> change
>
> + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA);
> + if (ret)
> + return ret;
>
> in iommufd_fault_iopf_enable()?
>
> I have no idea about why SVA is affected when enabling iopf.
In drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c,
iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA) will real call
iopf_queue_add_device,
while iommu_dev_enable_feature(IOPF) only set flag.
arm_smmu_dev_enable_feature
case IOMMU_DEV_FEAT_SVA:
arm_smmu_master_enable_sva(master)
iopf_queue_add_device(master->smmu->evtq.iopf, dev);
Thanks
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-13 2:55 ` Baolu Lu
2024-11-13 6:28 ` Zhangfei Gao
@ 2024-11-13 16:43 ` Jason Gunthorpe
2024-11-14 0:51 ` Baolu Lu
1 sibling, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-13 16:43 UTC (permalink / raw)
To: Baolu Lu
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Wed, Nov 13, 2024 at 10:55:41AM +0800, Baolu Lu wrote:
> On 11/13/24 09:23, Jason Gunthorpe wrote:
> > > https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip
> > > https://github.com/Linaro/qemu/tree/6.12-wip
> > >
> > > Still need this hack
> > > https://github.com/Linaro/linux-kernel-uadk/commit/
> > > eaa194d954112cad4da7852e29343e546baf8683
> > >
> > > One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA,
> > > which you have patchset before.
> > Yes, I have a more complete version of that here someplace. Need some
> > help on vt-d but hope to get that done next cycle.
>
> Can you please elaborate this a bit more? Are you talking about below
> change
I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
driver. I have a patch series that eliminates it from all the other
drivers, and I wrote a patch to remove FEAT_SVA from intel..
> + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA);
> + if (ret)
> + return ret;
>
> in iommufd_fault_iopf_enable()?
>
> I have no idea about why SVA is affected when enabling iopf.
It is ARM not implementing the API correctly. Only SVA turns on the
page fault reporting mechanism.
In the new world the page fault reporting should be managed during
domain attachment. If the domain is fault capable then faults should
be delivered to that domain. That is the correct time to setup the
iopf mechanism as well.
So I fixed that and now ARM and AMD both have no-op implementations of
IOMMU_DEV_FEAT_IOPF and IOMMU_DEV_FEAT_SVA. Thus I'd like to remove it
entirely.
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-13 16:43 ` Jason Gunthorpe
@ 2024-11-14 0:51 ` Baolu Lu
2024-11-15 17:55 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2024-11-14 0:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On 11/14/24 00:43, Jason Gunthorpe wrote:
> On Wed, Nov 13, 2024 at 10:55:41AM +0800, Baolu Lu wrote:
>> On 11/13/24 09:23, Jason Gunthorpe wrote:
>>>> https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip
>>>> https://github.com/Linaro/qemu/tree/6.12-wip
>>>>
>>>> Still need this hack
>>>> https://github.com/Linaro/linux-kernel-uadk/commit/
>>>> eaa194d954112cad4da7852e29343e546baf8683
>>>>
>>>> One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA,
>>>> which you have patchset before.
>>> Yes, I have a more complete version of that here someplace. Need some
>>> help on vt-d but hope to get that done next cycle.
>>
>> Can you please elaborate this a bit more? Are you talking about below
>> change
>
> I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
> driver. I have a patch series that eliminates it from all the other
> drivers, and I wrote a patch to remove FEAT_SVA from intel..
Yes, sure. Let's make this happen in the next cycle.
FEAT_IOPF could be removed. IOPF manipulation can be handled in the
domain attachment path. A per-device refcount can be implemented. This
count increments with each iopf-capable domain attachment and decrements
with each detachment. PCI PRI is enabled for the first iopf-capable
domain and disabled when the last one is removed. Probably we can also
solve the PF/VF sharing PRI issue.
With iopf moved to the domain attach path and hardware capability checks
to the SVA domain allocation path, FEAT_SVA becomes essentially a no-op.
>
>> + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA);
>> + if (ret)
>> + return ret;
>>
>> in iommufd_fault_iopf_enable()?
>>
>> I have no idea about why SVA is affected when enabling iopf.
>
> It is ARM not implementing the API correctly. Only SVA turns on the
> page fault reporting mechanism.
>
> In the new world the page fault reporting should be managed during
> domain attachment. If the domain is fault capable then faults should
> be delivered to that domain. That is the correct time to setup the
> iopf mechanism as well.
>
> So I fixed that and now ARM and AMD both have no-op implementations of
> IOMMU_DEV_FEAT_IOPF and IOMMU_DEV_FEAT_SVA. Thus I'd like to remove it
> entirely.
Thank you for the explanation.
--
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-14 0:51 ` Baolu Lu
@ 2024-11-15 17:55 ` Jason Gunthorpe
2025-01-22 19:26 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-15 17:55 UTC (permalink / raw)
To: Baolu Lu
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Thu, Nov 14, 2024 at 08:51:47AM +0800, Baolu Lu wrote:
> On 11/14/24 00:43, Jason Gunthorpe wrote:
> > On Wed, Nov 13, 2024 at 10:55:41AM +0800, Baolu Lu wrote:
> > > On 11/13/24 09:23, Jason Gunthorpe wrote:
> > > > > https://github.com/Linaro/linux-kernel-uadk/tree/6.12-wip
> > > > > https://github.com/Linaro/qemu/tree/6.12-wip
> > > > >
> > > > > Still need this hack
> > > > > https://github.com/Linaro/linux-kernel-uadk/commit/
> > > > > eaa194d954112cad4da7852e29343e546baf8683
> > > > >
> > > > > One is adding iommu_dev_enable/disable_feature IOMMU_DEV_FEAT_SVA,
> > > > > which you have patchset before.
> > > > Yes, I have a more complete version of that here someplace. Need some
> > > > help on vt-d but hope to get that done next cycle.
> > >
> > > Can you please elaborate this a bit more? Are you talking about below
> > > change
> >
> > I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
> > driver. I have a patch series that eliminates it from all the other
> > drivers, and I wrote a patch to remove FEAT_SVA from intel..
>
> Yes, sure. Let's make this happen in the next cycle.
>
> FEAT_IOPF could be removed. IOPF manipulation can be handled in the
> domain attachment path. A per-device refcount can be implemented. This
> count increments with each iopf-capable domain attachment and decrements
> with each detachment. PCI PRI is enabled for the first iopf-capable
> domain and disabled when the last one is removed. Probably we can also
> solve the PF/VF sharing PRI issue.
Here is what I have so far, if you send me a patch for vt-d to move
FEAT_IOPF into attach as you describe above (see what I did to arm for
example), then I can send it next cycle
https://github.com/jgunthorpe/linux/commits/iommu_no_feat/
Thanks,
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
2024-10-31 0:20 ` [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Jason Gunthorpe
@ 2024-11-29 14:38 ` Shameerali Kolothum Thodi
2024-11-29 15:06 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-11-29 14:38 UTC (permalink / raw)
To: Jason Gunthorpe, acpica-devel@lists.linux.dev,
iommu@lists.linux.dev, Joerg Roedel, Kevin Tian,
kvm@vger.kernel.org, Len Brown, linux-acpi@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon
Cc: Alex Williamson, Donald Dutile, Eric Auger,
Guohanjun (Hanjun Guo), Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Rafael J. Wysocki, Mostafa Saleh
Hi Jason,
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, October 31, 2024 12:21 AM
> To: acpica-devel@lists.linux.dev; iommu@lists.linux.dev; Joerg Roedel
> <joro@8bytes.org>; Kevin Tian <kevin.tian@intel.com>;
> kvm@vger.kernel.org; Len Brown <lenb@kernel.org>; linux-
> acpi@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Lorenzo
> Pieralisi <lpieralisi@kernel.org>; Rafael J. Wysocki <rafael@kernel.org>;
> Robert Moore <robert.moore@intel.com>; Robin Murphy
> <robin.murphy@arm.com>; Sudeep Holla <sudeep.holla@arm.com>; Will
> Deacon <will@kernel.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>; Donald Dutile
> <ddutile@redhat.com>; Eric Auger <eric.auger@redhat.com>; Guohanjun
> (Hanjun Guo) <guohanjun@huawei.com>; Jean-Philippe Brucker <jean-
> philippe@linaro.org>; Jerry Snitselaar <jsnitsel@redhat.com>; Moritz
> Fischer <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; Nicolin
> Chen <nicolinc@nvidia.com>; patches@lists.linux.dev; Rafael J. Wysocki
> <rafael.j.wysocki@intel.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Mostafa Saleh
> <smostafa@google.com>
> Subject: [PATCH v4 08/12] iommu/arm-smmu-v3: Support
> IOMMU_VIOMMU_ALLOC
[...]
> +struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
> + struct iommu_domain *parent,
> + struct iommufd_ctx *ictx,
> + unsigned int viommu_type)
> +{
> + struct arm_smmu_device *smmu =
> + iommu_get_iommu_dev(dev, struct arm_smmu_device,
> iommu);
> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> + struct arm_smmu_domain *s2_parent = to_smmu_domain(parent);
> + struct arm_vsmmu *vsmmu;
> +
> + if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + if (s2_parent->smmu != master->smmu)
> + return ERR_PTR(-EINVAL);
> +
> + /*
> + * Must support some way to prevent the VM from bypassing the
> cache
> + * because VFIO currently does not do any cache maintenance.
> canwbs
> + * indicates the device is fully coherent and no cache maintenance
> is
> + * ever required, even for PCI No-Snoop."
> + */
> + if (!arm_smmu_master_canwbs(master))
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
> + &arm_vsmmu_ops);
> + if (IS_ERR(vsmmu))
> + return ERR_CAST(vsmmu);
> +
> + vsmmu->smmu = smmu;
> + vsmmu->s2_parent = s2_parent;
> + /* FIXME Move VMID allocation from the S2 domain allocation to
> here */
I am planning to respin the " iommu/arm-smmu-v3: Use pinned KVM VMID for
stage 2" [0] based on the latest IOMMUF code. One of the comment on that
RFC was, we should associate the KVM pointer to the sub objects like viommu
instead of iommufd itself[1]. But at present the s2 domain is already finalized
with a VMID before a viommu object is allocated.
So does the above comment indicates that we plan to do another
S2 VMID allocation here and replace the old one?
Please let me know your thoughts on this.
Thanks,
Shameer
[0] https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/
[1] https://lore.kernel.org/linux-iommu/BN9PR11MB527662A2AB0A9BABD5E20E6D8CD52@BN9PR11MB5276.namprd11.prod.outlook.com/
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
2024-11-29 14:38 ` Shameerali Kolothum Thodi
@ 2024-11-29 15:06 ` Jason Gunthorpe
0 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2024-11-29 15:06 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: acpica-devel@lists.linux.dev, iommu@lists.linux.dev, Joerg Roedel,
Kevin Tian, kvm@vger.kernel.org, Len Brown,
linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Lorenzo Pieralisi, Rafael J. Wysocki, Robert Moore, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Guohanjun (Hanjun Guo), Jean-Philippe Brucker,
Jerry Snitselaar, Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Rafael J. Wysocki, Mostafa Saleh
On Fri, Nov 29, 2024 at 02:38:10PM +0000, Shameerali Kolothum Thodi wrote:
> > + /* FIXME Move VMID allocation from the S2 domain allocation to
> > here */
>
> I am planning to respin the " iommu/arm-smmu-v3: Use pinned KVM VMID for
> stage 2" [0] based on the latest IOMMUF code. One of the comment on that
> RFC was, we should associate the KVM pointer to the sub objects like viommu
> instead of iommufd itself[1].
Yes
> But at present the s2 domain is already finalized
> with a VMID before a viommu object is allocated.
Right, this needs fixing up. The vmid must come from the viommu and
the same s2 domain shared by viommus needs different vmids.
> So does the above comment indicates that we plan to do another
> S2 VMID allocation here and replace the old one?
I have been planning to remove the vmid and asid values from the
domains, along with the instance pointers.
When the domain is attached in some way the vmid and instance will be
recorded in a list.
A viommu attach will use the vmid from the viommu, a normal attach
will allocate a vmid for each instance.
This will also allow S2's to be shared across instances which is
another issue we have (10 copies of the S2 on big server HW is
wasteful)
It is simple enough to explain, but I think the datastructure will be
a bit tricky to keep invalidations efficient.
I haven't even started typing it in yet. But it is the next step in
this project. The vcmdq has the same issue with vmid assignment as you
are facing.
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2024-11-15 17:55 ` Jason Gunthorpe
@ 2025-01-22 19:26 ` Jason Gunthorpe
2025-02-05 3:45 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2025-01-22 19:26 UTC (permalink / raw)
To: Baolu Lu
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Fri, Nov 15, 2024 at 01:55:22PM -0400, Jason Gunthorpe wrote:
> > > I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
> > > driver. I have a patch series that eliminates it from all the other
> > > drivers, and I wrote a patch to remove FEAT_SVA from intel..
> >
> > Yes, sure. Let's make this happen in the next cycle.
> >
> > FEAT_IOPF could be removed. IOPF manipulation can be handled in the
> > domain attachment path. A per-device refcount can be implemented. This
> > count increments with each iopf-capable domain attachment and decrements
> > with each detachment. PCI PRI is enabled for the first iopf-capable
> > domain and disabled when the last one is removed. Probably we can also
> > solve the PF/VF sharing PRI issue.
>
> Here is what I have so far, if you send me a patch for vt-d to move
> FEAT_IOPF into attach as you describe above (see what I did to arm for
> example), then I can send it next cycle
>
> https://github.com/jgunthorpe/linux/commits/iommu_no_feat/
Hey Baolu, a reminder on this, lets try for it next cycle?
Thanks,
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-01-22 19:26 ` Jason Gunthorpe
@ 2025-02-05 3:45 ` Baolu Lu
2025-02-13 11:57 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2025-02-05 3:45 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: baolu.lu, Zhangfei Gao, acpica-devel, iommu, Joerg Roedel,
Kevin Tian, kvm, Len Brown, linux-acpi, linux-arm-kernel,
Lorenzo Pieralisi, Rafael J. Wysocki, Robert Moore, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On 2025/1/23 3:26, Jason Gunthorpe wrote:
> On Fri, Nov 15, 2024 at 01:55:22PM -0400, Jason Gunthorpe wrote:
>>>> I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
>>>> driver. I have a patch series that eliminates it from all the other
>>>> drivers, and I wrote a patch to remove FEAT_SVA from intel..
>>> Yes, sure. Let's make this happen in the next cycle.
>>>
>>> FEAT_IOPF could be removed. IOPF manipulation can be handled in the
>>> domain attachment path. A per-device refcount can be implemented. This
>>> count increments with each iopf-capable domain attachment and decrements
>>> with each detachment. PCI PRI is enabled for the first iopf-capable
>>> domain and disabled when the last one is removed. Probably we can also
>>> solve the PF/VF sharing PRI issue.
>> Here is what I have so far, if you send me a patch for vt-d to move
>> FEAT_IOPF into attach as you describe above (see what I did to arm for
>> example), then I can send it next cycle
>>
>> https://github.com/jgunthorpe/linux/commits/iommu_no_feat/
> Hey Baolu, a reminder on this, lets try for it next cycle?
Oh, I forgot this. Thanks for the reminding. Sure, let's try to make it
in the next cycle.
---
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-05 3:45 ` Baolu Lu
@ 2025-02-13 11:57 ` Baolu Lu
2025-02-13 18:43 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2025-02-13 11:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: baolu.lu, Zhangfei Gao, acpica-devel, iommu, Joerg Roedel,
Kevin Tian, kvm, Len Brown, linux-acpi, linux-arm-kernel,
Lorenzo Pieralisi, Rafael J. Wysocki, Robert Moore, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On 2025/2/5 11:45, Baolu Lu wrote:
> On 2025/1/23 3:26, Jason Gunthorpe wrote:
>> On Fri, Nov 15, 2024 at 01:55:22PM -0400, Jason Gunthorpe wrote:
>>>>> I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
>>>>> driver. I have a patch series that eliminates it from all the other
>>>>> drivers, and I wrote a patch to remove FEAT_SVA from intel..
>>>> Yes, sure. Let's make this happen in the next cycle.
>>>>
>>>> FEAT_IOPF could be removed. IOPF manipulation can be handled in the
>>>> domain attachment path. A per-device refcount can be implemented. This
>>>> count increments with each iopf-capable domain attachment and
>>>> decrements
>>>> with each detachment. PCI PRI is enabled for the first iopf-capable
>>>> domain and disabled when the last one is removed. Probably we can also
>>>> solve the PF/VF sharing PRI issue.
>>> Here is what I have so far, if you send me a patch for vt-d to move
>>> FEAT_IOPF into attach as you describe above (see what I did to arm for
>>> example), then I can send it next cycle
>>>
>>> https://github.com/jgunthorpe/linux/commits/iommu_no_feat/
>> Hey Baolu, a reminder on this, lets try for it next cycle?
>
> Oh, I forgot this. Thanks for the reminding. Sure, let's try to make it
> in the next cycle.
Hi Jason,
I've worked through the entire series. The patches are available here:
https://github.com/LuBaolu/intel-iommu/commits/iommu-no-feat-v6.14-rc2
Please check if this is the right direction.
Thanks,
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-13 11:57 ` Baolu Lu
@ 2025-02-13 18:43 ` Jason Gunthorpe
2025-02-14 5:39 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 18:43 UTC (permalink / raw)
To: Baolu Lu
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Thu, Feb 13, 2025 at 07:57:51PM +0800, Baolu Lu wrote:
> On 2025/2/5 11:45, Baolu Lu wrote:
> > On 2025/1/23 3:26, Jason Gunthorpe wrote:
> > > On Fri, Nov 15, 2024 at 01:55:22PM -0400, Jason Gunthorpe wrote:
> > > > > > I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
> > > > > > driver. I have a patch series that eliminates it from all the other
> > > > > > drivers, and I wrote a patch to remove FEAT_SVA from intel..
> > > > > Yes, sure. Let's make this happen in the next cycle.
> > > > >
> > > > > FEAT_IOPF could be removed. IOPF manipulation can be handled in the
> > > > > domain attachment path. A per-device refcount can be implemented. This
> > > > > count increments with each iopf-capable domain attachment
> > > > > and decrements
> > > > > with each detachment. PCI PRI is enabled for the first iopf-capable
> > > > > domain and disabled when the last one is removed. Probably we can also
> > > > > solve the PF/VF sharing PRI issue.
> > > > Here is what I have so far, if you send me a patch for vt-d to move
> > > > FEAT_IOPF into attach as you describe above (see what I did to arm for
> > > > example), then I can send it next cycle
> > > >
> > > > https://github.com/jgunthorpe/linux/commits/iommu_no_feat/
> > > Hey Baolu, a reminder on this, lets try for it next cycle?
> >
> > Oh, I forgot this. Thanks for the reminding. Sure, let's try to make it
> > in the next cycle.
>
> Hi Jason,
>
> I've worked through the entire series. The patches are available here:
>
> https://github.com/LuBaolu/intel-iommu/commits/iommu-no-feat-v6.14-rc2
>
> Please check if this is the right direction.
Looks great, and you did all the cleanup stuff too!
The vt-d flow is a little more complicated than the ARM logic because
the driver flow is structed differently.
Do we really want ATS turned on if the only thing attached is an
IDENTITY domain? That will unnecessarily slow down ATS capable HW.. It
is functionally OK though.
Also, are there enough ATC flushes around any domain type change? I
didn't check..
I feel like we should leave "iommu: Move PRI enablement for VF to
iommu driver" out for now, every driver needs this check? AMD
supports SVA and PRI so it needs it too.
Do you want to squash those fixup patches and post it?
Thanks,
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-13 18:43 ` Jason Gunthorpe
@ 2025-02-14 5:39 ` Baolu Lu
2025-02-14 12:41 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2025-02-14 5:39 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On 2/14/25 02:43, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2025 at 07:57:51PM +0800, Baolu Lu wrote:
>> On 2025/2/5 11:45, Baolu Lu wrote:
>>> On 2025/1/23 3:26, Jason Gunthorpe wrote:
>>>> On Fri, Nov 15, 2024 at 01:55:22PM -0400, Jason Gunthorpe wrote:
>>>>>>> I need your help to remove IOMMU_DEV_FEAT_IOPF from the intel
>>>>>>> driver. I have a patch series that eliminates it from all the other
>>>>>>> drivers, and I wrote a patch to remove FEAT_SVA from intel..
>>>>>> Yes, sure. Let's make this happen in the next cycle.
>>>>>>
>>>>>> FEAT_IOPF could be removed. IOPF manipulation can be handled in the
>>>>>> domain attachment path. A per-device refcount can be implemented. This
>>>>>> count increments with each iopf-capable domain attachment
>>>>>> and decrements
>>>>>> with each detachment. PCI PRI is enabled for the first iopf-capable
>>>>>> domain and disabled when the last one is removed. Probably we can also
>>>>>> solve the PF/VF sharing PRI issue.
>>>>> Here is what I have so far, if you send me a patch for vt-d to move
>>>>> FEAT_IOPF into attach as you describe above (see what I did to arm for
>>>>> example), then I can send it next cycle
>>>>>
>>>>> https://github.com/jgunthorpe/linux/commits/iommu_no_feat/
>>>> Hey Baolu, a reminder on this, lets try for it next cycle?
>>>
>>> Oh, I forgot this. Thanks for the reminding. Sure, let's try to make it
>>> in the next cycle.
>>
>> Hi Jason,
>>
>> I've worked through the entire series. The patches are available here:
>>
>> https://github.com/LuBaolu/intel-iommu/commits/iommu-no-feat-v6.14-rc2
>>
>> Please check if this is the right direction.
>
> Looks great, and you did all the cleanup stuff too!
>
> The vt-d flow is a little more complicated than the ARM logic because
> the driver flow is structed differently.
>
> Do we really want ATS turned on if the only thing attached is an
> IDENTITY domain? That will unnecessarily slow down ATS capable HW.. It
> is functionally OK though.
It depends. The Intel driver uses a simple approach.
When the IOMMU is working in legacy mode, PASID and PRI are not
supported. In this case, ATS will not be enabled if the identity domain
is attached to the device.
When the IOMMU is working in scalable mode, PASID and PRI are supported.
ATS will always be enabled, even if the identity domain is attached to
the device, because the PASID might use PRI, which depends on ATS
functionality. This might not be the best choice, but it is the
simplest and functional.
If any device does not work with the identity domain and ATS enabled,
then we can add a quirk to the driver, as we already have such a
mechanism.
>
> Also, are there enough ATC flushes around any domain type change? I
> didn't check..
The VT-d specification defines guidance for software to manage caches,
both the IOTLB in the IOMMU and the ATC on the device (Table 28,
Guidance to Software for Invalidations). The driver was written
according to that guidance.
> I feel like we should leave "iommu: Move PRI enablement for VF to
> iommu driver" out for now, every driver needs this check? AMD
> supports SVA and PRI so it needs it too.
Yes, agreed.
>
> Do you want to squash those fixup patches and post it?
Yes, sure. Let me post it for further review.
> Thanks,
> Jason
Thanks,
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-14 5:39 ` Baolu Lu
@ 2025-02-14 12:41 ` Jason Gunthorpe
2025-02-15 9:53 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2025-02-14 12:41 UTC (permalink / raw)
To: Baolu Lu
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
> When the IOMMU is working in scalable mode, PASID and PRI are supported.
> ATS will always be enabled, even if the identity domain is attached to
> the device, because the PASID might use PRI, which depends on ATS
> functionality. This might not be the best choice, but it is the
> simplest and functional.
The arm driver keeps track of things and enables ATS when PASIDs are
present
> If any device does not work with the identity domain and ATS enabled,
> then we can add a quirk to the driver, as we already have such a
> mechanism.
The device should not care, as long as the HW works.. ARM has a weird
quirk where one of the two ways to configure an identity domain does
not work with ATS. If you have something like that as well then you
have to switch ATS off if the IOMMU is in a state where it will not
respond to it.
Otherwise, the HW I'm aware of uses ATS pretty selectively and it may
not make any real difference..
> > I feel like we should leave "iommu: Move PRI enablement for VF to
> > iommu driver" out for now, every driver needs this check? AMD
> > supports SVA and PRI so it needs it too.
>
> Yes, agreed.
Although, I'm wondering now, that check should be on the SVA paths as
well as the iommufd path..
> > Do you want to squash those fixup patches and post it?
>
> Yes, sure. Let me post it for further review.
Thanks
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-14 12:41 ` Jason Gunthorpe
@ 2025-02-15 9:53 ` Baolu Lu
2025-02-18 13:03 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2025-02-15 9:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On 2/14/25 20:41, Jason Gunthorpe wrote:
> On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
>
>> When the IOMMU is working in scalable mode, PASID and PRI are supported.
>> ATS will always be enabled, even if the identity domain is attached to
>> the device, because the PASID might use PRI, which depends on ATS
>> functionality. This might not be the best choice, but it is the
>> simplest and functional.
> The arm driver keeps track of things and enables ATS when PASIDs are
> present
I am not aware of any VT-d hardware implementation that supports
scalable mode but not PASID. If there were one, it would be worthwhile
to add an optimization to avoid enabling ATS during probe if PASID is
not supported.
>
>> If any device does not work with the identity domain and ATS enabled,
>> then we can add a quirk to the driver, as we already have such a
>> mechanism.
> The device should not care, as long as the HW works.. ARM has a weird
> quirk where one of the two ways to configure an identity domain does
> not work with ATS. If you have something like that as well then you
> have to switch ATS off if the IOMMU is in a state where it will not
> respond to it.
>
> Otherwise, the HW I'm aware of uses ATS pretty selectively and it may
> not make any real difference..
>
>>> I feel like we should leave "iommu: Move PRI enablement for VF to
>>> iommu driver" out for now, every driver needs this check? AMD
>>> supports SVA and PRI so it needs it too.
>> Yes, agreed.
> Although, I'm wondering now, that check should be on the SVA paths as
> well as the iommufd path..
That appears to be a fix.
Thanks,
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-15 9:53 ` Baolu Lu
@ 2025-02-18 13:03 ` Jason Gunthorpe
2025-02-19 2:09 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 13:03 UTC (permalink / raw)
To: Baolu Lu
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On Sat, Feb 15, 2025 at 05:53:13PM +0800, Baolu Lu wrote:
> On 2/14/25 20:41, Jason Gunthorpe wrote:
> > On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
> >
> > > When the IOMMU is working in scalable mode, PASID and PRI are supported.
> > > ATS will always be enabled, even if the identity domain is attached to
> > > the device, because the PASID might use PRI, which depends on ATS
> > > functionality. This might not be the best choice, but it is the
> > > simplest and functional.
> > The arm driver keeps track of things and enables ATS when PASIDs are
> > present
>
> I am not aware of any VT-d hardware implementation that supports
> scalable mode but not PASID. If there were one, it would be worthwhile
> to add an optimization to avoid enabling ATS during probe if PASID is
> not supported.
I mean domains attached to PASIDs that need PRI/ATS/etc
> > Although, I'm wondering now, that check should be on the SVA paths as
> > well as the iommufd path..
>
> That appears to be a fix.
Does SVA have the same issue?
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-18 13:03 ` Jason Gunthorpe
@ 2025-02-19 2:09 ` Baolu Lu
2025-02-19 8:34 ` Tian, Kevin
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2025-02-19 2:09 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel, iommu, Joerg Roedel, Kevin Tian, kvm,
Len Brown, linux-acpi, linux-arm-kernel, Lorenzo Pieralisi,
Rafael J. Wysocki, Robert Moore, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
Rafael J. Wysocki, Shameerali Kolothum Thodi, Mostafa Saleh
On 2/18/25 21:03, Jason Gunthorpe wrote:
> On Sat, Feb 15, 2025 at 05:53:13PM +0800, Baolu Lu wrote:
>> On 2/14/25 20:41, Jason Gunthorpe wrote:
>>> On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
>>>
>>>> When the IOMMU is working in scalable mode, PASID and PRI are supported.
>>>> ATS will always be enabled, even if the identity domain is attached to
>>>> the device, because the PASID might use PRI, which depends on ATS
>>>> functionality. This might not be the best choice, but it is the
>>>> simplest and functional.
>>> The arm driver keeps track of things and enables ATS when PASIDs are
>>> present
>> I am not aware of any VT-d hardware implementation that supports
>> scalable mode but not PASID. If there were one, it would be worthwhile
>> to add an optimization to avoid enabling ATS during probe if PASID is
>> not supported.
> I mean domains attached to PASIDs that need PRI/ATS/etc
Yeah, that's a better solution. The PCI PRI/ATS features are only
enabled when a domain that requires them is attached to it. I will
consider it in the Intel driver later.
>>> Although, I'm wondering now, that check should be on the SVA paths as
>>> well as the iommufd path..
>> That appears to be a fix.
> Does SVA have the same issue?
One case I can think of is SVA on SR-IOV VFs. Without the in-progress
iopf refcount patch series, enabling and disabling iopf could be
problematic, because all PRI enablement is switched in the PF, it's
possible that enable and disable operations won't be paired correctly.
Another issue is that a failure or invalid page group response may halt
the PRI interface, which would cause SVA on other VFs to stop working.
So, we should probably disable SVA on VFs for now and re-enable it after
all these issues are resolved.
Thanks,
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-19 2:09 ` Baolu Lu
@ 2025-02-19 8:34 ` Tian, Kevin
2025-02-20 2:10 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Tian, Kevin @ 2025-02-19 8:34 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel@lists.linux.dev, iommu@lists.linux.dev,
Joerg Roedel, kvm@vger.kernel.org, Len Brown,
linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Lorenzo Pieralisi, Rafael J. Wysocki, Moore, Robert, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Wysocki, Rafael J,
Shameerali Kolothum Thodi, Mostafa Saleh
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, February 19, 2025 10:10 AM
>
> On 2/18/25 21:03, Jason Gunthorpe wrote:
> > On Sat, Feb 15, 2025 at 05:53:13PM +0800, Baolu Lu wrote:
> >> On 2/14/25 20:41, Jason Gunthorpe wrote:
> >>> On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
> >>>
> >>>> When the IOMMU is working in scalable mode, PASID and PRI are
> supported.
> >>>> ATS will always be enabled, even if the identity domain is attached to
> >>>> the device, because the PASID might use PRI, which depends on ATS
> >>>> functionality. This might not be the best choice, but it is the
> >>>> simplest and functional.
> >>> The arm driver keeps track of things and enables ATS when PASIDs are
> >>> present
> >> I am not aware of any VT-d hardware implementation that supports
> >> scalable mode but not PASID. If there were one, it would be worthwhile
> >> to add an optimization to avoid enabling ATS during probe if PASID is
> >> not supported.
> > I mean domains attached to PASIDs that need PRI/ATS/etc
>
> Yeah, that's a better solution. The PCI PRI/ATS features are only
> enabled when a domain that requires them is attached to it. I will
> consider it in the Intel driver later.
>
I didn't get the connection here. ATS can run w/o PASID per PCIe
spec. Why do we want to add a dependency on PASID here?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-19 8:34 ` Tian, Kevin
@ 2025-02-20 2:10 ` Baolu Lu
2025-02-20 6:51 ` Tian, Kevin
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2025-02-20 2:10 UTC (permalink / raw)
To: Tian, Kevin, Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel@lists.linux.dev, iommu@lists.linux.dev,
Joerg Roedel, kvm@vger.kernel.org, Len Brown,
linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Lorenzo Pieralisi, Rafael J. Wysocki, Moore, Robert, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Wysocki, Rafael J,
Shameerali Kolothum Thodi, Mostafa Saleh
On 2/19/25 16:34, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Wednesday, February 19, 2025 10:10 AM
>>
>> On 2/18/25 21:03, Jason Gunthorpe wrote:
>>> On Sat, Feb 15, 2025 at 05:53:13PM +0800, Baolu Lu wrote:
>>>> On 2/14/25 20:41, Jason Gunthorpe wrote:
>>>>> On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
>>>>>
>>>>>> When the IOMMU is working in scalable mode, PASID and PRI are
>> supported.
>>>>>> ATS will always be enabled, even if the identity domain is attached to
>>>>>> the device, because the PASID might use PRI, which depends on ATS
>>>>>> functionality. This might not be the best choice, but it is the
>>>>>> simplest and functional.
>>>>> The arm driver keeps track of things and enables ATS when PASIDs are
>>>>> present
>>>> I am not aware of any VT-d hardware implementation that supports
>>>> scalable mode but not PASID. If there were one, it would be worthwhile
>>>> to add an optimization to avoid enabling ATS during probe if PASID is
>>>> not supported.
>>> I mean domains attached to PASIDs that need PRI/ATS/etc
>>
>> Yeah, that's a better solution. The PCI PRI/ATS features are only
>> enabled when a domain that requires them is attached to it. I will
>> consider it in the Intel driver later.
>>
>
> I didn't get the connection here. ATS can run w/o PASID per PCIe
> spec. Why do we want to add a dependency on PASID here?
It's due to PRI, which depends on ATS. The original topic is: when an
identity domain is attached to the device and the device has no PASID
support, then ATS might be disabled because ATS isn't supposed to
provide much benefit in this case. Otherwise, ATS should be enabled
because:
- It benefits performance when the domain is a paging domain.
- A domain attached to a PASID might use PRI, thus requiring ATS to be
on.
The proposed solution is to use a reference count for ATS enablement,
similar to how we handle iopf in another series. ATS is enabled as long
as any domain requires it and disabled if no domain requires it.
Hope it explains.
Thanks,
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-20 2:10 ` Baolu Lu
@ 2025-02-20 6:51 ` Tian, Kevin
2025-02-20 11:49 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Tian, Kevin @ 2025-02-20 6:51 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel@lists.linux.dev, iommu@lists.linux.dev,
Joerg Roedel, kvm@vger.kernel.org, Len Brown,
linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Lorenzo Pieralisi, Rafael J. Wysocki, Moore, Robert, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Wysocki, Rafael J,
Shameerali Kolothum Thodi, Mostafa Saleh
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, February 20, 2025 10:11 AM
>
> On 2/19/25 16:34, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Wednesday, February 19, 2025 10:10 AM
> >>
> >> On 2/18/25 21:03, Jason Gunthorpe wrote:
> >>> On Sat, Feb 15, 2025 at 05:53:13PM +0800, Baolu Lu wrote:
> >>>> On 2/14/25 20:41, Jason Gunthorpe wrote:
> >>>>> On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
> >>>>>
> >>>>>> When the IOMMU is working in scalable mode, PASID and PRI are
> >> supported.
> >>>>>> ATS will always be enabled, even if the identity domain is attached to
> >>>>>> the device, because the PASID might use PRI, which depends on ATS
> >>>>>> functionality. This might not be the best choice, but it is the
> >>>>>> simplest and functional.
> >>>>> The arm driver keeps track of things and enables ATS when PASIDs are
> >>>>> present
> >>>> I am not aware of any VT-d hardware implementation that supports
> >>>> scalable mode but not PASID. If there were one, it would be worthwhile
> >>>> to add an optimization to avoid enabling ATS during probe if PASID is
> >>>> not supported.
> >>> I mean domains attached to PASIDs that need PRI/ATS/etc
> >>
> >> Yeah, that's a better solution. The PCI PRI/ATS features are only
> >> enabled when a domain that requires them is attached to it. I will
> >> consider it in the Intel driver later.
> >>
> >
> > I didn't get the connection here. ATS can run w/o PASID per PCIe
> > spec. Why do we want to add a dependency on PASID here?
>
> It's due to PRI, which depends on ATS. The original topic is: when an
> identity domain is attached to the device and the device has no PASID
> support, then ATS might be disabled because ATS isn't supposed to
> provide much benefit in this case.
PRI depends on ATS but PASID is optional.
ATS has no benefit (or even more cost) with identity domain but again
it has nothing to do with PASID.
> Otherwise, ATS should be enabled because:
>
> - It benefits performance when the domain is a paging domain.
> - A domain attached to a PASID might use PRI, thus requiring ATS to be
> on.
Above talks about the domain type. Nothing specific to PASID.
>
> The proposed solution is to use a reference count for ATS enablement,
> similar to how we handle iopf in another series. ATS is enabled as long
> as any domain requires it and disabled if no domain requires it.
>
I'm fine with using reference count for ATS enablement based on
the domain type, but just didn't get the role of PASID in this discussion.
Probably we want an explicit hwpt flag to opt for ATS, just like the
existing faultable flag.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-20 6:51 ` Tian, Kevin
@ 2025-02-20 11:49 ` Baolu Lu
2025-02-21 4:28 ` Tian, Kevin
0 siblings, 1 reply; 58+ messages in thread
From: Baolu Lu @ 2025-02-20 11:49 UTC (permalink / raw)
To: Tian, Kevin, Jason Gunthorpe
Cc: baolu.lu, Zhangfei Gao, acpica-devel@lists.linux.dev,
iommu@lists.linux.dev, Joerg Roedel, kvm@vger.kernel.org,
Len Brown, linux-acpi@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, Lorenzo Pieralisi,
Rafael J. Wysocki, Moore, Robert, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Wysocki, Rafael J,
Shameerali Kolothum Thodi, Mostafa Saleh
On 2025/2/20 14:51, Tian, Kevin wrote:
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Thursday, February 20, 2025 10:11 AM
>>
>> On 2/19/25 16:34, Tian, Kevin wrote:
>>>> From: Baolu Lu<baolu.lu@linux.intel.com>
>>>> Sent: Wednesday, February 19, 2025 10:10 AM
>>>>
>>>> On 2/18/25 21:03, Jason Gunthorpe wrote:
>>>>> On Sat, Feb 15, 2025 at 05:53:13PM +0800, Baolu Lu wrote:
>>>>>> On 2/14/25 20:41, Jason Gunthorpe wrote:
>>>>>>> On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
>>>>>>>
>>>>>>>> When the IOMMU is working in scalable mode, PASID and PRI are
>>>> supported.
>>>>>>>> ATS will always be enabled, even if the identity domain is attached to
>>>>>>>> the device, because the PASID might use PRI, which depends on ATS
>>>>>>>> functionality. This might not be the best choice, but it is the
>>>>>>>> simplest and functional.
>>>>>>> The arm driver keeps track of things and enables ATS when PASIDs are
>>>>>>> present
>>>>>> I am not aware of any VT-d hardware implementation that supports
>>>>>> scalable mode but not PASID. If there were one, it would be worthwhile
>>>>>> to add an optimization to avoid enabling ATS during probe if PASID is
>>>>>> not supported.
>>>>> I mean domains attached to PASIDs that need PRI/ATS/etc
>>>> Yeah, that's a better solution. The PCI PRI/ATS features are only
>>>> enabled when a domain that requires them is attached to it. I will
>>>> consider it in the Intel driver later.
>>>>
>>> I didn't get the connection here. ATS can run w/o PASID per PCIe
>>> spec. Why do we want to add a dependency on PASID here?
>> It's due to PRI, which depends on ATS. The original topic is: when an
>> identity domain is attached to the device and the device has no PASID
>> support, then ATS might be disabled because ATS isn't supposed to
>> provide much benefit in this case.
> PRI depends on ATS but PASID is optional.
>
> ATS has no benefit (or even more cost) with identity domain but again
> it has nothing to do with PASID.
>
>> Otherwise, ATS should be enabled because:
>>
>> - It benefits performance when the domain is a paging domain.
>> - A domain attached to a PASID might use PRI, thus requiring ATS to be
>> on.
> Above talks about the domain type. Nothing specific to PASID.
>
>> The proposed solution is to use a reference count for ATS enablement,
>> similar to how we handle iopf in another series. ATS is enabled as long
>> as any domain requires it and disabled if no domain requires it.
>>
> I'm fine with using reference count for ATS enablement based on
> the domain type, but just didn't get the role of PASID in this discussion.
Sorry that I didn't make it clear. Let me try again.
PASID is mentioned in this discussion because it makes things different.
Without PASID support, only a single domain is attached to the device.
ATS enablement can then be determined based on the domain type.
Specifically:
- For an identity domain, ATS could be disabled.
- For other domains, ATS is enabled.
With PASID support, multiple domains can be attached to the device, and
each domain may have different ATS requirements. Therefore, we cannot
simply determine the ATS status in the RID domain attach/detach paths. A
better solution is to use the reference count, as mentioned above.
Thanks,
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-20 11:49 ` Baolu Lu
@ 2025-02-21 4:28 ` Tian, Kevin
2025-02-21 13:04 ` Jason Gunthorpe
0 siblings, 1 reply; 58+ messages in thread
From: Tian, Kevin @ 2025-02-21 4:28 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: Zhangfei Gao, acpica-devel@lists.linux.dev, iommu@lists.linux.dev,
Joerg Roedel, kvm@vger.kernel.org, Len Brown,
linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Lorenzo Pieralisi, Rafael J. Wysocki, Moore, Robert, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Wysocki, Rafael J,
Shameerali Kolothum Thodi, Mostafa Saleh
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, February 20, 2025 7:49 PM
>
> On 2025/2/20 14:51, Tian, Kevin wrote:
> >> From: Baolu Lu<baolu.lu@linux.intel.com>
> >> Sent: Thursday, February 20, 2025 10:11 AM
> >>
> >> On 2/19/25 16:34, Tian, Kevin wrote:
> >>>> From: Baolu Lu<baolu.lu@linux.intel.com>
> >>>> Sent: Wednesday, February 19, 2025 10:10 AM
> >>>>
> >>>> On 2/18/25 21:03, Jason Gunthorpe wrote:
> >>>>> On Sat, Feb 15, 2025 at 05:53:13PM +0800, Baolu Lu wrote:
> >>>>>> On 2/14/25 20:41, Jason Gunthorpe wrote:
> >>>>>>> On Fri, Feb 14, 2025 at 01:39:52PM +0800, Baolu Lu wrote:
> >>>>>>>
> >>>>>>>> When the IOMMU is working in scalable mode, PASID and PRI are
> >>>> supported.
> >>>>>>>> ATS will always be enabled, even if the identity domain is attached
> to
> >>>>>>>> the device, because the PASID might use PRI, which depends on
> ATS
> >>>>>>>> functionality. This might not be the best choice, but it is the
> >>>>>>>> simplest and functional.
> >>>>>>> The arm driver keeps track of things and enables ATS when PASIDs
> are
> >>>>>>> present
> >>>>>> I am not aware of any VT-d hardware implementation that supports
> >>>>>> scalable mode but not PASID. If there were one, it would be
> worthwhile
> >>>>>> to add an optimization to avoid enabling ATS during probe if PASID is
> >>>>>> not supported.
> >>>>> I mean domains attached to PASIDs that need PRI/ATS/etc
> >>>> Yeah, that's a better solution. The PCI PRI/ATS features are only
> >>>> enabled when a domain that requires them is attached to it. I will
> >>>> consider it in the Intel driver later.
> >>>>
> >>> I didn't get the connection here. ATS can run w/o PASID per PCIe
> >>> spec. Why do we want to add a dependency on PASID here?
> >> It's due to PRI, which depends on ATS. The original topic is: when an
> >> identity domain is attached to the device and the device has no PASID
> >> support, then ATS might be disabled because ATS isn't supposed to
> >> provide much benefit in this case.
> > PRI depends on ATS but PASID is optional.
> >
> > ATS has no benefit (or even more cost) with identity domain but again
> > it has nothing to do with PASID.
> >
> >> Otherwise, ATS should be enabled because:
> >>
> >> - It benefits performance when the domain is a paging domain.
> >> - A domain attached to a PASID might use PRI, thus requiring ATS to be
> >> on.
> > Above talks about the domain type. Nothing specific to PASID.
> >
> >> The proposed solution is to use a reference count for ATS enablement,
> >> similar to how we handle iopf in another series. ATS is enabled as long
> >> as any domain requires it and disabled if no domain requires it.
> >>
> > I'm fine with using reference count for ATS enablement based on
> > the domain type, but just didn't get the role of PASID in this discussion.
>
> Sorry that I didn't make it clear. Let me try again.
>
> PASID is mentioned in this discussion because it makes things different.
>
> Without PASID support, only a single domain is attached to the device.
> ATS enablement can then be determined based on the domain type.
> Specifically:
>
> - For an identity domain, ATS could be disabled.
> - For other domains, ATS is enabled.
>
> With PASID support, multiple domains can be attached to the device, and
> each domain may have different ATS requirements. Therefore, we cannot
> simply determine the ATS status in the RID domain attach/detach paths. A
> better solution is to use the reference count, as mentioned above.
>
Okay, that helps connect the dots and makes sense to me. Thanks!
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-21 4:28 ` Tian, Kevin
@ 2025-02-21 13:04 ` Jason Gunthorpe
2025-02-22 7:13 ` Baolu Lu
0 siblings, 1 reply; 58+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 13:04 UTC (permalink / raw)
To: Tian, Kevin
Cc: Baolu Lu, Zhangfei Gao, acpica-devel@lists.linux.dev,
iommu@lists.linux.dev, Joerg Roedel, kvm@vger.kernel.org,
Len Brown, linux-acpi@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, Lorenzo Pieralisi,
Rafael J. Wysocki, Moore, Robert, Robin Murphy, Sudeep Holla,
Will Deacon, Alex Williamson, Donald Dutile, Eric Auger,
Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Wysocki, Rafael J,
Shameerali Kolothum Thodi, Mostafa Saleh
On Fri, Feb 21, 2025 at 04:28:50AM +0000, Tian, Kevin wrote:
> > With PASID support, multiple domains can be attached to the device, and
> > each domain may have different ATS requirements. Therefore, we cannot
> > simply determine the ATS status in the RID domain attach/detach paths. A
> > better solution is to use the reference count, as mentioned above.
> >
>
> Okay, that helps connect the dots and makes sense to me. Thanks!
I also have this general feeling that using ATS or not should be some
user policy (ie with sysfs or something) not just always automatic..
Right now on our devices there is a firmware config that hides the ATS
support from PCI config space and the general guidance is to only turn
it on in very specific situations.
Jason
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 00/12] Initial support for SMMUv3 nested translation
2025-02-21 13:04 ` Jason Gunthorpe
@ 2025-02-22 7:13 ` Baolu Lu
0 siblings, 0 replies; 58+ messages in thread
From: Baolu Lu @ 2025-02-22 7:13 UTC (permalink / raw)
To: Jason Gunthorpe, Tian, Kevin
Cc: Zhangfei Gao, acpica-devel@lists.linux.dev, iommu@lists.linux.dev,
Joerg Roedel, kvm@vger.kernel.org, Len Brown,
linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Lorenzo Pieralisi, Rafael J. Wysocki, Moore, Robert, Robin Murphy,
Sudeep Holla, Will Deacon, Alex Williamson, Donald Dutile,
Eric Auger, Hanjun Guo, Jean-Philippe Brucker, Jerry Snitselaar,
Moritz Fischer, Michael Shavit, Nicolin Chen,
patches@lists.linux.dev, Wysocki, Rafael J,
Shameerali Kolothum Thodi, Mostafa Saleh
On 2/21/25 21:04, Jason Gunthorpe wrote:
> On Fri, Feb 21, 2025 at 04:28:50AM +0000, Tian, Kevin wrote:
>>> With PASID support, multiple domains can be attached to the device, and
>>> each domain may have different ATS requirements. Therefore, we cannot
>>> simply determine the ATS status in the RID domain attach/detach paths. A
>>> better solution is to use the reference count, as mentioned above.
>>>
>> Okay, that helps connect the dots and makes sense to me. Thanks!
> I also have this general feeling that using ATS or not should be some
> user policy (ie with sysfs or something) not just always automatic..
Agreed. ATS is inherently insecure because it allows a device to
directly access system memory using translated requests. A malicious
device could exploit this to compromise the system. Currently, Linux
prevents ATS from being enabled on devices with pci_dev->untrusted set.
But this seems insufficient, as only devices connected to external-
facing ports are currently marked as untrusted. It would be preferable
to allow the user to determine which devices are trusted and, therefore,
permitted to use ATS.
Some IOMMU architectures have introduced new features to enhance the
security of ATS, such as the host permission table in VT-d v5.0. This
could be an interesting topic when considering its implementation in the
Linux kernel.
> Right now on our devices there is a firmware config that hides the ATS
> support from PCI config space and the general guidance is to only turn
> it on in very specific situations.
Thanks,
baolu
^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2025-02-22 7:18 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-31 0:20 [PATCH v4 00/12] Initial support for SMMUv3 nested translation Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 01/12] vfio: Remove VFIO_TYPE1_NESTING_IOMMU Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 02/12] ACPICA: IORT: Update for revision E.f Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 03/12] ACPI/IORT: Support CANWBS memory access flag Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 04/12] iommu/arm-smmu-v3: Report IOMMU_CAP_ENFORCE_CACHE_COHERENCY for CANWBS Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info Jason Gunthorpe
2024-11-04 11:47 ` Will Deacon
2024-11-04 12:41 ` Jason Gunthorpe
2024-11-06 16:37 ` Robin Murphy
2024-11-06 18:05 ` Jason Gunthorpe
2024-11-06 21:05 ` Robin Murphy
2024-11-06 21:53 ` Nicolin Chen
2024-11-07 2:35 ` Jason Gunthorpe
2024-11-07 15:51 ` Jason Gunthorpe
2024-11-08 14:53 ` Will Deacon
2024-11-08 15:42 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 06/12] iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 07/12] iommu/arm-smmu-v3: Expose the arm_smmu_attach interface Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 08/12] iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Jason Gunthorpe
2024-11-29 14:38 ` Shameerali Kolothum Thodi
2024-11-29 15:06 ` Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 09/12] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED Jason Gunthorpe
2024-10-31 6:21 ` Zhangfei Gao
2024-11-04 17:19 ` Jason Gunthorpe
2024-11-06 5:39 ` Zhangfei Gao
2024-10-31 0:20 ` [PATCH v4 10/12] iommu/arm-smmu-v3: Use S2FWB for NESTED domains Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 11/12] iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED Jason Gunthorpe
2024-10-31 0:20 ` [PATCH v4 12/12] iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object Jason Gunthorpe
2024-10-31 0:53 ` [PATCH v4 00/12] Initial support for SMMUv3 nested translation Nicolin Chen
2024-11-01 12:18 ` Will Deacon
2024-11-01 13:25 ` Jason Gunthorpe
2024-11-05 16:48 ` Will Deacon
2024-11-05 17:03 ` Jason Gunthorpe
2024-11-08 14:56 ` Will Deacon
2024-11-12 18:29 ` Jason Gunthorpe
2024-11-13 1:01 ` Zhangfei Gao
2024-11-13 1:23 ` Jason Gunthorpe
2024-11-13 2:55 ` Baolu Lu
2024-11-13 6:28 ` Zhangfei Gao
2024-11-13 16:43 ` Jason Gunthorpe
2024-11-14 0:51 ` Baolu Lu
2024-11-15 17:55 ` Jason Gunthorpe
2025-01-22 19:26 ` Jason Gunthorpe
2025-02-05 3:45 ` Baolu Lu
2025-02-13 11:57 ` Baolu Lu
2025-02-13 18:43 ` Jason Gunthorpe
2025-02-14 5:39 ` Baolu Lu
2025-02-14 12:41 ` Jason Gunthorpe
2025-02-15 9:53 ` Baolu Lu
2025-02-18 13:03 ` Jason Gunthorpe
2025-02-19 2:09 ` Baolu Lu
2025-02-19 8:34 ` Tian, Kevin
2025-02-20 2:10 ` Baolu Lu
2025-02-20 6:51 ` Tian, Kevin
2025-02-20 11:49 ` Baolu Lu
2025-02-21 4:28 ` Tian, Kevin
2025-02-21 13:04 ` Jason Gunthorpe
2025-02-22 7:13 ` Baolu Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).