* [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate)
@ 2025-08-19 21:51 Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 01/28] KVM: arm64: Add a new function to donate memory with prot Mostafa Saleh
` (27 more replies)
0 siblings, 28 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
This is v4 of pKVM SMMUv3 support. This version is quite different from
the previous ones, as it implements nested SMMUv3 using trap and emulate:
v1: Implements a full-fledged PV interface
https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/
v2: Implements a full-fledged PV interface (+ more features such as evtq and stage-1)
https://lore.kernel.org/kvmarm/20241212180423.1578358-1-smostafa@google.com/
v3: Only DMA isolation (using PV)
https://lore.kernel.org/kvmarm/20250728175316.3706196-1-smostafa@google.com/
Based on the feedback on v3, having a separate driver was too complicated
to maintain. So, the alternatives were either to integrate the KVM
implementation into the current driver and rely on impl ops, for which I
have a PoC:
https://android-kvm.googlesource.com/linux/+log/refs/heads/pkvm-smmu-implops-poc
or to go straight for the final goal, which is nested translation using
trap and emulate, as implemented in this series.
Another major change is that io-pgtable-arm is no longer split into a
common file; instead, kernel-specific code (mostly memory allocation and
selftests) was factored out, based on Robin's feedback.
Design:
=======
Assumptions:
------------
As mentioned, this is a completely different approach, which traps the
SMMUv3 MMIO space and emulates some of these accesses.
One of the important points is that this doesn’t emulate the full SMMUv3
architecture, but only the parts used by the Linux kernel; that’s why its
enablement (ARM_SMMU_V3_PKVM) depends on ARM_SMMU_V3=y, so we can be sure
of the driver behaviour.
Any incompatible change in the driver will likely trigger a WARN_ON, ending
up in a panic.
Most notable assumptions:
- Changing the stream table format/size or L2 pointers is not allowed after
initialization.
- leaf=0 CFGI is not allowed.
- CFGI_ALL with any range value other than 31 is not allowed.
- Some commands that are not used by the driver are not allowed (e.g. CMD_TLBI_NH_ALL).
- Values set in ARM_SMMU_CR1 are hardcoded and don't change.
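These assumptions translate directly into checks on trapped commands. As a
rough illustration only (this sketch is not the series code; it just reuses
the existing CMDQ field macros from arm-smmu-v3.h):

    /* Sketch: reject commands the kernel driver never issues. */
    static bool cmd_is_allowed(u64 *cmd)
    {
        switch (FIELD_GET(CMDQ_0_OP, cmd[0])) {
        case CMDQ_OP_CFGI_STE:
            /* leaf=0 CFGI is not allowed */
            return FIELD_GET(CMDQ_CFGI_1_LEAF, cmd[1]);
        case CMDQ_OP_CFGI_ALL:
            /* Only the full SID range (31) is allowed */
            return FIELD_GET(CMDQ_CFGI_1_RANGE, cmd[1]) == 31;
        case CMDQ_OP_TLBI_NH_ALL:
            return false;    /* unused by the kernel driver */
        default:
            return true;
        }
    }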
Emulation logic mainly targets:
1) Command Queue
----------------
At boot time, the hypervisor allocates a shadow command queue (it doesn’t
need to match the host size) and sets it up in HW; it then traps accesses to:
i) ARM_SMMU_CMDQ_BASE
This can only be written while the cmdq is disabled. On enable, the
hypervisor puts the host command queue in a shared state to prevent it from
transitioning into the hypervisor or VMs. It is unshared when the cmdq is
disabled.
ii) ARM_SMMU_CMDQ_PROD
Triggers the emulation code: the hypervisor copies the commands between cons
and prod of the host queue, sanitises them (mostly WARNing if the host is
malicious and issues commands it shouldn’t), then eagerly consumes them,
updating the host cons (see the sketch below).
iii) ARM_SMMU_CMDQ_CONS
Not much logic; this just returns the emulated cons + error bits.
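A minimal sketch of the ARM_SMMU_CMDQ_PROD handler described above (the
struct and helper names are illustrative, not the actual series code):

    static void smmu_emulate_cmdq_prod(struct hyp_smmu *smmu, u32 new_prod)
    {
        u64 cmd[CMDQ_ENT_DWORDS];

        while (smmu->host_cons != new_prod) {
            /* Copy the next command out of the shared host queue. */
            host_queue_read(smmu, smmu->host_cons, cmd);
            /* Sanitise it (see the check sketched above). */
            WARN_ON(!cmd_is_allowed(cmd));
            /* Eagerly issue it on the shadow queue the HW consumes. */
            shadow_cmdq_issue(smmu, cmd);
            smmu->host_cons = queue_inc(smmu->host_cons);
        }
        /* Reads of ARM_SMMU_CMDQ_CONS return host_cons + error bits. */
    }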
2) Stream table
---------------
Similar to the command queue, the first level is allocated at boot with the
maximum possible size; the hypervisor then traps accesses to:
i) ARM_SMMU_STRTAB_BASE/ARM_SMMU_STRTAB_BASE_CFG: Keep track of the stream
table so it can be put in a shared state.
On CFGI_STE, the hypervisor reads the STE in scope from the host copy,
shadows the L2 pointers if needed and attaches stage-2.
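A sketch of that CFGI_STE path (again with illustrative helper names, not
the series code):

    static int smmu_handle_cfgi_ste(struct hyp_smmu *smmu, u32 sid)
    {
        struct arm_smmu_ste ste;

        host_ste_read(smmu, sid, &ste);        /* read the host's copy */
        if (!shadow_l2_table(smmu, sid))       /* shadow the L2 pointer */
            return -ENOMEM;
        attach_stage2(smmu, sid, &ste);        /* force nested stage-2 */
        shadow_ste_write(smmu, sid, &ste);     /* install in the HW table */
        return 0;
    }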
3) GBPA
-------
The hypervisor sets GBPA to abort at boot; any access to GBPA from the host
then returns the value last set by the host. The host only sets ABORT, so
that is fine. Otherwise, we could always report ABORT as set even when it
isn’t, which would make it look like the HW is not responding to updates.
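Which boils down to something like this (illustrative only):

    static u64 smmu_emulate_gbpa(struct hyp_smmu *smmu, bool is_write, u64 val)
    {
        if (is_write)
            smmu->gbpa = val;    /* recorded only; HW GBPA stays ABORT */
        return smmu->gbpa;       /* reads return the host's last write */
    }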
Bisectability:
==============
I wrote the patches so that most of them are bisectable at run time (we can
run with a prefix of the series up to MMIO emulation, cmdq emulation, STE
shadowing or full nested); that was very helpful, and I kept it this way to
make debugging easier.
Constraints:
============
1) Discovery:
-------------
Only device tree is supported at the moment.
I don’t usually use ACPI, but I can look into adding that later (so as not
to make this series any bigger).
2) Errata:
----------
Some HW has both stage-1 and stage-2 but can’t run nested translation due to
errata, which makes the driver disable nesting for MMU_700; I believe this
is too restrictive. At the moment KVM will use nesting if advertised. (Or we
need another mechanism to exclude only the affected HW.)
3) Shadow page table
--------------------
The shadow page table uses page granularity (leaf entries) for memory,
because of the lack of split_block_unmap() logic. I am currently looking
into the possibility of sharing page tables; if that turns out to be
complicated (as expected), it might be worth re-adding this logic.
Boot flow:
==========
The hypervisor initialises at “module_init”.
Before that, at “core_initcall” the SMMUv3 KVM code will
- Register the hypervisor ops with the hypervisor
- Parse the device tree and populate an array with the SMMUs for the hypervisor.
At “module_init”, the hypervisor init will run, where the SMMU driver will:
- Take over the SMMUs' description.
- Probe the SMMUs (from IDRs); I tried to make most of this code common using macros.
- Take over the SMMUs' MMIO space so it will be trapped.
- Take over and set up the shadow command queue and stream table.
With “ARM_SMMU_V3_PKVM” enabled, the current SMMU driver will register at
“device_initcall_sync” so it can run after the kernel de-privileges and the
hypervisor is set up.
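In initcall terms, the resulting ordering looks roughly like this (function
names are illustrative):

    /* Kernel, before de-privilege: register hyp ops and parse the DT. */
    core_initcall(smmu_v3_kvm_init);            /* hypothetical name */
    /* The pKVM "module_init" then runs the hyp driver's probe/take-over. */
    /* Finally, the regular SMMUv3 driver, now running against trapped MMIO: */
    device_initcall_sync(arm_smmu_driver_init); /* hypothetical name */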
Future work
===========
1) Sharing page tables will be an interesting optimization, but requires dealing with
stage-2 page faults (which are handled by the kernel), BBM and possibly more complexity.
2) There is ongoing work to enable RPM, which may enable/disable the SMMU
frequently; we might need some optimizations to avoid re-shadowing the
CMDQ/STE unnecessarily.
3) Look into ACPI support.
Patches overview
=================
The patches are split as follows:
Patches 01-03: Core hypervisor: Add donation for NC, dealing with MMIO,
and arch timer abstraction.
Patches 04-08: Refactoring of io-pgtable-arm and SMMUv3 driver
Patches 09-12: Hypervisor IOMMU core: IOMMU pagetable management, dabts…
Patches 13-28: KVM SMMUv3 code
Tested on QEMU (S1 only, S2 only and nested) and a Morello board.
Also tested with PAGE_SIZE of 4K, 16K, and 64K.
A development branch can be found at:
https://android-kvm.googlesource.com/linux/+log/refs/heads/pkvm-smmu-v4
Jean-Philippe Brucker (1):
iommu/arm-smmu-v3-kvm: Add SMMUv3 driver
Mostafa Saleh (27):
KVM: arm64: Add a new function to donate memory with prot
KVM: arm64: Donate MMIO to the hypervisor
KVM: arm64: pkvm: Add pkvm_time_get()
iommu/io-pgtable-arm: Move selftests to a separate file
iommu/io-pgtable-arm: Factor kernel specific code out
iommu/arm-smmu-v3: Split code with hyp
iommu/arm-smmu-v3: Move TLB range invalidation into a macro
iommu/arm-smmu-v3: Move IDR parsing to common functions
KVM: arm64: iommu: Introduce IOMMU driver infrastructure
KVM: arm64: iommu: Shadow host stage-2 page table
KVM: arm64: iommu: Add memory pool
KVM: arm64: iommu: Support DABT for IOMMU
iommu/arm-smmu-v3: Add KVM mode in the driver
iommu/arm-smmu-v3: Load the driver later in KVM mode
iommu/arm-smmu-v3-kvm: Create array for hyp SMMUv3
iommu/arm-smmu-v3-kvm: Take over SMMUs
iommu/arm-smmu-v3-kvm: Probe SMMU HW
iommu/arm-smmu-v3-kvm: Add MMIO emulation
iommu/arm-smmu-v3-kvm: Shadow the command queue
iommu/arm-smmu-v3-kvm: Add CMDQ functions
iommu/arm-smmu-v3-kvm: Emulate CMDQ for host
iommu/arm-smmu-v3-kvm: Shadow stream table
iommu/arm-smmu-v3-kvm: Shadow STEs
iommu/arm-smmu-v3-kvm: Emulate GBPA
iommu/arm-smmu-v3-kvm: Support io-pgtable
iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table
iommu/arm-smmu-v3-kvm: Enable nesting
arch/arm64/include/asm/kvm_arm.h | 2 +
arch/arm64/include/asm/kvm_host.h | 7 +
arch/arm64/kvm/Makefile | 3 +-
arch/arm64/kvm/hyp/include/nvhe/iommu.h | 21 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 3 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 2 +
arch/arm64/kvm/hyp/nvhe/Makefile | 10 +-
arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 130 +++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 90 +-
arch/arm64/kvm/hyp/nvhe/setup.c | 17 +
arch/arm64/kvm/hyp/nvhe/timer-sr.c | 33 +
arch/arm64/kvm/hyp/pgtable.c | 9 +-
arch/arm64/kvm/iommu.c | 32 +
arch/arm64/kvm/pkvm.c | 1 +
drivers/iommu/Makefile | 2 +-
drivers/iommu/arm/Kconfig | 9 +
drivers/iommu/arm/arm-smmu-v3/Makefile | 3 +-
.../arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c | 114 ++
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c | 158 +++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 342 +-----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 220 ++++
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 1036 +++++++++++++++++
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 67 ++
.../arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c | 64 +
drivers/iommu/io-pgtable-arm-kernel.c | 305 +++++
drivers/iommu/io-pgtable-arm.c | 346 +-----
drivers/iommu/io-pgtable-arm.h | 66 ++
27 files changed, 2439 insertions(+), 653 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/iommu.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
create mode 100644 arch/arm64/kvm/iommu.c
create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c
create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c
create mode 100644 drivers/iommu/io-pgtable-arm-kernel.c
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 01/28] KVM: arm64: Add a new function to donate memory with prot
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 02/28] KVM: arm64: Donate MMIO to the hypervisor Mostafa Saleh
` (26 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Soon, IOMMU drivers running in the hypervisor might interact with
non-coherent devices, so they need a mechanism to map memory as
non-cacheable.
Add ___pkvm_host_donate_hyp() which accepts a new argument for prot,
so the driver can add KVM_PGTABLE_PROT_NORMAL_NC.
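For example, a hypervisor IOMMU driver could then do something like the
following (illustrative; PAGE_HYP supplies the baseline RW attributes):

    /* Donate pages to hyp, mapped Normal-NC for non-coherent DMA. */
    ret = ___pkvm_host_donate_hyp(pfn, nr_pages,
                                  PAGE_HYP | KVM_PGTABLE_PROT_NORMAL_NC);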
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 11 +++++++++--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5f9d56754e39..52d7ee91e18c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -36,6 +36,7 @@ int __pkvm_prot_finalize(void);
int __pkvm_host_share_hyp(u64 pfn);
int __pkvm_host_unshare_hyp(u64 pfn);
int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
+int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 8957734d6183..861e448183fd 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -769,13 +769,15 @@ int __pkvm_host_unshare_hyp(u64 pfn)
return ret;
}
-int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
+int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot)
{
u64 phys = hyp_pfn_to_phys(pfn);
u64 size = PAGE_SIZE * nr_pages;
void *virt = __hyp_va(phys);
int ret;
+ WARN_ON(prot & KVM_PGTABLE_PROT_X);
+
host_lock_component();
hyp_lock_component();
@@ -787,7 +789,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
goto unlock;
__hyp_set_page_state_range(phys, size, PKVM_PAGE_OWNED);
- WARN_ON(pkvm_create_mappings_locked(virt, virt + size, PAGE_HYP));
+ WARN_ON(pkvm_create_mappings_locked(virt, virt + size, prot));
WARN_ON(host_stage2_set_owner_locked(phys, size, PKVM_ID_HYP));
unlock:
@@ -797,6 +799,11 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
return ret;
}
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
+{
+ return ___pkvm_host_donate_hyp(pfn, nr_pages, PAGE_HYP);
+}
+
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
{
u64 phys = hyp_pfn_to_phys(pfn);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 02/28] KVM: arm64: Donate MMIO to the hypervisor
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 01/28] KVM: arm64: Add a new function to donate memory with prot Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 03/28] KVM: arm64: pkvm: Add pkvm_time_get() Mostafa Saleh
` (25 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Add a function to donate MMIO to the hypervisor, so hypervisor IOMMU
drivers can use it to protect the IOMMU's MMIO space.
The initial attempt to implement this was to have a new flag for
"___pkvm_host_donate_hyp" to accept MMIO. However, that had many problems:
it was quite intrusive for host/hyp to check/set the page state to make it
aware of MMIO and to encode that state in the page table, and this code is
called in paths that can be sensitive to performance (FFA, VMs...).
As donating MMIO is very rare, and we don’t need to encode the full state,
it’s reasonable to have a separate function to do this.
It will init the host stage-2 page table with an invalid leaf carrying the
owner ID, to prevent the host from mapping the page on faults.
Also, prevent kvm_pgtable_stage2_unmap() from removing the owner ID from
stage-2 PTEs, as this can be triggered by the recycle logic under memory
pressure. No code relies on this behaviour, as all ownership changes are
done via kvm_pgtable_stage2_set_owner().
For the error paths in IOMMU drivers, add a function to donate MMIO back
from hyp to the host.
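An illustrative pairing of the two helpers in a hypervisor IOMMU driver
(probe_hw() is a hypothetical stand-in for the driver's own setup):

    ret = __pkvm_host_donate_hyp_mmio(pfn);
    if (ret)
        return ret;
    ret = probe_hw();
    if (ret)
        __pkvm_hyp_donate_host_mmio(pfn);    /* error path: give it back */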
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 64 +++++++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 9 +--
3 files changed, 68 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 52d7ee91e18c..98e173da0f9b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -37,6 +37,8 @@ int __pkvm_host_share_hyp(u64 pfn);
int __pkvm_host_unshare_hyp(u64 pfn);
int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot);
+int __pkvm_host_donate_hyp_mmio(u64 pfn);
+int __pkvm_hyp_donate_host_mmio(u64 pfn);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 861e448183fd..c9a15ef6b18d 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -799,6 +799,70 @@ int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot)
return ret;
}
+int __pkvm_host_donate_hyp_mmio(u64 pfn)
+{
+ u64 phys = hyp_pfn_to_phys(pfn);
+ void *virt = __hyp_va(phys);
+ int ret;
+ kvm_pte_t pte;
+
+ host_lock_component();
+ hyp_lock_component();
+
+ ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, NULL);
+ if (ret)
+ goto unlock;
+
+ if (pte && !kvm_pte_valid(pte)) {
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, NULL);
+ if (ret)
+ goto unlock;
+ if (pte) {
+ ret = -EBUSY;
+ goto unlock;
+ }
+
+ ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, PAGE_HYP_DEVICE);
+ if (ret)
+ goto unlock;
+ /*
+ * We set HYP as the owner of the MMIO pages in the host stage-2, for:
+ * - host aborts: host_stage2_adjust_range() would fail for invalid non zero PTEs.
+ * - recycle under memory pressure: host_stage2_unmap_dev_all() would call
+ * kvm_pgtable_stage2_unmap() which will not clear non zero invalid ptes (counted).
+ * - other MMIO donation: Would fail as we check that the PTE is valid or empty.
+ */
+ WARN_ON(host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt, phys,
+ PAGE_SIZE, &host_s2_pool, PKVM_ID_HYP));
+unlock:
+ hyp_unlock_component();
+ host_unlock_component();
+
+ return ret;
+}
+
+int __pkvm_hyp_donate_host_mmio(u64 pfn)
+{
+ u64 phys = hyp_pfn_to_phys(pfn);
+ u64 virt = (u64)__hyp_va(phys);
+ size_t size = PAGE_SIZE;
+
+ host_lock_component();
+ hyp_lock_component();
+
+ WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size);
+ WARN_ON(host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt, phys,
+ PAGE_SIZE, &host_s2_pool, PKVM_ID_HOST));
+ hyp_unlock_component();
+ host_unlock_component();
+
+ return 0;
+}
+
int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
{
return ___pkvm_host_donate_hyp(pfn, nr_pages, PAGE_HYP);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index c351b4abd5db..ba06b0c21d5a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1095,13 +1095,8 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
kvm_pte_t *childp = NULL;
bool need_flush = false;
- if (!kvm_pte_valid(ctx->old)) {
- if (stage2_pte_is_counted(ctx->old)) {
- kvm_clear_pte(ctx->ptep);
- mm_ops->put_page(ctx->ptep);
- }
- return 0;
- }
+ if (!kvm_pte_valid(ctx->old))
+ return stage2_pte_is_counted(ctx->old) ? -EPERM : 0;
if (kvm_pte_table(ctx->old, ctx->level)) {
childp = kvm_pte_follow(ctx->old, mm_ops);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 03/28] KVM: arm64: pkvm: Add pkvm_time_get()
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 01/28] KVM: arm64: Add a new function to donate memory with prot Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 02/28] KVM: arm64: Donate MMIO to the hypervisor Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 04/28] iommu/io-pgtable-arm: Move selftests to a separate file Mostafa Saleh
` (24 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Add a function to return the time in us.
This can be used by IOMMU drivers while waiting for conditions, such as
the SMMUv3 TLB invalidation wait for sync.
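For example (a sketch; smmu_sync_done() and struct hyp_smmu are hypothetical
stand-ins for the driver's own condition and state):

    static int smmu_wait_sync(struct hyp_smmu *smmu, u64 timeout_us)
    {
        u64 deadline = pkvm_time_get() + timeout_us;

        while (!smmu_sync_done(smmu)) {
            if (pkvm_time_get() > deadline)
                return -ETIMEDOUT;
        }
        return 0;
    }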
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 2 ++
arch/arm64/kvm/hyp/nvhe/setup.c | 4 ++++
arch/arm64/kvm/hyp/nvhe/timer-sr.c | 33 ++++++++++++++++++++++++++
3 files changed, 39 insertions(+)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index ce31d3b73603..6c19691720cd 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -87,4 +87,6 @@ bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
void kvm_init_pvm_id_regs(struct kvm_vcpu *vcpu);
int kvm_check_pvm_sysreg_table(void);
+int pkvm_timer_init(void);
+u64 pkvm_time_get(void);
#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index a48d3f5a5afb..ee6435473204 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -304,6 +304,10 @@ void __noreturn __pkvm_init_finalise(void)
};
pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
+ ret = pkvm_timer_init();
+ if (ret)
+ goto out;
+
ret = fix_host_ownership();
if (ret)
goto out;
diff --git a/arch/arm64/kvm/hyp/nvhe/timer-sr.c b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
index ff176f4ce7de..e166cd5a56b8 100644
--- a/arch/arm64/kvm/hyp/nvhe/timer-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
@@ -11,6 +11,10 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
+#include <nvhe/pkvm.h>
+
+static u32 timer_freq;
+
void __kvm_timer_set_cntvoff(u64 cntvoff)
{
write_sysreg(cntvoff, cntvoff_el2);
@@ -68,3 +72,32 @@ void __timer_enable_traps(struct kvm_vcpu *vcpu)
sysreg_clear_set(cnthctl_el2, clr, set);
}
+
+static u64 pkvm_ticks_get(void)
+{
+ return __arch_counter_get_cntvct();
+}
+
+#define SEC_TO_US 1000000
+
+int pkvm_timer_init(void)
+{
+ timer_freq = read_sysreg(cntfrq_el0);
+ /*
+ * TODO: The highest privileged level is supposed to initialize this
+ * register. But on some systems (which?), this information is only
+ * contained in the device-tree, so we'll need to find it out some other
+ * way.
+ */
+ if (!timer_freq || timer_freq < SEC_TO_US)
+ return -ENODEV;
+ return 0;
+}
+
+#define pkvm_time_ticks_to_us(ticks) ((u64)(ticks) * SEC_TO_US / timer_freq)
+
+/* Return time in us. */
+u64 pkvm_time_get(void)
+{
+ return pkvm_time_ticks_to_us(pkvm_ticks_get());
+}
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 04/28] iommu/io-pgtable-arm: Move selftests to a separate file
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (2 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 03/28] KVM: arm64: pkvm: Add pkvm_time_get() Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 05/28] iommu/io-pgtable-arm: Factor kernel specific code out Mostafa Saleh
` (23 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Soon, io-pgtable-arm.c will be compiled as part of KVM/arm64 in the
hypervisor object, which doesn't have many of the kernel APIs, such as
faux devices, printk...
We need to factor these things out of this file. This patch moves the
selftests out, which removes many of the kernel dependencies; the
hypervisor doesn't need them anyway.
Create io-pgtable-arm-kernel.c for that; in the next patch the rest of
the code is factored out.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/iommu/Makefile | 2 +-
drivers/iommu/io-pgtable-arm-kernel.c | 216 +++++++++++++++++++++++
drivers/iommu/io-pgtable-arm.c | 245 --------------------------
drivers/iommu/io-pgtable-arm.h | 41 +++++
4 files changed, 258 insertions(+), 246 deletions(-)
create mode 100644 drivers/iommu/io-pgtable-arm-kernel.c
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 355294fa9033..d601b0e25ef5 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -11,7 +11,7 @@ obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
-obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o io-pgtable-arm-kernel.o
obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
obj-$(CONFIG_IOMMU_IOVA) += iova.o
obj-$(CONFIG_OF_IOMMU) += of_iommu.o
diff --git a/drivers/iommu/io-pgtable-arm-kernel.c b/drivers/iommu/io-pgtable-arm-kernel.c
new file mode 100644
index 000000000000..f3b869310964
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm-kernel.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CPU-agnostic ARM page table allocator.
+ *
+ * Copyright (C) 2014 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+#define pr_fmt(fmt) "arm-lpae io-pgtable: " fmt
+
+#include <linux/device/faux.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "io-pgtable-arm.h"
+
+#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
+
+static struct io_pgtable_cfg *cfg_cookie __initdata;
+
+static void __init dummy_tlb_flush_all(void *cookie)
+{
+ WARN_ON(cookie != cfg_cookie);
+}
+
+static void __init dummy_tlb_flush(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+ WARN_ON(cookie != cfg_cookie);
+ WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
+}
+
+static void __init dummy_tlb_add_page(struct iommu_iotlb_gather *gather,
+ unsigned long iova, size_t granule,
+ void *cookie)
+{
+ dummy_tlb_flush(iova, granule, granule, cookie);
+}
+
+static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
+ .tlb_flush_all = dummy_tlb_flush_all,
+ .tlb_flush_walk = dummy_tlb_flush,
+ .tlb_add_page = dummy_tlb_add_page,
+};
+
+static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
+{
+ struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+ struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+ pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
+ cfg->pgsize_bitmap, cfg->ias);
+ pr_err("data: %d levels, 0x%zx pgd_size, %u pg_shift, %u bits_per_level, pgd @ %p\n",
+ ARM_LPAE_MAX_LEVELS - data->start_level, ARM_LPAE_PGD_SIZE(data),
+ ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, data->pgd);
+}
+
+#define __FAIL(ops, i) ({ \
+ WARN(1, "selftest: test failed for fmt idx %d\n", (i)); \
+ arm_lpae_dump_ops(ops); \
+ -EFAULT; \
+})
+
+static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
+{
+ static const enum io_pgtable_fmt fmts[] __initconst = {
+ ARM_64_LPAE_S1,
+ ARM_64_LPAE_S2,
+ };
+
+ int i, j;
+ unsigned long iova;
+ size_t size, mapped;
+ struct io_pgtable_ops *ops;
+
+ for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
+ cfg_cookie = cfg;
+ ops = alloc_io_pgtable_ops(fmts[i], cfg, cfg);
+ if (!ops) {
+ pr_err("selftest: failed to allocate io pgtable ops\n");
+ return -ENOMEM;
+ }
+
+ /*
+ * Initial sanity checks.
+ * Empty page tables shouldn't provide any translations.
+ */
+ if (ops->iova_to_phys(ops, 42))
+ return __FAIL(ops, i);
+
+ if (ops->iova_to_phys(ops, SZ_1G + 42))
+ return __FAIL(ops, i);
+
+ if (ops->iova_to_phys(ops, SZ_2G + 42))
+ return __FAIL(ops, i);
+
+ /*
+ * Distinct mappings of different granule sizes.
+ */
+ iova = 0;
+ for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
+ size = 1UL << j;
+
+ if (ops->map_pages(ops, iova, iova, size, 1,
+ IOMMU_READ | IOMMU_WRITE |
+ IOMMU_NOEXEC | IOMMU_CACHE,
+ GFP_KERNEL, &mapped))
+ return __FAIL(ops, i);
+
+ /* Overlapping mappings */
+ if (!ops->map_pages(ops, iova, iova + size, size, 1,
+ IOMMU_READ | IOMMU_NOEXEC,
+ GFP_KERNEL, &mapped))
+ return __FAIL(ops, i);
+
+ if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+ return __FAIL(ops, i);
+
+ iova += SZ_1G;
+ }
+
+ /* Full unmap */
+ iova = 0;
+ for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
+ size = 1UL << j;
+
+ if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
+ return __FAIL(ops, i);
+
+ if (ops->iova_to_phys(ops, iova + 42))
+ return __FAIL(ops, i);
+
+ /* Remap full block */
+ if (ops->map_pages(ops, iova, iova, size, 1,
+ IOMMU_WRITE, GFP_KERNEL, &mapped))
+ return __FAIL(ops, i);
+
+ if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+ return __FAIL(ops, i);
+
+ iova += SZ_1G;
+ }
+
+ /*
+ * Map/unmap the last largest supported page of the IAS, this can
+ * trigger corner cases in the concatenated page tables.
+ */
+ mapped = 0;
+ size = 1UL << __fls(cfg->pgsize_bitmap);
+ iova = (1UL << cfg->ias) - size;
+ if (ops->map_pages(ops, iova, iova, size, 1,
+ IOMMU_READ | IOMMU_WRITE |
+ IOMMU_NOEXEC | IOMMU_CACHE,
+ GFP_KERNEL, &mapped))
+ return __FAIL(ops, i);
+ if (mapped != size)
+ return __FAIL(ops, i);
+ if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
+ return __FAIL(ops, i);
+
+ free_io_pgtable_ops(ops);
+ }
+
+ return 0;
+}
+
+static int __init arm_lpae_do_selftests(void)
+{
+ static const unsigned long pgsize[] __initconst = {
+ SZ_4K | SZ_2M | SZ_1G,
+ SZ_16K | SZ_32M,
+ SZ_64K | SZ_512M,
+ };
+
+ static const unsigned int address_size[] __initconst = {
+ 32, 36, 40, 42, 44, 48,
+ };
+
+ int i, j, k, pass = 0, fail = 0;
+ struct faux_device *dev;
+ struct io_pgtable_cfg cfg = {
+ .tlb = &dummy_tlb_ops,
+ .coherent_walk = true,
+ .quirks = IO_PGTABLE_QUIRK_NO_WARN,
+ };
+
+ dev = faux_device_create("io-pgtable-test", NULL, 0);
+ if (!dev)
+ return -ENOMEM;
+
+ cfg.iommu_dev = &dev->dev;
+
+ for (i = 0; i < ARRAY_SIZE(pgsize); ++i) {
+ for (j = 0; j < ARRAY_SIZE(address_size); ++j) {
+ /* Don't use ias > oas as it is not valid for stage-2. */
+ for (k = 0; k <= j; ++k) {
+ cfg.pgsize_bitmap = pgsize[i];
+ cfg.ias = address_size[k];
+ cfg.oas = address_size[j];
+ pr_info("selftest: pgsize_bitmap 0x%08lx, IAS %u OAS %u\n",
+ pgsize[i], cfg.ias, cfg.oas);
+ if (arm_lpae_run_tests(&cfg))
+ fail++;
+ else
+ pass++;
+ }
+ }
+ }
+
+ pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail);
+ faux_device_destroy(dev);
+
+ return fail ? -EFAULT : 0;
+}
+subsys_initcall(arm_lpae_do_selftests);
+#endif
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 96425e92f313..791a2c4ecb83 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -7,15 +7,10 @@
* Author: Will Deacon <will.deacon@arm.com>
*/
-#define pr_fmt(fmt) "arm-lpae io-pgtable: " fmt
-
#include <linux/atomic.h>
#include <linux/bitops.h>
#include <linux/io-pgtable.h>
-#include <linux/kernel.h>
-#include <linux/device/faux.h>
#include <linux/sizes.h>
-#include <linux/slab.h>
#include <linux/types.h>
#include <linux/dma-mapping.h>
@@ -24,33 +19,6 @@
#include "io-pgtable-arm.h"
#include "iommu-pages.h"
-#define ARM_LPAE_MAX_ADDR_BITS 52
-#define ARM_LPAE_S2_MAX_CONCAT_PAGES 16
-#define ARM_LPAE_MAX_LEVELS 4
-
-/* Struct accessors */
-#define io_pgtable_to_data(x) \
- container_of((x), struct arm_lpae_io_pgtable, iop)
-
-#define io_pgtable_ops_to_data(x) \
- io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
-
-/*
- * Calculate the right shift amount to get to the portion describing level l
- * in a virtual address mapped by the pagetable in d.
- */
-#define ARM_LPAE_LVL_SHIFT(l,d) \
- (((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) + \
- ilog2(sizeof(arm_lpae_iopte)))
-
-#define ARM_LPAE_GRANULE(d) \
- (sizeof(arm_lpae_iopte) << (d)->bits_per_level)
-#define ARM_LPAE_PGD_SIZE(d) \
- (sizeof(arm_lpae_iopte) << (d)->pgd_bits)
-
-#define ARM_LPAE_PTES_PER_TABLE(d) \
- (ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
-
/*
* Calculate the index at level l used to map virtual address a using the
* pagetable in d.
@@ -163,18 +131,6 @@
#define iopte_set_writeable_clean(ptep) \
set_bit(ARM_LPAE_PTE_AP_RDONLY_BIT, (unsigned long *)(ptep))
-struct arm_lpae_io_pgtable {
- struct io_pgtable iop;
-
- int pgd_bits;
- int start_level;
- int bits_per_level;
-
- void *pgd;
-};
-
-typedef u64 arm_lpae_iopte;
-
static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl,
enum io_pgtable_fmt fmt)
{
@@ -1274,204 +1230,3 @@ struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
.alloc = arm_mali_lpae_alloc_pgtable,
.free = arm_lpae_free_pgtable,
};
-
-#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
-
-static struct io_pgtable_cfg *cfg_cookie __initdata;
-
-static void __init dummy_tlb_flush_all(void *cookie)
-{
- WARN_ON(cookie != cfg_cookie);
-}
-
-static void __init dummy_tlb_flush(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
- WARN_ON(cookie != cfg_cookie);
- WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
-}
-
-static void __init dummy_tlb_add_page(struct iommu_iotlb_gather *gather,
- unsigned long iova, size_t granule,
- void *cookie)
-{
- dummy_tlb_flush(iova, granule, granule, cookie);
-}
-
-static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
- .tlb_flush_all = dummy_tlb_flush_all,
- .tlb_flush_walk = dummy_tlb_flush,
- .tlb_add_page = dummy_tlb_add_page,
-};
-
-static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
-{
- struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
- struct io_pgtable_cfg *cfg = &data->iop.cfg;
-
- pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
- cfg->pgsize_bitmap, cfg->ias);
- pr_err("data: %d levels, 0x%zx pgd_size, %u pg_shift, %u bits_per_level, pgd @ %p\n",
- ARM_LPAE_MAX_LEVELS - data->start_level, ARM_LPAE_PGD_SIZE(data),
- ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, data->pgd);
-}
-
-#define __FAIL(ops, i) ({ \
- WARN(1, "selftest: test failed for fmt idx %d\n", (i)); \
- arm_lpae_dump_ops(ops); \
- -EFAULT; \
-})
-
-static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
-{
- static const enum io_pgtable_fmt fmts[] __initconst = {
- ARM_64_LPAE_S1,
- ARM_64_LPAE_S2,
- };
-
- int i, j;
- unsigned long iova;
- size_t size, mapped;
- struct io_pgtable_ops *ops;
-
- for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
- cfg_cookie = cfg;
- ops = alloc_io_pgtable_ops(fmts[i], cfg, cfg);
- if (!ops) {
- pr_err("selftest: failed to allocate io pgtable ops\n");
- return -ENOMEM;
- }
-
- /*
- * Initial sanity checks.
- * Empty page tables shouldn't provide any translations.
- */
- if (ops->iova_to_phys(ops, 42))
- return __FAIL(ops, i);
-
- if (ops->iova_to_phys(ops, SZ_1G + 42))
- return __FAIL(ops, i);
-
- if (ops->iova_to_phys(ops, SZ_2G + 42))
- return __FAIL(ops, i);
-
- /*
- * Distinct mappings of different granule sizes.
- */
- iova = 0;
- for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
- size = 1UL << j;
-
- if (ops->map_pages(ops, iova, iova, size, 1,
- IOMMU_READ | IOMMU_WRITE |
- IOMMU_NOEXEC | IOMMU_CACHE,
- GFP_KERNEL, &mapped))
- return __FAIL(ops, i);
-
- /* Overlapping mappings */
- if (!ops->map_pages(ops, iova, iova + size, size, 1,
- IOMMU_READ | IOMMU_NOEXEC,
- GFP_KERNEL, &mapped))
- return __FAIL(ops, i);
-
- if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
- return __FAIL(ops, i);
-
- iova += SZ_1G;
- }
-
- /* Full unmap */
- iova = 0;
- for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
- size = 1UL << j;
-
- if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
- return __FAIL(ops, i);
-
- if (ops->iova_to_phys(ops, iova + 42))
- return __FAIL(ops, i);
-
- /* Remap full block */
- if (ops->map_pages(ops, iova, iova, size, 1,
- IOMMU_WRITE, GFP_KERNEL, &mapped))
- return __FAIL(ops, i);
-
- if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
- return __FAIL(ops, i);
-
- iova += SZ_1G;
- }
-
- /*
- * Map/unmap the last largest supported page of the IAS, this can
- * trigger corner cases in the concatednated page tables.
- */
- mapped = 0;
- size = 1UL << __fls(cfg->pgsize_bitmap);
- iova = (1UL << cfg->ias) - size;
- if (ops->map_pages(ops, iova, iova, size, 1,
- IOMMU_READ | IOMMU_WRITE |
- IOMMU_NOEXEC | IOMMU_CACHE,
- GFP_KERNEL, &mapped))
- return __FAIL(ops, i);
- if (mapped != size)
- return __FAIL(ops, i);
- if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
- return __FAIL(ops, i);
-
- free_io_pgtable_ops(ops);
- }
-
- return 0;
-}
-
-static int __init arm_lpae_do_selftests(void)
-{
- static const unsigned long pgsize[] __initconst = {
- SZ_4K | SZ_2M | SZ_1G,
- SZ_16K | SZ_32M,
- SZ_64K | SZ_512M,
- };
-
- static const unsigned int address_size[] __initconst = {
- 32, 36, 40, 42, 44, 48,
- };
-
- int i, j, k, pass = 0, fail = 0;
- struct faux_device *dev;
- struct io_pgtable_cfg cfg = {
- .tlb = &dummy_tlb_ops,
- .coherent_walk = true,
- .quirks = IO_PGTABLE_QUIRK_NO_WARN,
- };
-
- dev = faux_device_create("io-pgtable-test", NULL, 0);
- if (!dev)
- return -ENOMEM;
-
- cfg.iommu_dev = &dev->dev;
-
- for (i = 0; i < ARRAY_SIZE(pgsize); ++i) {
- for (j = 0; j < ARRAY_SIZE(address_size); ++j) {
- /* Don't use ias > oas as it is not valid for stage-2. */
- for (k = 0; k <= j; ++k) {
- cfg.pgsize_bitmap = pgsize[i];
- cfg.ias = address_size[k];
- cfg.oas = address_size[j];
- pr_info("selftest: pgsize_bitmap 0x%08lx, IAS %u OAS %u\n",
- pgsize[i], cfg.ias, cfg.oas);
- if (arm_lpae_run_tests(&cfg))
- fail++;
- else
- pass++;
- }
- }
- }
-
- pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail);
- faux_device_destroy(dev);
-
- return fail ? -EFAULT : 0;
-}
-subsys_initcall(arm_lpae_do_selftests);
-#endif
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
index ba7cfdf7afa0..a06a23543cff 100644
--- a/drivers/iommu/io-pgtable-arm.h
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -2,6 +2,8 @@
#ifndef IO_PGTABLE_ARM_H_
#define IO_PGTABLE_ARM_H_
+#include <linux/io-pgtable.h>
+
#define ARM_LPAE_TCR_TG0_4K 0
#define ARM_LPAE_TCR_TG0_64K 1
#define ARM_LPAE_TCR_TG0_16K 2
@@ -27,4 +29,43 @@
#define ARM_LPAE_TCR_PS_48_BIT 0x5ULL
#define ARM_LPAE_TCR_PS_52_BIT 0x6ULL
+/* Struct accessors */
+#define io_pgtable_to_data(x) \
+ container_of((x), struct arm_lpae_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x) \
+ io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+struct arm_lpae_io_pgtable {
+ struct io_pgtable iop;
+
+ int pgd_bits;
+ int start_level;
+ int bits_per_level;
+
+ void *pgd;
+};
+
+#define ARM_LPAE_MAX_ADDR_BITS 52
+#define ARM_LPAE_S2_MAX_CONCAT_PAGES 16
+#define ARM_LPAE_MAX_LEVELS 4
+
+/*
+ * Calculate the right shift amount to get to the portion describing level l
+ * in a virtual address mapped by the pagetable in d.
+ */
+#define ARM_LPAE_LVL_SHIFT(l,d) \
+ (((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) + \
+ ilog2(sizeof(arm_lpae_iopte)))
+
+#define ARM_LPAE_GRANULE(d) \
+ (sizeof(arm_lpae_iopte) << (d)->bits_per_level)
+#define ARM_LPAE_PGD_SIZE(d) \
+ (sizeof(arm_lpae_iopte) << (d)->pgd_bits)
+
+#define ARM_LPAE_PTES_PER_TABLE(d) \
+ (ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
+
+typedef u64 arm_lpae_iopte;
+
#endif /* IO_PGTABLE_ARM_H_ */
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 05/28] iommu/io-pgtable-arm: Factor kernel specific code out
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (3 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 04/28] iommu/io-pgtable-arm: Move selftests to a separate file Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 06/28] iommu/arm-smmu-v3: Split code with hyp Mostafa Saleh
` (22 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Some of the currently used APIs are only part of the kernel and not
available in the hypervisor; factor those out of the common file:
- alloc/free memory
- CMOs
- virt/phys conversions
These are implemented by the kernel in io-pgtable-arm-kernel.c, and
similarly for the hypervisor later in this series.
The VA/PA conversions are kept as macros.
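For illustration, the hypervisor-side counterparts could look roughly like
this (a hypothetical sketch; hyp_alloc()/hyp_free() are placeholder names,
and the actual implementation lands later in this series in
pkvm/io-pgtable-arm-hyp.c):

    void *__arm_lpae_alloc_data(size_t size, gfp_t gfp)
    {
        return hyp_alloc(size);    /* hypothetical hyp allocator */
    }

    void __arm_lpae_free_data(void *p)
    {
        hyp_free(p);
    }

    /* The hypervisor has its own linear map, so VA/PA conversion differs: */
    #define __arm_lpae_virt_to_phys    hyp_virt_to_phys
    #define __arm_lpae_phys_to_virt    hyp_phys_to_virt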
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/iommu/io-pgtable-arm-kernel.c | 89 ++++++++++++++++++++++++
drivers/iommu/io-pgtable-arm.c | 99 +++------------------------
drivers/iommu/io-pgtable-arm.h | 14 ++++
3 files changed, 113 insertions(+), 89 deletions(-)
diff --git a/drivers/iommu/io-pgtable-arm-kernel.c b/drivers/iommu/io-pgtable-arm-kernel.c
index f3b869310964..d3056487b0f6 100644
--- a/drivers/iommu/io-pgtable-arm-kernel.c
+++ b/drivers/iommu/io-pgtable-arm-kernel.c
@@ -9,10 +9,99 @@
#define pr_fmt(fmt) "arm-lpae io-pgtable: " fmt
#include <linux/device/faux.h>
+#include <linux/dma-mapping.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include "io-pgtable-arm.h"
+#include "iommu-pages.h"
+
+static dma_addr_t __arm_lpae_dma_addr(void *pages)
+{
+ return (dma_addr_t)virt_to_phys(pages);
+}
+
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
+ struct io_pgtable_cfg *cfg,
+ void *cookie)
+{
+ struct device *dev = cfg->iommu_dev;
+ size_t alloc_size;
+ dma_addr_t dma;
+ void *pages;
+
+ /*
+ * For very small starting-level translation tables the HW requires a
+ * minimum alignment of at least 64 to cover all cases.
+ */
+ alloc_size = max(size, 64);
+ if (cfg->alloc)
+ pages = cfg->alloc(cookie, alloc_size, gfp);
+ else
+ pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp,
+ alloc_size);
+
+ if (!pages)
+ return NULL;
+
+ if (!cfg->coherent_walk) {
+ dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
+ if (dma_mapping_error(dev, dma))
+ goto out_free;
+ /*
+ * We depend on the IOMMU being able to work with any physical
+ * address directly, so if the DMA layer suggests otherwise by
+ * translating or truncating them, that bodes very badly...
+ */
+ if (dma != virt_to_phys(pages))
+ goto out_unmap;
+ }
+
+ return pages;
+
+out_unmap:
+ dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
+ dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
+
+out_free:
+ if (cfg->free)
+ cfg->free(cookie, pages, size);
+ else
+ iommu_free_pages(pages);
+
+ return NULL;
+}
+
+void __arm_lpae_free_pages(void *pages, size_t size,
+ struct io_pgtable_cfg *cfg,
+ void *cookie)
+{
+ if (!cfg->coherent_walk)
+ dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
+ size, DMA_TO_DEVICE);
+
+ if (cfg->free)
+ cfg->free(cookie, pages, size);
+ else
+ iommu_free_pages(pages);
+}
+
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+ struct io_pgtable_cfg *cfg)
+{
+ dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
+ sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
+}
+
+void *__arm_lpae_alloc_data(size_t size, gfp_t gfp)
+{
+ return kmalloc(size, gfp);
+}
+
+void __arm_lpae_free_data(void *p)
+{
+ return kfree(p);
+}
#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 791a2c4ecb83..2ca09081c3b0 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -12,12 +12,10 @@
#include <linux/io-pgtable.h>
#include <linux/sizes.h>
#include <linux/types.h>
-#include <linux/dma-mapping.h>
#include <asm/barrier.h>
#include "io-pgtable-arm.h"
-#include "iommu-pages.h"
/*
* Calculate the index at level l used to map virtual address a using the
@@ -118,7 +116,7 @@
#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
/* IOPTE accessors */
-#define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
+#define iopte_deref(pte,d) __arm_lpae_phys_to_virt(iopte_to_paddr(pte, d))
#define iopte_type(pte) \
(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
@@ -208,83 +206,6 @@ static inline bool arm_lpae_concat_mandatory(struct io_pgtable_cfg *cfg,
(data->start_level == 1) && (oas == 40);
}
-static dma_addr_t __arm_lpae_dma_addr(void *pages)
-{
- return (dma_addr_t)virt_to_phys(pages);
-}
-
-static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
- struct io_pgtable_cfg *cfg,
- void *cookie)
-{
- struct device *dev = cfg->iommu_dev;
- size_t alloc_size;
- dma_addr_t dma;
- void *pages;
-
- /*
- * For very small starting-level translation tables the HW requires a
- * minimum alignment of at least 64 to cover all cases.
- */
- alloc_size = max(size, 64);
- if (cfg->alloc)
- pages = cfg->alloc(cookie, alloc_size, gfp);
- else
- pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp,
- alloc_size);
-
- if (!pages)
- return NULL;
-
- if (!cfg->coherent_walk) {
- dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
- if (dma_mapping_error(dev, dma))
- goto out_free;
- /*
- * We depend on the IOMMU being able to work with any physical
- * address directly, so if the DMA layer suggests otherwise by
- * translating or truncating them, that bodes very badly...
- */
- if (dma != virt_to_phys(pages))
- goto out_unmap;
- }
-
- return pages;
-
-out_unmap:
- dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
- dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
-
-out_free:
- if (cfg->free)
- cfg->free(cookie, pages, size);
- else
- iommu_free_pages(pages);
-
- return NULL;
-}
-
-static void __arm_lpae_free_pages(void *pages, size_t size,
- struct io_pgtable_cfg *cfg,
- void *cookie)
-{
- if (!cfg->coherent_walk)
- dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
- size, DMA_TO_DEVICE);
-
- if (cfg->free)
- cfg->free(cookie, pages, size);
- else
- iommu_free_pages(pages);
-}
-
-static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
- struct io_pgtable_cfg *cfg)
-{
- dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
- sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
-}
-
static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
{
for (int i = 0; i < num_entries; i++)
@@ -360,7 +281,7 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
arm_lpae_iopte old, new;
struct io_pgtable_cfg *cfg = &data->iop.cfg;
- new = paddr_to_iopte(__pa(table), data) | ARM_LPAE_PTE_TYPE_TABLE;
+ new = paddr_to_iopte(__arm_lpae_virt_to_phys(table), data) | ARM_LPAE_PTE_TYPE_TABLE;
if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
new |= ARM_LPAE_PTE_NSTABLE;
@@ -581,7 +502,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
__arm_lpae_free_pgtable(data, data->start_level, data->pgd);
- kfree(data);
+ __arm_lpae_free_data(data);
}
static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -895,7 +816,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
if (cfg->oas > ARM_LPAE_MAX_ADDR_BITS)
return NULL;
- data = kmalloc(sizeof(*data), GFP_KERNEL);
+ data = __arm_lpae_alloc_data(sizeof(*data), GFP_KERNEL);
if (!data)
return NULL;
@@ -1018,11 +939,11 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
wmb();
/* TTBR */
- cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
+ cfg->arm_lpae_s1_cfg.ttbr = __arm_lpae_virt_to_phys(data->pgd);
return &data->iop;
out_free_data:
- kfree(data);
+ __arm_lpae_free_data(data);
return NULL;
}
@@ -1114,11 +1035,11 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
wmb();
/* VTTBR */
- cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
+ cfg->arm_lpae_s2_cfg.vttbr = __arm_lpae_virt_to_phys(data->pgd);
return &data->iop;
out_free_data:
- kfree(data);
+ __arm_lpae_free_data(data);
return NULL;
}
@@ -1188,7 +1109,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
/* Ensure the empty pgd is visible before TRANSTAB can be written */
wmb();
- cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
+ cfg->arm_mali_lpae_cfg.transtab = __arm_lpae_virt_to_phys(data->pgd) |
ARM_MALI_LPAE_TTBR_READ_INNER |
ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
if (cfg->coherent_walk)
@@ -1197,7 +1118,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
return &data->iop;
out_free_data:
- kfree(data);
+ __arm_lpae_free_data(data);
return NULL;
}
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
index a06a23543cff..7d9f0b759275 100644
--- a/drivers/iommu/io-pgtable-arm.h
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -68,4 +68,18 @@ struct arm_lpae_io_pgtable {
typedef u64 arm_lpae_iopte;
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+ struct io_pgtable_cfg *cfg);
+void __arm_lpae_free_pages(void *pages, size_t size,
+ struct io_pgtable_cfg *cfg,
+ void *cookie);
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
+ struct io_pgtable_cfg *cfg,
+ void *cookie);
+void *__arm_lpae_alloc_data(size_t size, gfp_t gfp);
+void __arm_lpae_free_data(void *p);
+#ifndef __KVM_NVHE_HYPERVISOR__
+#define __arm_lpae_virt_to_phys __pa
+#define __arm_lpae_phys_to_virt __va
+#endif /* !__KVM_NVHE_HYPERVISOR__ */
#endif /* IO_PGTABLE_ARM_H_ */
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 06/28] iommu/arm-smmu-v3: Split code with hyp
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (4 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 05/28] iommu/io-pgtable-arm: Factor kernel specific code out Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 07/28] iommu/arm-smmu-v3: Move TLB range invalidation into a macro Mostafa Saleh
` (21 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
The KVM SMMUv3 driver will re-use some of the cmdq code inside
the hypervisor; move these functions to a new common C file that
is shared between the host kernel and the hypervisor.
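For example, both sides can now build a command the same way (sid assumed
to be in scope):

    struct arm_smmu_cmdq_ent ent = {
        .opcode = CMDQ_OP_CFGI_STE,
        .cfgi   = { .sid = sid, .leaf = true },
    };
    u64 cmd[CMDQ_ENT_DWORDS];

    if (arm_smmu_cmdq_build_cmd(cmd, &ent))
        return -EINVAL;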
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/iommu/arm/arm-smmu-v3/Makefile | 2 +-
.../arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c | 114 ++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 146 ------------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 44 ++++++
4 files changed, 159 insertions(+), 147 deletions(-)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 493a659cc66b..1918b4a64cb0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
-arm_smmu_v3-y := arm-smmu-v3.o
+arm_smmu_v3-y := arm-smmu-v3.o arm-smmu-v3-common-hyp.o
arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_IOMMUFD) += arm-smmu-v3-iommufd.o
arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
arm_smmu_v3-$(CONFIG_TEGRA241_CMDQV) += tegra241-cmdqv.o
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c
new file mode 100644
index 000000000000..62744c8548a8
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2015 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ * Arm SMMUv3 driver functions shared with hypervisor.
+ */
+
+#include "arm-smmu-v3.h"
+#include <asm-generic/errno-base.h>
+
+#include <linux/string.h>
+
+int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
+{
+ memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
+ cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
+
+ switch (ent->opcode) {
+ case CMDQ_OP_TLBI_EL2_ALL:
+ case CMDQ_OP_TLBI_NSNH_ALL:
+ break;
+ case CMDQ_OP_PREFETCH_CFG:
+ cmd[0] |= FIELD_PREP(CMDQ_PREFETCH_0_SID, ent->prefetch.sid);
+ break;
+ case CMDQ_OP_CFGI_CD:
+ cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
+ fallthrough;
+ case CMDQ_OP_CFGI_STE:
+ cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
+ cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
+ break;
+ case CMDQ_OP_CFGI_CD_ALL:
+ cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
+ break;
+ case CMDQ_OP_CFGI_ALL:
+ /* Cover the entire SID range */
+ cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
+ break;
+ case CMDQ_OP_TLBI_NH_VA:
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+ fallthrough;
+ case CMDQ_OP_TLBI_EL2_VA:
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg);
+ cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
+ break;
+ case CMDQ_OP_TLBI_S2_IPA:
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl);
+ cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg);
+ cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_IPA_MASK;
+ break;
+ case CMDQ_OP_TLBI_NH_ASID:
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
+ fallthrough;
+ case CMDQ_OP_TLBI_NH_ALL:
+ case CMDQ_OP_TLBI_S12_VMALL:
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+ break;
+ case CMDQ_OP_TLBI_EL2_ASID:
+ cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
+ break;
+ case CMDQ_OP_ATC_INV:
+ cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
+ cmd[0] |= FIELD_PREP(CMDQ_ATC_0_GLOBAL, ent->atc.global);
+ cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SSID, ent->atc.ssid);
+ cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, ent->atc.sid);
+ cmd[1] |= FIELD_PREP(CMDQ_ATC_1_SIZE, ent->atc.size);
+ cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
+ break;
+ case CMDQ_OP_PRI_RESP:
+ cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
+ cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SSID, ent->pri.ssid);
+ cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SID, ent->pri.sid);
+ cmd[1] |= FIELD_PREP(CMDQ_PRI_1_GRPID, ent->pri.grpid);
+ switch (ent->pri.resp) {
+ case PRI_RESP_DENY:
+ case PRI_RESP_FAIL:
+ case PRI_RESP_SUCC:
+ break;
+ default:
+ return -EINVAL;
+ }
+ cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
+ break;
+ case CMDQ_OP_RESUME:
+ cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_SID, ent->resume.sid);
+ cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_RESP, ent->resume.resp);
+ cmd[1] |= FIELD_PREP(CMDQ_RESUME_1_STAG, ent->resume.stag);
+ break;
+ case CMDQ_OP_CMD_SYNC:
+ if (ent->sync.msiaddr) {
+ cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
+ cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
+ } else {
+ cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
+ }
+ cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
+ cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
+ break;
+ default:
+ return -ENOENT;
+ }
+
+ return 0;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 10cc6dc26b7b..1f765b4e36fa 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -138,18 +138,6 @@ static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
return space >= n;
}
-static bool queue_full(struct arm_smmu_ll_queue *q)
-{
- return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
- Q_WRP(q, q->prod) != Q_WRP(q, q->cons);
-}
-
-static bool queue_empty(struct arm_smmu_ll_queue *q)
-{
- return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
- Q_WRP(q, q->prod) == Q_WRP(q, q->cons);
-}
-
static bool queue_consumed(struct arm_smmu_ll_queue *q, u32 prod)
{
return ((Q_WRP(q, q->cons) == Q_WRP(q, prod)) &&
@@ -168,12 +156,6 @@ static void queue_sync_cons_out(struct arm_smmu_queue *q)
writel_relaxed(q->llq.cons, q->cons_reg);
}
-static void queue_inc_cons(struct arm_smmu_ll_queue *q)
-{
- u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
- q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
-}
-
static void queue_sync_cons_ovf(struct arm_smmu_queue *q)
{
struct arm_smmu_ll_queue *llq = &q->llq;
@@ -205,12 +187,6 @@ static int queue_sync_prod_in(struct arm_smmu_queue *q)
return ret;
}
-static u32 queue_inc_prod_n(struct arm_smmu_ll_queue *q, int n)
-{
- u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + n;
- return Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
-}
-
static void queue_poll_init(struct arm_smmu_device *smmu,
struct arm_smmu_queue_poll *qp)
{
@@ -238,14 +214,6 @@ static int queue_poll(struct arm_smmu_queue_poll *qp)
return 0;
}
-static void queue_write(__le64 *dst, u64 *src, size_t n_dwords)
-{
- int i;
-
- for (i = 0; i < n_dwords; ++i)
- *dst++ = cpu_to_le64(*src++);
-}
-
static void queue_read(u64 *dst, __le64 *src, size_t n_dwords)
{
int i;
@@ -266,108 +234,6 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
}
/* High-level queue accessors */
-static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
-{
- memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
- cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
-
- switch (ent->opcode) {
- case CMDQ_OP_TLBI_EL2_ALL:
- case CMDQ_OP_TLBI_NSNH_ALL:
- break;
- case CMDQ_OP_PREFETCH_CFG:
- cmd[0] |= FIELD_PREP(CMDQ_PREFETCH_0_SID, ent->prefetch.sid);
- break;
- case CMDQ_OP_CFGI_CD:
- cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
- fallthrough;
- case CMDQ_OP_CFGI_STE:
- cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
- cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
- break;
- case CMDQ_OP_CFGI_CD_ALL:
- cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
- break;
- case CMDQ_OP_CFGI_ALL:
- /* Cover the entire SID range */
- cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
- break;
- case CMDQ_OP_TLBI_NH_VA:
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
- fallthrough;
- case CMDQ_OP_TLBI_EL2_VA:
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
- cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
- cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl);
- cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg);
- cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
- break;
- case CMDQ_OP_TLBI_S2_IPA:
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
- cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
- cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl);
- cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg);
- cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_IPA_MASK;
- break;
- case CMDQ_OP_TLBI_NH_ASID:
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
- fallthrough;
- case CMDQ_OP_TLBI_NH_ALL:
- case CMDQ_OP_TLBI_S12_VMALL:
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
- break;
- case CMDQ_OP_TLBI_EL2_ASID:
- cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
- break;
- case CMDQ_OP_ATC_INV:
- cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
- cmd[0] |= FIELD_PREP(CMDQ_ATC_0_GLOBAL, ent->atc.global);
- cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SSID, ent->atc.ssid);
- cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, ent->atc.sid);
- cmd[1] |= FIELD_PREP(CMDQ_ATC_1_SIZE, ent->atc.size);
- cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
- break;
- case CMDQ_OP_PRI_RESP:
- cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
- cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SSID, ent->pri.ssid);
- cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SID, ent->pri.sid);
- cmd[1] |= FIELD_PREP(CMDQ_PRI_1_GRPID, ent->pri.grpid);
- switch (ent->pri.resp) {
- case PRI_RESP_DENY:
- case PRI_RESP_FAIL:
- case PRI_RESP_SUCC:
- break;
- default:
- return -EINVAL;
- }
- cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
- break;
- case CMDQ_OP_RESUME:
- cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_SID, ent->resume.sid);
- cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_RESP, ent->resume.resp);
- cmd[1] |= FIELD_PREP(CMDQ_RESUME_1_STAG, ent->resume.stag);
- break;
- case CMDQ_OP_CMD_SYNC:
- if (ent->sync.msiaddr) {
- cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
- cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
- } else {
- cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
- }
- cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
- cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
- break;
- default:
- return -ENOENT;
- }
-
- return 0;
-}
-
static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq_ent *ent)
{
@@ -1508,18 +1374,6 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
}
/* Stream table manipulation functions */
-static void arm_smmu_write_strtab_l1_desc(struct arm_smmu_strtab_l1 *dst,
- dma_addr_t l2ptr_dma)
-{
- u64 val = 0;
-
- val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, STRTAB_SPLIT + 1);
- val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
-
- /* The HW has 64 bit atomicity with stores to the L2 STE table */
- WRITE_ONCE(dst->l2ptr, cpu_to_le64(val));
-}
-
struct arm_smmu_ste_writer {
struct arm_smmu_entry_writer writer;
u32 sid;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ea41d790463e..2698438cd35c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -997,6 +997,50 @@ void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
bool sync);
+int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent);
+
+/* Queue functions shared between kernel and hyp. */
+static inline bool queue_full(struct arm_smmu_ll_queue *q)
+{
+ return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
+ Q_WRP(q, q->prod) != Q_WRP(q, q->cons);
+}
+static inline bool queue_empty(struct arm_smmu_ll_queue *q)
+{
+ return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
+ Q_WRP(q, q->prod) == Q_WRP(q, q->cons);
+}
+static inline u32 queue_inc_prod_n(struct arm_smmu_ll_queue *q, int n)
+{
+ u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + n;
+ return Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
+}
+
+static inline void queue_inc_cons(struct arm_smmu_ll_queue *q)
+{
+ u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
+ q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
+}
+
+
+static inline void queue_write(__le64 *dst, u64 *src, size_t n_dwords)
+{
+ int i;
+ for (i = 0; i < n_dwords; ++i)
+ *dst++ = cpu_to_le64(*src++);
+}
+
+static inline void arm_smmu_write_strtab_l1_desc(struct arm_smmu_strtab_l1 *dst,
+ dma_addr_t l2ptr_dma)
+{
+ u64 val = 0;
+
+ val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, STRTAB_SPLIT + 1);
+ val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
+
+ /* The HW has 64 bit atomicity with stores to the L2 STE table */
+ WRITE_ONCE(dst->l2ptr, cpu_to_le64(val));
+}
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 07/28] iommu/arm-smmu-v3: Move TLB range invalidation into a macro
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (5 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 06/28] iommu/arm-smmu-v3: Split code with hyp Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 08/28] iommu/arm-smmu-v3: Move IDR parsing to common functions Mostafa Saleh
` (20 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Range TLB invalidation follows a very specific algorithm; instead of
re-writing it for the hypervisor, put it in a macro so it can be
re-used.
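To see what the algorithm produces, here is a small standalone sketch
(mine, not from the patch; __builtin_ctzl() stands in for the kernel's
__ffs() and the printf() harness is made up) of how a page count is
split into NUM chunks of 2^SCALE pages, each fitting one range command:

#include <stdio.h>

#define RANGE_NUM_MAX	31	/* mirrors CMDQ_TLBI_RANGE_NUM_MAX */

int main(void)
{
	unsigned long num_pages = 35;	/* 0b100011 pages */
	unsigned long tg = 12;		/* log2(4K) leaf page size */

	while (num_pages) {
		/* Power-of-2 multiple: lowest set bit of the remaining count */
		unsigned long scale = __builtin_ctzl(num_pages);
		/* Number of 2^scale-page chunks covered by one command */
		unsigned long num = (num_pages >> scale) & RANGE_NUM_MAX;

		printf("TLBI: %lu chunk(s) of %lu page(s) = %#lx bytes\n",
		       num, 1UL << scale, num << (scale + tg));
		num_pages -= num << scale;
	}
	return 0;	/* prints 3x1 page, then 1x32 pages */
}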
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 59 +------------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 64 +++++++++++++++++++++
2 files changed, 67 insertions(+), 56 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1f765b4e36fa..41820a9180f4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2126,68 +2126,15 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
struct arm_smmu_domain *smmu_domain)
{
struct arm_smmu_device *smmu = smmu_domain->smmu;
- unsigned long end = iova + size, num_pages = 0, tg = 0;
- size_t inv_range = granule;
struct arm_smmu_cmdq_batch cmds;
if (!size)
return;
- if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
- /* Get the leaf page size */
- tg = __ffs(smmu_domain->domain.pgsize_bitmap);
-
- num_pages = size >> tg;
-
- /* Convert page size of 12,14,16 (log2) to 1,2,3 */
- cmd->tlbi.tg = (tg - 10) / 2;
-
- /*
- * Determine what level the granule is at. For non-leaf, both
- * io-pgtable and SVA pass a nominal last-level granule because
- * they don't know what level(s) actually apply, so ignore that
- * and leave TTL=0. However for various errata reasons we still
- * want to use a range command, so avoid the SVA corner case
- * where both scale and num could be 0 as well.
- */
- if (cmd->tlbi.leaf)
- cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
- else if ((num_pages & CMDQ_TLBI_RANGE_NUM_MAX) == 1)
- num_pages++;
- }
-
arm_smmu_cmdq_batch_init(smmu, &cmds, cmd);
-
- while (iova < end) {
- if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
- /*
- * On each iteration of the loop, the range is 5 bits
- * worth of the aligned size remaining.
- * The range in pages is:
- *
- * range = (num_pages & (0x1f << __ffs(num_pages)))
- */
- unsigned long scale, num;
-
- /* Determine the power of 2 multiple number of pages */
- scale = __ffs(num_pages);
- cmd->tlbi.scale = scale;
-
- /* Determine how many chunks of 2^scale size we have */
- num = (num_pages >> scale) & CMDQ_TLBI_RANGE_NUM_MAX;
- cmd->tlbi.num = num - 1;
-
- /* range is num * 2^scale * pgsize */
- inv_range = num << (scale + tg);
-
- /* Clear out the lower order bits for the next iteration */
- num_pages -= num << scale;
- }
-
- cmd->tlbi.addr = iova;
- arm_smmu_cmdq_batch_add(smmu, &cmds, cmd);
- iova += inv_range;
- }
+ arm_smmu_tlb_inv_build(cmd, iova, size, granule,
+ smmu_domain->domain.pgsize_bitmap,
+ smmu, arm_smmu_cmdq_batch_add, &cmds);
arm_smmu_cmdq_batch_submit(smmu, &cmds);
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 2698438cd35c..a222fb7ef2ec 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -1042,6 +1042,70 @@ static inline void arm_smmu_write_strtab_l1_desc(struct arm_smmu_strtab_l1 *dst,
WRITE_ONCE(dst->l2ptr, cpu_to_le64(val));
}
+/**
+ * arm_smmu_tlb_inv_build - Create a range invalidation command
+ * @cmd: Base command initialized with OPCODE (S1, S2..), vmid and asid.
+ * @iova: Start IOVA to invalidate
+ * @size: Size of range
+ * @granule: Granule of invalidation
+ * @pgsize_bitmap: Page size bit map of the page table.
+ * @smmu: Struct for the smmu, must have ::features
+ * @add_cmd: Function to send/batch the invalidation command
+ * @cmds: In case of batching, the pointer to the batch
+ */
+#define arm_smmu_tlb_inv_build(cmd, iova, size, granule, pgsize_bitmap, smmu, add_cmd, cmds) \
+{ \
+ unsigned long _iova = (iova); \
+ size_t _size = (size); \
+ size_t _granule = (granule); \
+ unsigned long end = _iova + _size, num_pages = 0, tg = 0; \
+ size_t inv_range = _granule; \
+ if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) { \
+ /* Get the leaf page size */ \
+ tg = __ffs(pgsize_bitmap); \
+ num_pages = _size >> tg; \
+ /* Convert page size of 12,14,16 (log2) to 1,2,3 */ \
+ cmd->tlbi.tg = (tg - 10) / 2; \
+ /*
+ * Determine what level the granule is at. For non-leaf, both
+ * io-pgtable and SVA pass a nominal last-level granule because
+ * they don't know what level(s) actually apply, so ignore that
+ * and leave TTL=0. However for various errata reasons we still
+ * want to use a range command, so avoid the SVA corner case
+ * where both scale and num could be 0 as well.
+ */ \
+ if (cmd->tlbi.leaf) \
+ cmd->tlbi.ttl = 4 - ((ilog2(_granule) - 3) / (tg - 3)); \
+ else if ((num_pages & CMDQ_TLBI_RANGE_NUM_MAX) == 1) \
+ num_pages++; \
+ } \
+ while (_iova < end) { \
+ if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) { \
+ /*
+ * On each iteration of the loop, the range is 5 bits
+ * worth of the aligned size remaining.
+ * The range in pages is:
+ *
+ * range = (num_pages & (0x1f << __ffs(num_pages)))
+ */ \
+ unsigned long scale, num; \
+ /* Determine the power of 2 multiple number of pages */ \
+ scale = __ffs(num_pages); \
+ cmd->tlbi.scale = scale; \
+ /* Determine how many chunks of 2^scale size we have */ \
+ num = (num_pages >> scale) & CMDQ_TLBI_RANGE_NUM_MAX; \
+ cmd->tlbi.num = num - 1; \
+ /* range is num * 2^scale * pgsize */ \
+ inv_range = num << (scale + tg); \
+ /* Clear out the lower order bits for the next iteration */ \
+ num_pages -= num << scale; \
+ } \
+ cmd->tlbi.addr = _iova; \
+ add_cmd(smmu, cmds, cmd); \
+ _iova += inv_range; \
+ } \
+}
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
void arm_smmu_sva_notifier_synchronize(void);
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 08/28] iommu/arm-smmu-v3: Move IDR parsing to common functions
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (6 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 07/28] iommu/arm-smmu-v3: Move TLB range invalidation into a macro Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 09/28] KVM: arm64: iommu: Introduce IOMMU driver infrastructure Mostafa Saleh
` (19 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Move the parsing of the IDR registers into common functions so that
it can be re-used from the hypervisor.
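As a worked example (mine, not from the patch), an IDR5 value with
GRAN4K and GRAN64K set and OAS encoded as 0b101 should decode to a
48-bit OAS and a page size bitmap of 4K|2M|1G|64K|512M. A hypothetical
standalone check, with the constants copied from arm-smmu-v3.h:

#include <stdio.h>

#define IDR5_GRAN64K	(1 << 6)
#define IDR5_GRAN4K	(1 << 4)
#define IDR5_OAS	0x7		/* bits [2:0] */
#define IDR5_OAS_48_BIT	5

int main(void)
{
	unsigned int reg = IDR5_GRAN64K | IDR5_GRAN4K | IDR5_OAS_48_BIT;
	unsigned long pgsize = 0;

	if (reg & IDR5_GRAN64K)
		pgsize |= 0x10000UL | 0x20000000UL;		/* 64K | 512M */
	if (reg & IDR5_GRAN4K)
		pgsize |= 0x1000UL | 0x200000UL | 0x40000000UL;	/* 4K | 2M | 1G */

	printf("pgsize_bitmap=%#lx oas=%d\n", pgsize,
	       (reg & IDR5_OAS) == IDR5_OAS_48_BIT ? 48 : 0);
	return 0;
}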
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 112 +++-----------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 111 +++++++++++++++++++
2 files changed, 126 insertions(+), 97 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 41820a9180f4..10ca07c6dbe9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4112,57 +4112,17 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
/* IDR0 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
- /* 2-level structures */
- if (FIELD_GET(IDR0_ST_LVL, reg) == IDR0_ST_LVL_2LVL)
- smmu->features |= ARM_SMMU_FEAT_2_LVL_STRTAB;
-
- if (reg & IDR0_CD2L)
- smmu->features |= ARM_SMMU_FEAT_2_LVL_CDTAB;
-
- /*
- * Translation table endianness.
- * We currently require the same endianness as the CPU, but this
- * could be changed later by adding a new IO_PGTABLE_QUIRK.
- */
- switch (FIELD_GET(IDR0_TTENDIAN, reg)) {
- case IDR0_TTENDIAN_MIXED:
- smmu->features |= ARM_SMMU_FEAT_TT_LE | ARM_SMMU_FEAT_TT_BE;
- break;
-#ifdef __BIG_ENDIAN
- case IDR0_TTENDIAN_BE:
- smmu->features |= ARM_SMMU_FEAT_TT_BE;
- break;
-#else
- case IDR0_TTENDIAN_LE:
- smmu->features |= ARM_SMMU_FEAT_TT_LE;
- break;
-#endif
- default:
+ smmu->features |= smmu_idr0_features(reg);
+ if (FIELD_GET(IDR0_TTENDIAN, reg) == IDR0_TTENDIAN_RESERVED) {
dev_err(smmu->dev, "unknown/unsupported TT endianness!\n");
return -ENXIO;
}
-
- /* Boolean feature flags */
- if (IS_ENABLED(CONFIG_PCI_PRI) && reg & IDR0_PRI)
- smmu->features |= ARM_SMMU_FEAT_PRI;
-
- if (IS_ENABLED(CONFIG_PCI_ATS) && reg & IDR0_ATS)
- smmu->features |= ARM_SMMU_FEAT_ATS;
-
- if (reg & IDR0_SEV)
- smmu->features |= ARM_SMMU_FEAT_SEV;
-
- if (reg & IDR0_MSI) {
- smmu->features |= ARM_SMMU_FEAT_MSI;
- if (coherent && !disable_msipolling)
- smmu->options |= ARM_SMMU_OPT_MSIPOLL;
- }
-
- if (reg & IDR0_HYP) {
- smmu->features |= ARM_SMMU_FEAT_HYP;
- if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
- smmu->features |= ARM_SMMU_FEAT_E2H;
- }
+ if (coherent && !disable_msipolling &&
+ smmu->features & ARM_SMMU_FEAT_MSI)
+ smmu->options |= ARM_SMMU_OPT_MSIPOLL;
+ if (smmu->features & ARM_SMMU_FEAT_HYP &&
+ cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+ smmu->features |= ARM_SMMU_FEAT_E2H;
arm_smmu_get_httu(smmu, reg);
@@ -4174,21 +4134,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
dev_warn(smmu->dev, "IDR0.COHACC overridden by FW configuration (%s)\n",
str_true_false(coherent));
- switch (FIELD_GET(IDR0_STALL_MODEL, reg)) {
- case IDR0_STALL_MODEL_FORCE:
- smmu->features |= ARM_SMMU_FEAT_STALL_FORCE;
- fallthrough;
- case IDR0_STALL_MODEL_STALL:
- smmu->features |= ARM_SMMU_FEAT_STALLS;
- }
-
- if (reg & IDR0_S1P)
- smmu->features |= ARM_SMMU_FEAT_TRANS_S1;
-
- if (reg & IDR0_S2P)
- smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
-
- if (!(reg & (IDR0_S1P | IDR0_S2P))) {
+ if (!(smmu->features & (ARM_SMMU_FEAT_TRANS_S1 | ARM_SMMU_FEAT_TRANS_S2))) {
dev_err(smmu->dev, "no translation support!\n");
return -ENXIO;
}
@@ -4253,10 +4199,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
/* IDR3 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
- if (FIELD_GET(IDR3_RIL, reg))
- smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
- if (FIELD_GET(IDR3_FWB, reg))
- smmu->features |= ARM_SMMU_FEAT_S2FWB;
+ smmu->features |= smmu_idr3_features(reg);
/* IDR5 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
@@ -4265,43 +4208,18 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
smmu->evtq.max_stalls = FIELD_GET(IDR5_STALL_MAX, reg);
/* Page sizes */
- if (reg & IDR5_GRAN64K)
- smmu->pgsize_bitmap |= SZ_64K | SZ_512M;
- if (reg & IDR5_GRAN16K)
- smmu->pgsize_bitmap |= SZ_16K | SZ_32M;
- if (reg & IDR5_GRAN4K)
- smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
+ smmu->pgsize_bitmap = smmu_idr5_to_pgsize(reg);
/* Input address size */
if (FIELD_GET(IDR5_VAX, reg) == IDR5_VAX_52_BIT)
smmu->features |= ARM_SMMU_FEAT_VAX;
- /* Output address size */
- switch (FIELD_GET(IDR5_OAS, reg)) {
- case IDR5_OAS_32_BIT:
- smmu->oas = 32;
- break;
- case IDR5_OAS_36_BIT:
- smmu->oas = 36;
- break;
- case IDR5_OAS_40_BIT:
- smmu->oas = 40;
- break;
- case IDR5_OAS_42_BIT:
- smmu->oas = 42;
- break;
- case IDR5_OAS_44_BIT:
- smmu->oas = 44;
- break;
- case IDR5_OAS_52_BIT:
- smmu->oas = 52;
+ smmu->oas = smmu_idr5_to_oas(reg);
+ if (smmu->oas == 52)
smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */
- break;
- default:
+ else if (!smmu->oas) {
dev_info(smmu->dev,
- "unknown output address size. Truncating to 48-bit\n");
- fallthrough;
- case IDR5_OAS_48_BIT:
+ "unknown output address size. Truncating to 48-bit\n");
smmu->oas = 48;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index a222fb7ef2ec..8ffcc2e32474 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -26,6 +26,7 @@ struct arm_smmu_device;
#define IDR0_STALL_MODEL_FORCE 2
#define IDR0_TTENDIAN GENMASK(22, 21)
#define IDR0_TTENDIAN_MIXED 0
+#define IDR0_TTENDIAN_RESERVED 1
#define IDR0_TTENDIAN_LE 2
#define IDR0_TTENDIAN_BE 3
#define IDR0_CD2L (1 << 19)
@@ -1042,6 +1043,116 @@ static inline void arm_smmu_write_strtab_l1_desc(struct arm_smmu_strtab_l1 *dst,
WRITE_ONCE(dst->l2ptr, cpu_to_le64(val));
}
+static inline u32 smmu_idr0_features(u32 reg)
+{
+ u32 features = 0;
+
+ /* 2-level structures */
+ if (FIELD_GET(IDR0_ST_LVL, reg) == IDR0_ST_LVL_2LVL)
+ features |= ARM_SMMU_FEAT_2_LVL_STRTAB;
+
+ if (reg & IDR0_CD2L)
+ features |= ARM_SMMU_FEAT_2_LVL_CDTAB;
+
+ /*
+ * Translation table endianness.
+ * We currently require the same endianness as the CPU, but this
+ * could be changed later by adding a new IO_PGTABLE_QUIRK.
+ */
+ switch (FIELD_GET(IDR0_TTENDIAN, reg)) {
+ case IDR0_TTENDIAN_MIXED:
+ features |= ARM_SMMU_FEAT_TT_LE | ARM_SMMU_FEAT_TT_BE;
+ break;
+#ifdef __BIG_ENDIAN
+ case IDR0_TTENDIAN_BE:
+ features |= ARM_SMMU_FEAT_TT_BE;
+ break;
+#else
+ case IDR0_TTENDIAN_LE:
+ features |= ARM_SMMU_FEAT_TT_LE;
+ break;
+#endif
+ }
+
+ /* Boolean feature flags */
+ if (IS_ENABLED(CONFIG_PCI_PRI) && reg & IDR0_PRI)
+ features |= ARM_SMMU_FEAT_PRI;
+
+ if (IS_ENABLED(CONFIG_PCI_ATS) && reg & IDR0_ATS)
+ features |= ARM_SMMU_FEAT_ATS;
+
+ if (reg & IDR0_SEV)
+ features |= ARM_SMMU_FEAT_SEV;
+
+ if (reg & IDR0_MSI)
+ features |= ARM_SMMU_FEAT_MSI;
+
+ if (reg & IDR0_HYP)
+ features |= ARM_SMMU_FEAT_HYP;
+
+ switch (FIELD_GET(IDR0_STALL_MODEL, reg)) {
+ case IDR0_STALL_MODEL_FORCE:
+ features |= ARM_SMMU_FEAT_STALL_FORCE;
+ fallthrough;
+ case IDR0_STALL_MODEL_STALL:
+ features |= ARM_SMMU_FEAT_STALLS;
+ }
+
+ if (reg & IDR0_S1P)
+ features |= ARM_SMMU_FEAT_TRANS_S1;
+
+ if (reg & IDR0_S2P)
+ features |= ARM_SMMU_FEAT_TRANS_S2;
+
+ return features;
+}
+
+static inline u32 smmu_idr3_features(u32 reg)
+{
+ u32 features = 0;
+
+ if (FIELD_GET(IDR3_RIL, reg))
+ features |= ARM_SMMU_FEAT_RANGE_INV;
+ if (FIELD_GET(IDR3_FWB, reg))
+ features |= ARM_SMMU_FEAT_S2FWB;
+
+ return features;
+}
+
+static inline u32 smmu_idr5_to_oas(u32 reg)
+{
+ switch (FIELD_GET(IDR5_OAS, reg)) {
+ case IDR5_OAS_32_BIT:
+ return 32;
+ case IDR5_OAS_36_BIT:
+ return 36;
+ case IDR5_OAS_40_BIT:
+ return 40;
+ case IDR5_OAS_42_BIT:
+ return 42;
+ case IDR5_OAS_44_BIT:
+ return 44;
+ case IDR5_OAS_48_BIT:
+ return 48;
+ case IDR5_OAS_52_BIT:
+ return 52;
+ }
+ return 0;
+}
+
+static inline unsigned long smmu_idr5_to_pgsize(u32 reg)
+{
+ unsigned long pgsize_bitmap = 0;
+
+ if (reg & IDR5_GRAN64K)
+ pgsize_bitmap |= SZ_64K | SZ_512M;
+ if (reg & IDR5_GRAN16K)
+ pgsize_bitmap |= SZ_16K | SZ_32M;
+ if (reg & IDR5_GRAN4K)
+ pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
+ return pgsize_bitmap;
+}
+
/**
* arm_smmu_tlb_inv_build - Create a range invalidation command
* @cmd: Base command initialized with OPCODE (S1, S2..), vmid and asid.
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 09/28] KVM: arm64: iommu: Introduce IOMMU driver infrastructure
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (7 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 08/28] iommu/arm-smmu-v3: Move IDR parsing to common functions Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table Mostafa Saleh
` (18 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
To establish DMA isolation, KVM needs an IOMMU driver which provides
ops implemented at EL2.
Only one driver can be used; it is registered with
kvm_iommu_register_driver() by passing a pointer to the ops.
This must be called before module_init(), which is the point at which
KVM initializes.
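For illustration, a registration from a built-in driver would look
roughly like this (a minimal sketch with made-up my_* names; the real
SMMUv3 registration later in the series passes the hyp alias of ops
living in the nVHE image via kern_hyp_va(lm_alias(...))):

/* Ops implemented in the driver's EL2 (nVHE) object */
static struct kvm_iommu_ops my_el2_ops = {
	.init = my_el2_init,	/* called from __pkvm_init_finalise() */
};

static int __init my_iommu_register(void)
{
	if (!is_protected_kvm_enabled())
		return 0;
	return kvm_iommu_register_driver(&my_el2_ops);
}
/* core_initcall() runs before module_init(), where KVM initializes */
core_initcall(my_iommu_register);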
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/Makefile | 3 ++-
arch/arm64/kvm/hyp/include/nvhe/iommu.h | 13 +++++++++++++
arch/arm64/kvm/hyp/nvhe/Makefile | 3 ++-
arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 18 ++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/setup.c | 5 +++++
arch/arm64/kvm/iommu.c | 15 +++++++++++++++
7 files changed, 57 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/iommu.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
create mode 100644 arch/arm64/kvm/iommu.c
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3e41a880b062..1a08066eaf7e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1674,5 +1674,7 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
void check_feature_map(void);
+struct kvm_iommu_ops;
+int kvm_iommu_register_driver(struct kvm_iommu_ops *hyp_ops);
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 7c329e01c557..5528704bfd72 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -23,7 +23,8 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-v3.o vgic/vgic-v4.o \
vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
- vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o
+ vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \
+ iommu.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
new file mode 100644
index 000000000000..1ac70cc28a9e
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM64_KVM_NVHE_IOMMU_H__
+#define __ARM64_KVM_NVHE_IOMMU_H__
+
+#include <asm/kvm_host.h>
+
+struct kvm_iommu_ops {
+ int (*init)(void);
+};
+
+int kvm_iommu_init(void);
+
+#endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index a76522d63c3e..393ff143f0be 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -24,7 +24,8 @@ CFLAGS_switch.nvhe.o += -Wno-override-init
hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
hyp-main.o hyp-smp.o psci-relay.o early_alloc.o page_alloc.o \
- cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o
+ cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o \
+ iommu/iommu.o
hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
new file mode 100644
index 000000000000..a01c036c55be
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IOMMU operations for pKVM
+ *
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+#include <nvhe/iommu.h>
+
+/* Only one set of ops supported */
+struct kvm_iommu_ops *kvm_iommu_ops;
+
+int kvm_iommu_init(void)
+{
+ if (!kvm_iommu_ops || !kvm_iommu_ops->init)
+ return -ENODEV;
+
+ return kvm_iommu_ops->init();
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index ee6435473204..bdbc77395e03 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -13,6 +13,7 @@
#include <nvhe/early_alloc.h>
#include <nvhe/ffa.h>
#include <nvhe/gfp.h>
+#include <nvhe/iommu.h>
#include <nvhe/memory.h>
#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
@@ -320,6 +321,10 @@ void __noreturn __pkvm_init_finalise(void)
if (ret)
goto out;
+ ret = kvm_iommu_init();
+ if (ret)
+ goto out;
+
ret = hyp_ffa_init(ffa_proxy_pages);
if (ret)
goto out;
diff --git a/arch/arm64/kvm/iommu.c b/arch/arm64/kvm/iommu.c
new file mode 100644
index 000000000000..926a1a94698f
--- /dev/null
+++ b/arch/arm64/kvm/iommu.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Google LLC
+ * Author: Mostafa Saleh <smostafa@google.com>
+ */
+
+#include <linux/kvm_host.h>
+
+extern struct kvm_iommu_ops *kvm_nvhe_sym(kvm_iommu_ops);
+
+int kvm_iommu_register_driver(struct kvm_iommu_ops *hyp_ops)
+{
+ kvm_nvhe_sym(kvm_iommu_ops) = hyp_ops;
+ return 0;
+}
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (8 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 09/28] KVM: arm64: iommu: Introduce IOMMU driver infrastructure Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 11/28] KVM: arm64: iommu: Add memory pool Mostafa Saleh
` (17 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Create a shadow page table for the IOMMU that mirrors the host CPU
stage-2 into the IOMMUs to establish DMA isolation.
An initial snapshot is created after the driver init; then, on every
permission change, a callback is invoked so the IOMMU driver can
update its page table.
In some cases, an SMMUv3 may be able to directly share the page
table used by the host CPU stage-2.
However, that is more restrictive, requires changes to the core hypervisor
page table code, and would also require the hypervisor to handle IOMMU
page faults. It can be added later as an optimization for SMMUv3.
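The callback contract, as a hedged sketch (the my_iopt_* helpers are
placeholders for whatever EL2 page-table code a driver uses; prot is
the IOMMU_* encoding produced by pkvm_to_iommu_prot() in this patch):

/* Keep the IOMMU identity map of [start, end) in sync with the CPU. */
static void my_host_stage2_idmap(phys_addr_t start, phys_addr_t end, int prot)
{
	if (!prot)	/* page moved away from the host: revoke DMA access */
		my_iopt_unmap(start, end - start);
	else		/* IOMMU_READ/IOMMU_WRITE, plus IOMMU_MMIO for MMIO */
		my_iopt_map_identity(start, end - start, prot);
}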
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/iommu.h | 4 ++
arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 83 ++++++++++++++++++++++++-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 5 ++
3 files changed, 90 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 1ac70cc28a9e..219363045b1c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -3,11 +3,15 @@
#define __ARM64_KVM_NVHE_IOMMU_H__
#include <asm/kvm_host.h>
+#include <asm/kvm_pgtable.h>
struct kvm_iommu_ops {
int (*init)(void);
+ void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
};
int kvm_iommu_init(void);
+void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
+ enum kvm_pgtable_prot prot);
#endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index a01c036c55be..f7d1c8feb358 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -4,15 +4,94 @@
*
* Copyright (C) 2022 Linaro Ltd.
*/
+#include <linux/iommu.h>
+
#include <nvhe/iommu.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/spinlock.h>
/* Only one set of ops supported */
struct kvm_iommu_ops *kvm_iommu_ops;
+/* Protected by host_mmu.lock */
+static bool kvm_idmap_initialized;
+
+static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
+{
+ int iommu_prot = 0;
+
+ if (prot & KVM_PGTABLE_PROT_R)
+ iommu_prot |= IOMMU_READ;
+ if (prot & KVM_PGTABLE_PROT_W)
+ iommu_prot |= IOMMU_WRITE;
+ if (prot == PKVM_HOST_MMIO_PROT)
+ iommu_prot |= IOMMU_MMIO;
+
+ /* We don't understand that, might be dangerous. */
+ WARN_ON(prot & ~PKVM_HOST_MEM_PROT);
+ return iommu_prot;
+}
+
+static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
+{
+ u64 start = ctx->addr;
+ kvm_pte_t pte = *ctx->ptep;
+ u32 level = ctx->level;
+ u64 end = start + kvm_granule_size(level);
+ int prot = IOMMU_READ | IOMMU_WRITE;
+
+ /* Keep unmapped. */
+ if (pte && !kvm_pte_valid(pte))
+ return 0;
+
+ if (kvm_pte_valid(pte))
+ prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
+ else if (!addr_is_memory(start))
+ prot |= IOMMU_MMIO;
+
+ kvm_iommu_ops->host_stage2_idmap(start, end, prot);
+ return 0;
+}
+
+static int kvm_iommu_snapshot_host_stage2(void)
+{
+ int ret;
+ struct kvm_pgtable_walker walker = {
+ .cb = __snapshot_host_stage2,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+ struct kvm_pgtable *pgt = &host_mmu.pgt;
+
+ hyp_spin_lock(&host_mmu.lock);
+ ret = kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker);
+ /* Start receiving calls to host_stage2_idmap. */
+ kvm_idmap_initialized = !ret;
+ hyp_spin_unlock(&host_mmu.lock);
+
+ return ret;
+}
+
int kvm_iommu_init(void)
{
- if (!kvm_iommu_ops || !kvm_iommu_ops->init)
+ int ret;
+
+ if (!kvm_iommu_ops || !kvm_iommu_ops->init ||
+ !kvm_iommu_ops->host_stage2_idmap)
return -ENODEV;
- return kvm_iommu_ops->init();
+ ret = kvm_iommu_ops->init();
+ if (ret)
+ return ret;
+ return kvm_iommu_snapshot_host_stage2();
+}
+
+void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
+ enum kvm_pgtable_prot prot)
+{
+ hyp_assert_lock_held(&host_mmu.lock);
+
+ if (!kvm_idmap_initialized)
+ return;
+ kvm_iommu_ops->host_stage2_idmap(start, end, pkvm_to_iommu_prot(prot));
}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c9a15ef6b18d..bce6690f29c0 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -15,6 +15,7 @@
#include <hyp/fault.h>
#include <nvhe/gfp.h>
+#include <nvhe/iommu.h>
#include <nvhe/memory.h>
#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
@@ -529,6 +530,7 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
{
int ret;
+ enum kvm_pgtable_prot prot;
if (!range_is_memory(addr, addr + size))
return -EPERM;
@@ -538,6 +540,9 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
if (ret)
return ret;
+ prot = owner_id == PKVM_ID_HOST ? PKVM_HOST_MEM_PROT : 0;
+ kvm_iommu_host_stage2_idmap(addr, addr + size, prot);
+
/* Don't forget to update the vmemmap tracking for the host */
if (owner_id == PKVM_ID_HOST)
__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 11/28] KVM: arm64: iommu: Add memory pool
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (9 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 12/28] KVM: arm64: iommu: Support DABT for IOMMU Mostafa Saleh
` (16 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
IOMMU drivers need to allocate memory for the shadow page table.
Similar to the host stage-2 CPU page table, this memory is allocated
early from the carveout and added to a pool which the IOMMU driver
can allocate from and reclaim into at run time.
At this point nr_pages is 0 as there is no driver yet; in the next
patches, when the SMMUv3 driver is added, it will add its own
function in kvm/iommu.c to return the number of pages it needs.
Unfortunately, this part has to leak into kvm/iommu, as this happens
too early for drivers to have any init calls.
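Driver-side usage is then simply (sketch):

	void *p = kvm_iommu_donate_pages(0);	/* one order-0 page */

	if (!p)
		return -ENOMEM;
	/* ... use as shadow page-table or stream-table backing ... */
	kvm_iommu_reclaim_pages(p);		/* give it back to the pool */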
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/iommu.h | 5 ++++-
arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 20 +++++++++++++++++++-
arch/arm64/kvm/hyp/nvhe/setup.c | 10 +++++++++-
arch/arm64/kvm/iommu.c | 11 +++++++++++
arch/arm64/kvm/pkvm.c | 1 +
6 files changed, 45 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1a08066eaf7e..fcb4b26072f7 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1676,5 +1676,6 @@ void check_feature_map(void);
struct kvm_iommu_ops;
int kvm_iommu_register_driver(struct kvm_iommu_ops *hyp_ops);
+size_t kvm_iommu_pages(void);
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 219363045b1c..9f4906c6dcc9 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -10,8 +10,11 @@ struct kvm_iommu_ops {
void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
};
-int kvm_iommu_init(void);
+int kvm_iommu_init(void *pool_base, size_t nr_pages);
void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
enum kvm_pgtable_prot prot);
+void *kvm_iommu_donate_pages(u8 order);
+void kvm_iommu_reclaim_pages(void *ptr);
+
#endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index f7d1c8feb358..1673165c7330 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -15,6 +15,7 @@ struct kvm_iommu_ops *kvm_iommu_ops;
/* Protected by host_mmu.lock */
static bool kvm_idmap_initialized;
+static struct hyp_pool iommu_pages_pool;
static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
{
@@ -72,7 +73,7 @@ static int kvm_iommu_snapshot_host_stage2(void)
return ret;
}
-int kvm_iommu_init(void)
+int kvm_iommu_init(void *pool_base, size_t nr_pages)
{
int ret;
@@ -80,6 +81,13 @@ int kvm_iommu_init(void)
!kvm_iommu_ops->host_stage2_idmap)
return -ENODEV;
+ if (nr_pages) {
+ ret = hyp_pool_init(&iommu_pages_pool, hyp_virt_to_pfn(pool_base),
+ nr_pages, 0);
+ if (ret)
+ return ret;
+ }
+
ret = kvm_iommu_ops->init();
if (ret)
return ret;
@@ -95,3 +103,13 @@ void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
return;
kvm_iommu_ops->host_stage2_idmap(start, end, pkvm_to_iommu_prot(prot));
}
+
+void *kvm_iommu_donate_pages(u8 order)
+{
+ return hyp_alloc_pages(&iommu_pages_pool, order);
+}
+
+void kvm_iommu_reclaim_pages(void *ptr)
+{
+ hyp_put_page(&iommu_pages_pool, ptr);
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index bdbc77395e03..09ecee2cd864 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -21,6 +21,7 @@
#include <nvhe/trap_handler.h>
unsigned long hyp_nr_cpus;
+size_t hyp_kvm_iommu_pages;
#define hyp_percpu_size ((unsigned long)__per_cpu_end - \
(unsigned long)__per_cpu_start)
@@ -33,6 +34,7 @@ static void *selftest_base;
static void *ffa_proxy_pages;
static struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
static struct hyp_pool hpool;
+static void *iommu_base;
static int divide_memory_pool(void *virt, unsigned long size)
{
@@ -70,6 +72,12 @@ static int divide_memory_pool(void *virt, unsigned long size)
if (!ffa_proxy_pages)
return -ENOMEM;
+ if (hyp_kvm_iommu_pages) {
+ iommu_base = hyp_early_alloc_contig(hyp_kvm_iommu_pages);
+ if (!iommu_base)
+ return -ENOMEM;
+ }
+
return 0;
}
@@ -321,7 +329,7 @@ void __noreturn __pkvm_init_finalise(void)
if (ret)
goto out;
- ret = kvm_iommu_init();
+ ret = kvm_iommu_init(iommu_base, hyp_kvm_iommu_pages);
if (ret)
goto out;
diff --git a/arch/arm64/kvm/iommu.c b/arch/arm64/kvm/iommu.c
index 926a1a94698f..5460b1bd44a6 100644
--- a/arch/arm64/kvm/iommu.c
+++ b/arch/arm64/kvm/iommu.c
@@ -7,9 +7,20 @@
#include <linux/kvm_host.h>
extern struct kvm_iommu_ops *kvm_nvhe_sym(kvm_iommu_ops);
+extern size_t kvm_nvhe_sym(hyp_kvm_iommu_pages);
int kvm_iommu_register_driver(struct kvm_iommu_ops *hyp_ops)
{
kvm_nvhe_sym(kvm_iommu_ops) = hyp_ops;
return 0;
}
+
+size_t kvm_iommu_pages(void)
+{
+ /*
+ * This is called very early during setup_arch(), before any initcalls
+ * run, so this has to call specific functions for each KVM driver.
+ */
+ kvm_nvhe_sym(hyp_kvm_iommu_pages) = 0;
+ return 0;
+}
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index fcd70bfe44fb..6098beda36fa 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -63,6 +63,7 @@ void __init kvm_hyp_reserve(void)
hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
hyp_mem_pages += pkvm_selftest_pages();
hyp_mem_pages += hyp_ffa_proxy_pages();
+ hyp_mem_pages += kvm_iommu_pages();
/*
* Try to allocate a PMD-aligned region to reduce TLB pressure once
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 12/28] KVM: arm64: iommu: Support DABT for IOMMU
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (10 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 11/28] KVM: arm64: iommu: Add memory pool Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 13/28] iommu/arm-smmu-v3-kvm: Add SMMUv3 driver Mostafa Saleh
` (15 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
The SMMUv3 driver needs to trap and emulate accesses to the MMIO
space as part of the nested implementation.
Add a handler for DABTs so IOMMU drivers are able to do so: when the
host faults on a page, check first whether the fault is part of the
IOMMU emulation.
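The handler a driver supplies ends up shaped like this (a sketch with
placeholder my_* names; the MMIO access decode from the ESR ISS is
elided):

static bool my_dabt_handler(struct user_pt_regs *regs, u64 esr, u64 addr)
{
	if (addr < my_mmio_base || addr >= my_mmio_base + my_mmio_size)
		return false;	/* not our MMIO: take the normal fault path */

	/*
	 * Decode the access from the ESR ISS (read/write, width, Rt),
	 * emulate it against the shadow state, and for reads write the
	 * result into regs->regs[rt]. Returning true makes the core
	 * skip the faulting instruction.
	 */
	return true;
}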
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/include/asm/kvm_arm.h | 2 ++
arch/arm64/kvm/hyp/include/nvhe/iommu.h | 3 ++-
arch/arm64/kvm/hyp/nvhe/iommu/iommu.c | 15 +++++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 10 ++++++++++
4 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 1da290aeedce..8d63308ccd5c 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -331,6 +331,8 @@
/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
#define HPFAR_MASK (~UL(0xf))
+
+#define FAR_MASK GENMASK_ULL(11, 0)
/*
* We have
* PAR [PA_Shift - 1 : 12] = PA [PA_Shift - 1 : 12]
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 9f4906c6dcc9..10fe4fbf7424 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -8,6 +8,7 @@
struct kvm_iommu_ops {
int (*init)(void);
void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
+ bool (*dabt_handler)(struct user_pt_regs *regs, u64 esr, u64 addr);
};
int kvm_iommu_init(void *pool_base, size_t nr_pages);
@@ -16,5 +17,5 @@ void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
enum kvm_pgtable_prot prot);
void *kvm_iommu_donate_pages(u8 order);
void kvm_iommu_reclaim_pages(void *ptr);
-
+bool kvm_iommu_host_dabt_handler(struct user_pt_regs *regs, u64 esr, u64 addr);
#endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 1673165c7330..376b30f557a2 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -4,6 +4,10 @@
*
* Copyright (C) 2022 Linaro Ltd.
*/
+#include <asm/kvm_hyp.h>
+
+#include <hyp/adjust_pc.h>
+
#include <linux/iommu.h>
#include <nvhe/iommu.h>
@@ -113,3 +117,14 @@ void kvm_iommu_reclaim_pages(void *ptr)
{
hyp_put_page(&iommu_pages_pool, ptr);
}
+
+bool kvm_iommu_host_dabt_handler(struct user_pt_regs *regs, u64 esr, u64 addr)
+{
+ if (kvm_iommu_ops && kvm_iommu_ops->dabt_handler &&
+ kvm_iommu_ops->dabt_handler(regs, esr, addr)) {
+ /* DABT handled by the driver, skip to next instruction. */
+ kvm_skip_host_instr();
+ return true;
+ }
+ return false;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index bce6690f29c0..7371b2183e1e 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -595,6 +595,11 @@ static int host_stage2_idmap(u64 addr)
return ret;
}
+static bool is_dabt(u64 esr)
+{
+ return ESR_ELx_EC(esr) == ESR_ELx_EC_DABT_LOW;
+}
+
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
@@ -617,6 +622,11 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
*/
BUG_ON(!(fault.hpfar_el2 & HPFAR_EL2_NS));
addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
+ addr |= fault.far_el2 & FAR_MASK;
+
+ if (is_dabt(esr) && !addr_is_memory(addr) &&
+ kvm_iommu_host_dabt_handler(&host_ctxt->regs, esr, addr))
+ return;
ret = host_stage2_idmap(addr);
BUG_ON(ret && ret != -EAGAIN);
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 13/28] iommu/arm-smmu-v3-kvm: Add SMMUv3 driver
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (11 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 12/28] KVM: arm64: iommu: Support DABT for IOMMU Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 14/28] iommu/arm-smmu-v3: Add KVM mode in the driver Mostafa Saleh
` (14 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
From: Jean-Philippe Brucker <jean-philippe@linaro.org>
Add the skeleton for an Arm SMMUv3 driver at EL2.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/kvm/hyp/nvhe/Makefile | 5 ++++
drivers/iommu/arm/Kconfig | 9 ++++++
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 29 +++++++++++++++++++
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 16 ++++++++++
4 files changed, 59 insertions(+)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 393ff143f0be..c71c96262378 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -31,6 +31,11 @@ hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o
hyp-obj-y += $(lib-objs)
+HYP_SMMU_V3_DRV_PATH = ../../../../../drivers/iommu/arm/arm-smmu-v3
+
+hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += $(HYP_SMMU_V3_DRV_PATH)/pkvm/arm-smmu-v3.o \
+ $(HYP_SMMU_V3_DRV_PATH)/arm-smmu-v3-common-hyp.o
+
##
## Build rules for compiling nVHE hyp code
## Output of this folder is `kvm_nvhe.o`, a partially linked object
diff --git a/drivers/iommu/arm/Kconfig b/drivers/iommu/arm/Kconfig
index ef42bbe07dbe..7eeb94d2499d 100644
--- a/drivers/iommu/arm/Kconfig
+++ b/drivers/iommu/arm/Kconfig
@@ -142,3 +142,12 @@ config QCOM_IOMMU
select ARM_DMA_USE_IOMMU
help
Support for IOMMU on certain Qualcomm SoCs.
+
+config ARM_SMMU_V3_PKVM
+ bool "ARM SMMUv3 support for protected Virtual Machines"
+ depends on KVM && ARM64 && ARM_SMMU_V3=y
+ help
+ Enable an SMMUv3 driver in the KVM hypervisor to protect VMs against
+ memory accesses from devices owned by the host.
+
+ Say Y here if you intend to enable KVM in protected mode.
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
new file mode 100644
index 000000000000..fa8b71152560
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pKVM hyp driver for the Arm SMMUv3
+ *
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+#include <asm/kvm_hyp.h>
+
+#include <nvhe/iommu.h>
+
+#include "arm_smmu_v3.h"
+
+size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
+struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
+
+static int smmu_init(void)
+{
+ return -ENOSYS;
+}
+
+static void smmu_host_stage2_idmap(phys_addr_t start, phys_addr_t end, int prot)
+{
+}
+
+/* Shared with the kernel driver in EL1 */
+struct kvm_iommu_ops smmu_ops = {
+ .init = smmu_init,
+ .host_stage2_idmap = smmu_host_stage2_idmap,
+};
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
new file mode 100644
index 000000000000..f6ad91d3fb85
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_ARM_SMMU_V3_H
+#define __KVM_ARM_SMMU_V3_H
+
+#include <asm/kvm_asm.h>
+
+struct hyp_arm_smmu_v3_device {
+};
+
+extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
+#define kvm_hyp_arm_smmu_v3_count kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count)
+
+extern struct hyp_arm_smmu_v3_device *kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_smmus);
+#define kvm_hyp_arm_smmu_v3_smmus kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_smmus)
+
+#endif /* __KVM_ARM_SMMU_V3_H */
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 14/28] iommu/arm-smmu-v3: Add KVM mode in the driver
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (12 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 13/28] iommu/arm-smmu-v3-kvm: Add SMMUv3 driver Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 15/28] iommu/arm-smmu-v3: Load the driver later in KVM mode Mostafa Saleh
` (13 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Add a file only compiled for KVM mode.
At the moment it registers the driver with KVM and adds the hook
needed for memory allocation.
Next, it will create the array of available SMMUs and their
descriptions.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/include/asm/kvm_host.h | 4 +++
arch/arm64/kvm/iommu.c | 10 ++++--
drivers/iommu/arm/arm-smmu-v3/Makefile | 1 +
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c | 36 +++++++++++++++++++
4 files changed, 49 insertions(+), 2 deletions(-)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index fcb4b26072f7..52212c0f2e9c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1678,4 +1678,8 @@ struct kvm_iommu_ops;
int kvm_iommu_register_driver(struct kvm_iommu_ops *hyp_ops);
size_t kvm_iommu_pages(void);
+#ifdef CONFIG_ARM_SMMU_V3_PKVM
+size_t smmu_hyp_pgt_pages(void);
+#endif
+
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/iommu.c b/arch/arm64/kvm/iommu.c
index 5460b1bd44a6..0475f7c95c6c 100644
--- a/arch/arm64/kvm/iommu.c
+++ b/arch/arm64/kvm/iommu.c
@@ -17,10 +17,16 @@ int kvm_iommu_register_driver(struct kvm_iommu_ops *hyp_ops)
size_t kvm_iommu_pages(void)
{
+ size_t nr_pages = 0;
+
/*
* This is called very early during setup_arch(), before any initcalls
* run, so this has to call specific functions for each KVM driver.
*/
- kvm_nvhe_sym(hyp_kvm_iommu_pages) = 0;
- return 0;
+#ifdef CONFIG_ARM_SMMU_V3_PKVM
+ nr_pages = smmu_hyp_pgt_pages();
+#endif
+
+ kvm_nvhe_sym(hyp_kvm_iommu_pages) = nr_pages;
+ return nr_pages;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 1918b4a64cb0..284ad71c5282 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -4,5 +4,6 @@ arm_smmu_v3-y := arm-smmu-v3.o arm-smmu-v3-common-hyp.o
arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_IOMMUFD) += arm-smmu-v3-iommufd.o
arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
arm_smmu_v3-$(CONFIG_TEGRA241_CMDQV) += tegra241-cmdqv.o
+arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_PKVM) += arm-smmu-v3-kvm.o
obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
new file mode 100644
index 000000000000..ac4eac6d567f
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pKVM host driver for the Arm SMMUv3
+ *
+ * Copyright (C) 2022 Linaro Ltd.
+ */
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pkvm.h>
+
+#include <linux/of_platform.h>
+
+#include "arm-smmu-v3.h"
+#include "pkvm/arm_smmu_v3.h"
+
+extern struct kvm_iommu_ops kvm_nvhe_sym(smmu_ops);
+
+size_t smmu_hyp_pgt_pages(void)
+{
+ /*
+ * SMMUv3 uses the same format as stage-2 and hence has the same memory
+ * requirements; we add an extra 500 pages for L2 STEs.
+ */
+ if (of_find_compatible_node(NULL, NULL, "arm,smmu-v3"))
+ return host_s2_pgtable_pages() + 500;
+ return 0;
+}
+
+static int kvm_arm_smmu_v3_register(void)
+{
+ if (!is_protected_kvm_enabled())
+ return 0;
+
+ return kvm_iommu_register_driver(kern_hyp_va(lm_alias(&kvm_nvhe_sym(smmu_ops))));
+};
+
+core_initcall(kvm_arm_smmu_v3_register);
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 15/28] iommu/arm-smmu-v3: Load the driver later in KVM mode
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (13 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 14/28] iommu/arm-smmu-v3: Add KVM mode in the driver Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 16/28] iommu/arm-smmu-v3-kvm: Create array for hyp SMMUv3 Mostafa Saleh
` (12 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
While in KVM mode, the driver must be loaded after the hypervisor
initializes, which happens at module_init(); register the platform
driver from a later initcall level instead.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 ++++++++++++++++-----
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 10ca07c6dbe9..a04730b5fe41 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4576,12 +4576,6 @@ static const struct of_device_id arm_smmu_of_match[] = {
};
MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
-static void arm_smmu_driver_unregister(struct platform_driver *drv)
-{
- arm_smmu_sva_notifier_synchronize();
- platform_driver_unregister(drv);
-}
-
static struct platform_driver arm_smmu_driver = {
.driver = {
.name = "arm-smmu-v3",
@@ -4592,8 +4586,27 @@ static struct platform_driver arm_smmu_driver = {
.remove = arm_smmu_device_remove,
.shutdown = arm_smmu_device_shutdown,
};
+
+#ifndef CONFIG_ARM_SMMU_V3_PKVM
+static void arm_smmu_driver_unregister(struct platform_driver *drv)
+{
+ arm_smmu_sva_notifier_synchronize();
+ platform_driver_unregister(drv);
+}
+
module_driver(arm_smmu_driver, platform_driver_register,
arm_smmu_driver_unregister);
+#else
+/*
+ * Must be done after the hypervisor initializes at module_init()
+ * No need for unregister as this is a built in driver.
+ */
+static int arm_smmu_driver_register(void)
+{
+ return platform_driver_register(&arm_smmu_driver);
+}
+device_initcall_sync(arm_smmu_driver_register);
+#endif /* !CONFIG_ARM_SMMU_V3_PKVM */
MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations");
MODULE_AUTHOR("Will Deacon <will@kernel.org>");
--
2.51.0.rc1.167.g924127e9c0-goog
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 16/28] iommu/arm-smmu-v3-kvm: Create array for hyp SMMUv3
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (14 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 15/28] iommu/arm-smmu-v3: Load the driver later in KVM mode Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 17/28] iommu/arm-smmu-v3-kvm: Take over SMMUs Mostafa Saleh
` (11 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
As the hypervisor has no access to firmware tables, device discovery
is done from the kernel, which parses the firmware tables and
populates a list of devices for the hypervisor, which later takes
over. At the moment only the device tree is supported.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c | 93 ++++++++++++++++++-
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 13 +++
2 files changed, 105 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index ac4eac6d567f..27ea39c0fb1f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -7,6 +7,7 @@
#include <asm/kvm_mmu.h>
#include <asm/kvm_pkvm.h>
+#include <linux/of_address.h>
#include <linux/of_platform.h>
#include "arm-smmu-v3.h"
@@ -14,6 +15,75 @@
extern struct kvm_iommu_ops kvm_nvhe_sym(smmu_ops);
+static size_t kvm_arm_smmu_count;
+static struct hyp_arm_smmu_v3_device *kvm_arm_smmu_array;
+
+static void kvm_arm_smmu_array_free(void)
+{
+ int order;
+
+ order = get_order(kvm_arm_smmu_count * sizeof(*kvm_arm_smmu_array));
+ free_pages((unsigned long)kvm_arm_smmu_array, order);
+}
+
+/*
+ * The hypervisor has to know the basic information about the SMMUs
+ * from the firmware.
+ * This has to be done before the SMMUv3 driver probes and does anything
+ * meaningful with the hardware; otherwise it becomes harder to reason about
+ * the SMMU state and we'd have to hand off the state to the hypervisor at
+ * some point while devices are live, which is complicated and dangerous.
+ * Instead, the hypervisor is interested in a very small part of the probe
+ * path, so just add separate logic for it.
+ */
+static int kvm_arm_smmu_array_alloc(void)
+{
+ int smmu_order;
+ struct device_node *np;
+ int ret;
+ int i = 0;
+
+ kvm_arm_smmu_count = 0;
+ for_each_compatible_node(np, NULL, "arm,smmu-v3")
+ kvm_arm_smmu_count++;
+
+ if (!kvm_arm_smmu_count)
+ return -ENODEV;
+
+ smmu_order = get_order(kvm_arm_smmu_count * sizeof(*kvm_arm_smmu_array));
+ kvm_arm_smmu_array = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, smmu_order);
+ if (!kvm_arm_smmu_array)
+ return -ENOMEM;
+
+ /* Basic device tree parsing. */
+ for_each_compatible_node(np, NULL, "arm,smmu-v3") {
+ struct resource res;
+
+ ret = of_address_to_resource(np, 0, &res);
+ if (ret)
+ goto out_err;
+ kvm_arm_smmu_array[i].mmio_addr = res.start;
+ kvm_arm_smmu_array[i].mmio_size = resource_size(&res);
+ if (kvm_arm_smmu_array[i].mmio_size < SZ_128K) {
+ pr_err("SMMUv3(%s) has unsupported size(0x%lx)\n", np->name,
+ kvm_arm_smmu_array[i].mmio_size);
+ ret = -EINVAL;
+ goto out_err;
+ }
+
+ if (of_dma_is_coherent(np))
+ kvm_arm_smmu_array[i].features |= ARM_SMMU_FEAT_COHERENCY;
+
+ i++;
+ }
+
+ return 0;
+
+out_err:
+ kvm_arm_smmu_array_free();
+ return ret;
+}
+
size_t smmu_hyp_pgt_pages(void)
{
/*
@@ -27,10 +97,31 @@ size_t smmu_hyp_pgt_pages(void)
static int kvm_arm_smmu_v3_register(void)
{
+ int ret;
+
if (!is_protected_kvm_enabled())
return 0;
- return kvm_iommu_register_driver(kern_hyp_va(lm_alias(&kvm_nvhe_sym(smmu_ops))));
+ ret = kvm_arm_smmu_array_alloc();
+ if (ret)
+ return ret;
+
+ ret = kvm_iommu_register_driver(kern_hyp_va(lm_alias(&kvm_nvhe_sym(smmu_ops))));
+ if (ret)
+ goto out_err;
+
+ /*
+ * These variables are stored in the nVHE image, and won't be accessible
+ * after KVM initialization. Ownership of kvm_arm_smmu_array will be
+ * transferred to the hypervisor as well.
+ */
+ kvm_hyp_arm_smmu_v3_smmus = kvm_arm_smmu_array;
+ kvm_hyp_arm_smmu_v3_count = kvm_arm_smmu_count;
+ return ret;
+
+out_err:
+ kvm_arm_smmu_array_free();
+ return ret;
};
core_initcall(kvm_arm_smmu_v3_register);
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
index f6ad91d3fb85..744ee2b7f0b4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
@@ -4,7 +4,20 @@
#include <asm/kvm_asm.h>
+/*
+ * Parameters from the trusted host:
+ * @mmio_addr base address of the SMMU registers
+ * @mmio_size size of the registers resource
+ * @features Features of SMMUv3, subset of the main driver
+ *
+ * Other members are filled and used at runtime by the SMMU driver.
+ * @base Virtual address of SMMU registers
+ */
struct hyp_arm_smmu_v3_device {
+ phys_addr_t mmio_addr;
+ size_t mmio_size;
+ void __iomem *base;
+ u32 features;
};
extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 17/28] iommu/arm-smmu-v3-kvm: Take over SMMUs
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (15 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 16/28] iommu/arm-smmu-v3-kvm: Create array for hyp SMMUv3 Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 18/28] iommu/arm-smmu-v3-kvm: Probe SMMU HW Mostafa Saleh
` (10 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Donate the array with the SMMU descriptions to the hypervisor, so it
can't be changed by the host after de-privilege.
Also, donate the SMMU MMIO resources to the hypervisor.
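A usage sketch of the ownership helpers added below; buf, len and ret are
hypothetical names at the call site:
    /* Sketch: donate a page-aligned host buffer to the hypervisor. */
    u64 phys = hyp_virt_to_phys(buf);   /* buf assumed page-aligned */
    size_t size = PAGE_ALIGN(len);

    ret = smmu_take_pages(phys, size);  /* host -> hyp ownership */
    if (ret)
            return ret;
    /* ... hypervisor uses the memory ... */
    smmu_reclaim_pages(phys, size);     /* hyp -> host on error/teardown */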
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 81 ++++++++++++++++++-
1 file changed, 80 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index fa8b71152560..b56feae81dda 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -7,15 +7,94 @@
#include <asm/kvm_hyp.h>
#include <nvhe/iommu.h>
+#include <nvhe/mem_protect.h>
#include "arm_smmu_v3.h"
size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
+#define for_each_smmu(smmu) \
+ for ((smmu) = kvm_hyp_arm_smmu_v3_smmus; \
+ (smmu) != &kvm_hyp_arm_smmu_v3_smmus[kvm_hyp_arm_smmu_v3_count]; \
+ (smmu)++)
+
+/* Transfer ownership of memory */
+static int smmu_take_pages(u64 phys, size_t size)
+{
+ WARN_ON(!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size));
+ return __pkvm_host_donate_hyp(phys >> PAGE_SHIFT, size >> PAGE_SHIFT);
+}
+
+static void smmu_reclaim_pages(u64 phys, size_t size)
+{
+ WARN_ON(!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size));
+ WARN_ON(__pkvm_hyp_donate_host(phys >> PAGE_SHIFT, size >> PAGE_SHIFT));
+}
+
+/* Put the device in a state that can be probed by the host driver. */
+static void smmu_deinit_device(struct hyp_arm_smmu_v3_device *smmu)
+{
+ int i;
+ size_t nr_pages = smmu->mmio_size >> PAGE_SHIFT;
+
+ for (i = 0 ; i < nr_pages ; ++i) {
+ u64 pfn = (smmu->mmio_addr >> PAGE_SHIFT) + i;
+
+ WARN_ON(__pkvm_hyp_donate_host_mmio(pfn));
+ }
+}
+
+static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
+{
+ int i;
+ size_t nr_pages;
+
+ if (!PAGE_ALIGNED(smmu->mmio_addr | smmu->mmio_size))
+ return -EINVAL;
+
+ nr_pages = smmu->mmio_size >> PAGE_SHIFT;
+ for (i = 0 ; i < nr_pages ; ++i) {
+ u64 pfn = (smmu->mmio_addr >> PAGE_SHIFT) + i;
+
+ /*
+ * This should never happen, so it's fine to be strict to avoid
+ * complicated error handling.
+ */
+ WARN_ON(__pkvm_host_donate_hyp_mmio(pfn));
+ }
+ smmu->base = hyp_phys_to_virt(smmu->mmio_addr);
+
+ return 0;
+}
+
static int smmu_init(void)
{
- return -ENOSYS;
+ int ret;
+ struct hyp_arm_smmu_v3_device *smmu;
+ size_t smmu_arr_size = PAGE_ALIGN(sizeof(*kvm_hyp_arm_smmu_v3_smmus) *
+ kvm_hyp_arm_smmu_v3_count);
+
+ kvm_hyp_arm_smmu_v3_smmus = kern_hyp_va(kvm_hyp_arm_smmu_v3_smmus);
+ ret = smmu_take_pages(hyp_virt_to_phys(kvm_hyp_arm_smmu_v3_smmus),
+ smmu_arr_size);
+ if (ret)
+ return ret;
+
+ for_each_smmu(smmu) {
+ ret = smmu_init_device(smmu);
+ if (ret)
+ goto out_reclaim_smmu;
+ }
+
+ return 0;
+
+out_reclaim_smmu:
+ while (smmu != kvm_hyp_arm_smmu_v3_smmus)
+ smmu_deinit_device(--smmu);
+ smmu_reclaim_pages(hyp_virt_to_phys(kvm_hyp_arm_smmu_v3_smmus),
+ smmu_arr_size);
+ return ret;
}
static void smmu_host_stage2_idmap(phys_addr_t start, phys_addr_t end, int prot)
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 18/28] iommu/arm-smmu-v3-kvm: Probe SMMU HW
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (16 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 17/28] iommu/arm-smmu-v3-kvm: Take over SMMUs Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 19/28] iommu/arm-smmu-v3-kvm: Add MMIO emulation Mostafa Saleh
` (9 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Probe the SMMU features from the IDR register space; most of
the logic is shared with the kernel driver.
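For reference, a sketch of what such an IDR decode looks like; the series
uses shared helpers such as smmu_idr5_to_oas(), so the open-coded switch
below is illustrative only:
    u32 reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
    unsigned long oas;

    switch (FIELD_GET(IDR5_OAS, reg)) {
    case IDR5_OAS_32_BIT: oas = 32; break;
    case IDR5_OAS_36_BIT: oas = 36; break;
    case IDR5_OAS_40_BIT: oas = 40; break;
    case IDR5_OAS_42_BIT: oas = 42; break;
    case IDR5_OAS_44_BIT: oas = 44; break;
    case IDR5_OAS_48_BIT: oas = 48; break;
    case IDR5_OAS_52_BIT: oas = 52; break;
    default:              oas = 0;  break; /* reserved: caller falls back to 48 */
    }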
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 57 ++++++++++++++++++-
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 8 +++
3 files changed, 64 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8ffcc2e32474..f0e1feee8a49 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -48,6 +48,7 @@ struct arm_smmu_device;
#define IDR0_S2P (1 << 0)
#define ARM_SMMU_IDR1 0x4
+#define IDR1_ECMDQ (1 << 31)
#define IDR1_TABLES_PRESET (1 << 30)
#define IDR1_QUEUES_PRESET (1 << 29)
#define IDR1_REL (1 << 28)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index b56feae81dda..e45b4e50b1e4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -10,6 +10,7 @@
#include <nvhe/mem_protect.h>
#include "arm_smmu_v3.h"
+#include "../arm-smmu-v3.h"
size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
@@ -45,9 +46,56 @@ static void smmu_deinit_device(struct hyp_arm_smmu_v3_device *smmu)
}
}
+/*
+ * Mini-probe and validation for the hypervisor.
+ */
+static int smmu_probe(struct hyp_arm_smmu_v3_device *smmu)
+{
+ u32 reg;
+
+ /* IDR0 */
+ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
+ smmu->features = smmu_idr0_features(reg);
+
+ /*
+ * Some MMU600 and MMU700 have errata that prevent them from using nesting.
+ * It's not clear how we can identify those, so it's recommended not to
+ * enable this driver on such systems; rejecting all of them would be too
+ * restrictive.
+ */
+ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
+ !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
+ return -ENXIO;
+
+ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR1);
+ if (reg & (IDR1_TABLES_PRESET | IDR1_QUEUES_PRESET | IDR1_REL | IDR1_ECMDQ))
+ return -EINVAL;
+
+ smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+ /* Follows the kernel logic */
+ if (smmu->sid_bits <= STRTAB_SPLIT)
+ smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
+
+ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+ smmu->features |= smmu_idr3_features(reg);
+
+ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
+ smmu->pgsize_bitmap = smmu_idr5_to_pgsize(reg);
+
+ smmu->oas = smmu_idr5_to_oas(reg);
+ if (smmu->oas == 52)
+ smmu->pgsize_bitmap |= 1ULL << 42;
+ else if (!smmu->oas)
+ smmu->oas = 48;
+
+ smmu->ias = 64;
+ smmu->ias = min(smmu->ias, smmu->oas);
+ return 0;
+}
+
static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
{
- int i;
+ int i, ret;
size_t nr_pages;
if (!PAGE_ALIGNED(smmu->mmio_addr | smmu->mmio_size))
@@ -64,8 +112,13 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
WARN_ON(__pkvm_host_donate_hyp_mmio(pfn));
}
smmu->base = hyp_phys_to_virt(smmu->mmio_addr);
-
+ ret = smmu_probe(smmu);
+ if (ret)
+ goto out_ret;
return 0;
+out_ret:
+ smmu_deinit_device(smmu);
+ return ret;
}
static int smmu_init(void)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
index 744ee2b7f0b4..3550fa695539 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
@@ -12,12 +12,20 @@
*
* Other members are filled and used at runtime by the SMMU driver.
* @base Virtual address of SMMU registers
+ * @ias IPA size
+ * @oas PA size
+ * @pgsize_bitmap Supported page sizes
+ * @sid_bits Max number of SID bits supported
*/
struct hyp_arm_smmu_v3_device {
phys_addr_t mmio_addr;
size_t mmio_size;
void __iomem *base;
u32 features;
+ unsigned long ias;
+ unsigned long oas;
+ unsigned long pgsize_bitmap;
+ unsigned int sid_bits;
};
extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 19/28] iommu/arm-smmu-v3-kvm: Add MMIO emulation
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (17 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 18/28] iommu/arm-smmu-v3-kvm: Probe SMMU HW Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 20/28] iommu/arm-smmu-v3-kvm: Shadow the command queue Mostafa Saleh
` (8 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
At the moment most register accesses are just passed through; the next
patches add CMDQ/STE emulation, which inserts logic into some of the
register accesses.
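A condensed sketch of the idea behind the emulation; is_write, rd, off and
regs are the names used by the handler in the patch below:
    /* Per-register access mask (sketch, not the full handler). */
    const u64 read_write = -1ULL;               /* passthrough both ways */
    const u64 no_access  = 0;                   /* WARN and drop */
    const u64 read_only  = is_write ? no_access : read_write;
    u64 mask;

    /* e.g. IDR0 is readable, but stage-2/MSI support is hidden: */
    mask = read_only & ~(IDR0_S2P | IDR0_MSI);
    if (!is_write)
            regs->regs[rd] = readl_relaxed(smmu->base + off) & mask;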
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 125 ++++++++++++++++++
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 10 ++
2 files changed, 135 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index e45b4e50b1e4..32f199aeec9e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -8,6 +8,7 @@
#include <nvhe/iommu.h>
#include <nvhe/mem_protect.h>
+#include <nvhe/trap_handler.h>
#include "arm_smmu_v3.h"
#include "../arm-smmu-v3.h"
@@ -140,6 +141,8 @@ static int smmu_init(void)
goto out_reclaim_smmu;
}
+ BUILD_BUG_ON(sizeof(hyp_spinlock_t) != sizeof(u32));
+
return 0;
out_reclaim_smmu:
@@ -150,6 +153,127 @@ static int smmu_init(void)
return ret;
}
+static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
+ struct user_pt_regs *regs,
+ u64 esr, u32 off)
+{
+ bool is_write = esr & ESR_ELx_WNR;
+ unsigned int len = BIT((esr & ESR_ELx_SAS) >> ESR_ELx_SAS_SHIFT);
+ int rd = (esr & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
+ const u64 read_write = -1ULL;
+ const u64 no_access = 0;
+ u64 mask = no_access;
+ const u64 read_only = is_write ? no_access : read_write;
+ u64 val = regs->regs[rd];
+
+ switch (off) {
+ case ARM_SMMU_IDR0:
+ /* Clear stage-2 support, hide MSI to avoid write back to cmdq */
+ mask = read_only & ~(IDR0_S2P | IDR0_VMID16 | IDR0_MSI | IDR0_HYP);
+ WARN_ON(len != sizeof(u32));
+ break;
+ /* Passthrough the register access for bisectability, handled later */
+ case ARM_SMMU_CMDQ_BASE:
+ case ARM_SMMU_CMDQ_PROD:
+ case ARM_SMMU_CMDQ_CONS:
+ case ARM_SMMU_STRTAB_BASE:
+ case ARM_SMMU_STRTAB_BASE_CFG:
+ case ARM_SMMU_GBPA:
+ mask = read_write;
+ break;
+ case ARM_SMMU_CR0:
+ mask = read_write;
+ WARN_ON(len != sizeof(u32));
+ break;
+ case ARM_SMMU_CR1: {
+ /* Based on the Linux implementation */
+ u64 cr1_template = FIELD_PREP(CR1_TABLE_SH, ARM_SMMU_SH_ISH) |
+ FIELD_PREP(CR1_TABLE_OC, CR1_CACHE_WB) |
+ FIELD_PREP(CR1_TABLE_IC, CR1_CACHE_WB) |
+ FIELD_PREP(CR1_QUEUE_SH, ARM_SMMU_SH_ISH) |
+ FIELD_PREP(CR1_QUEUE_OC, CR1_CACHE_WB) |
+ FIELD_PREP(CR1_QUEUE_IC, CR1_CACHE_WB);
+ /* Don't mess with shareability/cacheability. */
+ if (is_write)
+ WARN_ON(val != cr1_template);
+ mask = read_write;
+ WARN_ON(len != sizeof(u32));
+ break;
+ }
+ /*
+ * These should be safe; just enforce RO or RW and size according to the
+ * architecture. Some other registers that are not used by Linux, such as
+ * IDR2 and IDR4, won't be allowed.
+ */
+ case ARM_SMMU_EVTQ_PROD + SZ_64K:
+ case ARM_SMMU_EVTQ_CONS + SZ_64K:
+ case ARM_SMMU_EVTQ_IRQ_CFG1:
+ case ARM_SMMU_EVTQ_IRQ_CFG2:
+ case ARM_SMMU_PRIQ_PROD + SZ_64K:
+ case ARM_SMMU_PRIQ_CONS + SZ_64K:
+ case ARM_SMMU_PRIQ_IRQ_CFG1:
+ case ARM_SMMU_PRIQ_IRQ_CFG2:
+ case ARM_SMMU_GERRORN:
+ case ARM_SMMU_GERROR_IRQ_CFG1:
+ case ARM_SMMU_GERROR_IRQ_CFG2:
+ case ARM_SMMU_IRQ_CTRLACK:
+ case ARM_SMMU_IRQ_CTRL:
+ case ARM_SMMU_CR0ACK:
+ case ARM_SMMU_CR2:
+ /* These are 32 bit registers. */
+ WARN_ON(len != sizeof(u32));
+ fallthrough;
+ case ARM_SMMU_EVTQ_BASE:
+ case ARM_SMMU_EVTQ_IRQ_CFG0:
+ case ARM_SMMU_PRIQ_BASE:
+ case ARM_SMMU_PRIQ_IRQ_CFG0:
+ case ARM_SMMU_GERROR_IRQ_CFG0:
+ mask = read_write;
+ break;
+ case ARM_SMMU_IIDR:
+ case ARM_SMMU_IDR5:
+ case ARM_SMMU_IDR3:
+ case ARM_SMMU_IDR1:
+ case ARM_SMMU_GERROR:
+ WARN_ON(len != sizeof(u32));
+ mask = read_only;
+ };
+
+ if (WARN_ON(!mask))
+ goto out_ret;
+
+ if (is_write) {
+ if (len == sizeof(u64))
+ writeq_relaxed(regs->regs[rd] & mask, smmu->base + off);
+ else
+ writel_relaxed(regs->regs[rd] & mask, smmu->base + off);
+ } else {
+ if (len == sizeof(u64))
+ regs->regs[rd] = readq_relaxed(smmu->base + off) & mask;
+ else
+ regs->regs[rd] = readl_relaxed(smmu->base + off) & mask;
+ }
+
+out_ret:
+ return true;
+}
+
+static bool smmu_dabt_handler(struct user_pt_regs *regs, u64 esr, u64 addr)
+{
+ struct hyp_arm_smmu_v3_device *smmu;
+ bool ret;
+
+ for_each_smmu(smmu) {
+ if (addr < smmu->mmio_addr || addr >= smmu->mmio_addr + smmu->mmio_size)
+ continue;
+ hyp_spin_lock(&smmu->lock);
+ ret = smmu_dabt_device(smmu, regs, esr, addr - smmu->mmio_addr);
+ hyp_spin_unlock(&smmu->lock);
+ return ret;
+ }
+ return false;
+}
+
static void smmu_host_stage2_idmap(phys_addr_t start, phys_addr_t end, int prot)
{
}
@@ -158,4 +282,5 @@ static void smmu_host_stage2_idmap(phys_addr_t start, phys_addr_t end, int prot)
struct kvm_iommu_ops smmu_ops = {
.init = smmu_init,
.host_stage2_idmap = smmu_host_stage2_idmap,
+ .dabt_handler = smmu_dabt_handler,
};
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
index 3550fa695539..dfeaed728982 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
@@ -4,6 +4,10 @@
#include <asm/kvm_asm.h>
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/spinlock.h>
+#endif
+
/*
* Parameters from the trusted host:
* @mmio_addr base address of the SMMU registers
@@ -16,6 +20,7 @@
* @oas PA size
* @pgsize_bitmap Supported page sizes
* @sid_bits Max number of SID bits supported
+ * @lock Lock to protect SMMU
*/
struct hyp_arm_smmu_v3_device {
phys_addr_t mmio_addr;
@@ -26,6 +31,11 @@ struct hyp_arm_smmu_v3_device {
unsigned long oas;
unsigned long pgsize_bitmap;
unsigned int sid_bits;
+#ifdef __KVM_NVHE_HYPERVISOR__
+ hyp_spinlock_t lock;
+#else
+ u32 lock;
+#endif
};
extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 20/28] iommu/arm-smmu-v3-kvm: Shadow the command queue
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (18 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 19/28] iommu/arm-smmu-v3-kvm: Add MMIO emulation Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 21/28] iommu/arm-smmu-v3-kvm: Add CMDQ functions Mostafa Saleh
` (7 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
At boot, allocate a command queue per SMMU which is used as a shadow
by the hypervisor.
The command queue size is 64K, which is more than enough: the
hypervisor consumes all the entries on each command queue prod
write, which means it can handle up to 4096 commands at a time.
Then, the host command queue needs to be pinned in a shared state, so
it can't be donated to VMs, which avoids tricking the hypervisor into
accessing protected memory. This is done each time the command queue is
enabled, and undone each time the command queue is disabled.
The hypervisor won't access the host command queue while it is disabled
by the host.
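The sizing arithmetic, for reference (4K pages assumed):
    /*
     * SMMU_KVM_CMDQ_ORDER = 4            -> 2^4 pages = 64K
     * max_n_shift = 4 + PAGE_SHIFT - CMDQ_ENT_SZ_SHIFT
     *             = 4 + 12 - 4 = 12      -> 2^12 = 4096 entries
     * (each entry is CMDQ_ENT_DWORDS * 8 = 16 bytes)
     */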
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c | 20 ++++
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 108 +++++++++++++++++-
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 8 ++
3 files changed, 135 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 27ea39c0fb1f..86e6c68aad4e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -13,6 +13,8 @@
#include "arm-smmu-v3.h"
#include "pkvm/arm_smmu_v3.h"
+#define SMMU_KVM_CMDQ_ORDER 4
+
extern struct kvm_iommu_ops kvm_nvhe_sym(smmu_ops);
static size_t kvm_arm_smmu_count;
@@ -58,6 +60,7 @@ static int kvm_arm_smmu_array_alloc(void)
/* Basic device tree parsing. */
for_each_compatible_node(np, NULL, "arm,smmu-v3") {
struct resource res;
+ void *cmdq_base;
ret = of_address_to_resource(np, 0, &res);
if (ret)
@@ -74,6 +77,23 @@ static int kvm_arm_smmu_array_alloc(void)
if (of_dma_is_coherent(np))
kvm_arm_smmu_array[i].features |= ARM_SMMU_FEAT_COHERENCY;
+ /*
+ * Allocate a shadow for the command queue; it doesn't have to be the same
+ * size as the host's.
+ * Only populate base_dma and llq.max_n_shift, the hypervisor will init
+ * the rest.
+ * We don't know what size the host will choose at this point; the shadow
+ * copy will be 64K, which is a reasonable size.
+ */
+ cmdq_base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, SMMU_KVM_CMDQ_ORDER);
+ if (!cmdq_base) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ kvm_arm_smmu_array[i].cmdq.base_dma = virt_to_phys(cmdq_base);
+ kvm_arm_smmu_array[i].cmdq.llq.max_n_shift = SMMU_KVM_CMDQ_ORDER + PAGE_SHIFT -
+ CMDQ_ENT_SZ_SHIFT;
i++;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index 32f199aeec9e..d3ab4b814be4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -11,7 +11,6 @@
#include <nvhe/trap_handler.h>
#include "arm_smmu_v3.h"
-#include "../arm-smmu-v3.h"
size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
@@ -34,6 +33,35 @@ static void smmu_reclaim_pages(u64 phys, size_t size)
WARN_ON(__pkvm_hyp_donate_host(phys >> PAGE_SHIFT, size >> PAGE_SHIFT));
}
+/*
+ * The host copies of the CMDQ and STEs are accessed by the hypervisor; we
+ * share them to:
+ * - Prevent the host from passing protected VM memory.
+ * - Have them mapped in the hyp page table.
+ */
+static int smmu_share_pages(phys_addr_t addr, size_t size)
+{
+ int i;
+ size_t nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ for (i = 0 ; i < nr_pages ; ++i)
+ WARN_ON(__pkvm_host_share_hyp((addr + i * PAGE_SIZE) >> PAGE_SHIFT));
+
+ return hyp_pin_shared_mem(hyp_phys_to_virt(addr), hyp_phys_to_virt(addr + size));
+}
+
+static int smmu_unshare_pages(phys_addr_t addr, size_t size)
+{
+ int i;
+ size_t nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ hyp_unpin_shared_mem(hyp_phys_to_virt(addr), hyp_phys_to_virt(addr + size));
+
+ for (i = 0 ; i < nr_pages ; ++i)
+ WARN_ON(__pkvm_host_unshare_hyp((addr + i * PAGE_SIZE) >> PAGE_SHIFT));
+
+ return 0;
+}
+
/* Put the device in a state that can be probed by the host driver. */
static void smmu_deinit_device(struct hyp_arm_smmu_v3_device *smmu)
{
@@ -94,6 +122,41 @@ static int smmu_probe(struct hyp_arm_smmu_v3_device *smmu)
return 0;
}
+/*
+ * The kernel part of the driver will allocate the shadow cmdq,
+ * which is different from the one used by the host driver.
+ * This function only donates it.
+ */
+static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
+{
+ size_t cmdq_size;
+ int ret;
+ enum kvm_pgtable_prot prot = PAGE_HYP;
+
+ cmdq_size = (1 << (smmu->cmdq.llq.max_n_shift)) *
+ CMDQ_ENT_DWORDS * 8;
+
+ if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
+ prot |= KVM_PGTABLE_PROT_NORMAL_NC;
+
+ ret = ___pkvm_host_donate_hyp(smmu->cmdq.base_dma >> PAGE_SHIFT,
+ PAGE_ALIGN(cmdq_size) >> PAGE_SHIFT, prot);
+ if (ret)
+ return ret;
+
+ smmu->cmdq.base = hyp_phys_to_virt(smmu->cmdq.base_dma);
+ smmu->cmdq.prod_reg = smmu->base + ARM_SMMU_CMDQ_PROD;
+ smmu->cmdq.cons_reg = smmu->base + ARM_SMMU_CMDQ_CONS;
+ smmu->cmdq.q_base = smmu->cmdq.base_dma |
+ FIELD_PREP(Q_BASE_LOG2SIZE, smmu->cmdq.llq.max_n_shift);
+ smmu->cmdq.ent_dwords = CMDQ_ENT_DWORDS;
+ memset(smmu->cmdq.base, 0, cmdq_size);
+ writel_relaxed(0, smmu->cmdq.prod_reg);
+ writel_relaxed(0, smmu->cmdq.cons_reg);
+ writeq_relaxed(smmu->cmdq.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
+ return 0;
+}
+
static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
{
int i, ret;
@@ -116,7 +179,13 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
ret = smmu_probe(smmu);
if (ret)
goto out_ret;
+
+ ret = smmu_init_cmdq(smmu);
+ if (ret)
+ goto out_ret;
+
return 0;
+
out_ret:
smmu_deinit_device(smmu);
return ret;
@@ -153,6 +222,27 @@ static int smmu_init(void)
return ret;
}
+static bool is_cmdq_enabled(struct hyp_arm_smmu_v3_device *smmu)
+{
+ return FIELD_GET(CR0_CMDQEN, smmu->cr0);
+}
+
+static void smmu_emulate_cmdq_enable(struct hyp_arm_smmu_v3_device *smmu)
+{
+ size_t cmdq_size;
+
+ smmu->cmdq_host.llq.max_n_shift = FIELD_GET(Q_BASE_LOG2SIZE, smmu->cmdq_host.q_base);
+ cmdq_size = (1 << smmu->cmdq_host.llq.max_n_shift) * CMDQ_ENT_DWORDS * 8;
+ WARN_ON(smmu_share_pages(smmu->cmdq_host.q_base & Q_BASE_ADDR_MASK, cmdq_size));
+}
+
+static void smmu_emulate_cmdq_disable(struct hyp_arm_smmu_v3_device *smmu)
+{
+ size_t cmdq_size = (1 << smmu->cmdq_host.llq.max_n_shift) * CMDQ_ENT_DWORDS * 8;
+
+ WARN_ON(smmu_unshare_pages(smmu->cmdq_host.q_base & Q_BASE_ADDR_MASK, cmdq_size));
+}
+
static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
struct user_pt_regs *regs,
u64 esr, u32 off)
@@ -174,6 +264,13 @@ static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
break;
/* Passthrough the register access for bisectability, handled later */
case ARM_SMMU_CMDQ_BASE:
+
+ /* Not allowed by the architecture */
+ WARN_ON(is_cmdq_enabled(smmu));
+ if (is_write)
+ smmu->cmdq_host.q_base = val;
+ mask = read_write;
+ break;
case ARM_SMMU_CMDQ_PROD:
case ARM_SMMU_CMDQ_CONS:
case ARM_SMMU_STRTAB_BASE:
@@ -182,6 +279,15 @@ static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
mask = read_write;
break;
case ARM_SMMU_CR0:
+ if (is_write) {
+ bool last_cmdq_en = is_cmdq_enabled(smmu);
+
+ smmu->cr0 = val;
+ if (!last_cmdq_en && is_cmdq_enabled(smmu))
+ smmu_emulate_cmdq_enable(smmu);
+ else if (last_cmdq_en && !is_cmdq_enabled(smmu))
+ smmu_emulate_cmdq_disable(smmu);
+ }
mask = read_write;
WARN_ON(len != sizeof(u32));
break;
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
index dfeaed728982..330da53f80d0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
@@ -8,6 +8,8 @@
#include <nvhe/spinlock.h>
#endif
+#include "../arm-smmu-v3.h"
+
/*
* Parameters from the trusted host:
* @mmio_addr base address of the SMMU registers
@@ -21,6 +23,9 @@
* @pgsize_bitmap Supported page sizes
* @sid_bits Max number of SID bits supported
* @lock Lock to protect SMMU
+ * @cmdq CMDQ as observed by HW
+ * @cmdq_host Host view of the command queue
+ * @cr0 Last value of CR0
*/
struct hyp_arm_smmu_v3_device {
phys_addr_t mmio_addr;
@@ -36,6 +41,9 @@ struct hyp_arm_smmu_v3_device {
#else
u32 lock;
#endif
+ struct arm_smmu_queue cmdq;
+ struct arm_smmu_queue cmdq_host;
+ u32 cr0;
};
extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 21/28] iommu/arm-smmu-v3-kvm: Add CMDQ functions
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (19 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 20/28] iommu/arm-smmu-v3-kvm: Shadow the command queue Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 22/28] iommu/arm-smmu-v3-kvm: Emulate CMDQ for host Mostafa Saleh
` (6 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Add functions to access the command queue. There are 2 main usages:
- The hypervisor's own commands, such as TLB invalidation, use functions
like smmu_send_cmd(), which builds and sends a command.
- Host commands are added to the shadow command queue after being
filtered; these are inserted with smmu_add_cmd_raw().
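A usage sketch of the first flavour (hypothetical call site; asid is
assumed in scope). smmu_send_cmd() queues the command, appends a CMD_SYNC
and waits for the queue to drain:
    struct arm_smmu_cmdq_ent cmd = {
            .opcode = CMDQ_OP_TLBI_NH_ASID,
            .tlbi = {
                    .asid = asid,
            },
    };

    WARN_ON(smmu_send_cmd(smmu, &cmd));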
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 98 +++++++++++++++++++
1 file changed, 98 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index d3ab4b814be4..554229e466f3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -20,6 +20,33 @@ struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
(smmu) != &kvm_hyp_arm_smmu_v3_smmus[kvm_hyp_arm_smmu_v3_count]; \
(smmu)++)
+/*
+ * Wait until @cond is true.
+ * Return 0 on success, or -ETIMEDOUT
+ */
+#define smmu_wait(_cond) \
+({ \
+ int __ret = 0; \
+ u64 delay = pkvm_time_get() + ARM_SMMU_POLL_TIMEOUT_US; \
+ \
+ while (!(_cond)) { \
+ if (pkvm_time_get() >= delay) { \
+ __ret = -ETIMEDOUT; \
+ break; \
+ } \
+ } \
+ __ret; \
+})
+
+#define smmu_wait_event(_smmu, _cond) \
+({ \
+ if ((_smmu)->features & ARM_SMMU_FEAT_SEV) { \
+ while (!(_cond)) \
+ wfe(); \
+ } \
+ smmu_wait(_cond); \
+})
+
/* Transfer ownership of memory */
static int smmu_take_pages(u64 phys, size_t size)
{
@@ -62,6 +89,77 @@ static int smmu_unshare_pages(phys_addr_t addr, size_t size)
return 0;
}
+static bool smmu_cmdq_full(struct arm_smmu_queue *cmdq)
+{
+ struct arm_smmu_ll_queue *llq = &cmdq->llq;
+
+ WRITE_ONCE(llq->cons, readl_relaxed(cmdq->cons_reg));
+ return queue_full(llq);
+}
+
+static bool smmu_cmdq_empty(struct arm_smmu_queue *cmdq)
+{
+ struct arm_smmu_ll_queue *llq = &cmdq->llq;
+
+ WRITE_ONCE(llq->cons, readl_relaxed(cmdq->cons_reg));
+ return queue_empty(llq);
+}
+
+static void smmu_add_cmd_raw(struct hyp_arm_smmu_v3_device *smmu,
+ u64 *cmd)
+{
+ struct arm_smmu_queue *q = &smmu->cmdq;
+ struct arm_smmu_ll_queue *llq = &q->llq;
+
+ queue_write(Q_ENT(q, llq->prod), cmd, CMDQ_ENT_DWORDS);
+ llq->prod = queue_inc_prod_n(llq, 1);
+ writel_relaxed(llq->prod, q->prod_reg);
+}
+
+static int smmu_add_cmd(struct hyp_arm_smmu_v3_device *smmu,
+ struct arm_smmu_cmdq_ent *ent)
+{
+ int ret;
+ u64 cmd[CMDQ_ENT_DWORDS];
+
+ ret = smmu_wait_event(smmu, !smmu_cmdq_full(&smmu->cmdq));
+ if (ret)
+ return ret;
+
+ ret = arm_smmu_cmdq_build_cmd(cmd, ent);
+ if (ret)
+ return ret;
+
+ smmu_add_cmd_raw(smmu, cmd);
+ return 0;
+}
+
+static int smmu_sync_cmd(struct hyp_arm_smmu_v3_device *smmu)
+{
+ int ret;
+ struct arm_smmu_cmdq_ent cmd = {
+ .opcode = CMDQ_OP_CMD_SYNC,
+ };
+
+ ret = smmu_add_cmd(smmu, &cmd);
+ if (ret)
+ return ret;
+
+ return smmu_wait_event(smmu, smmu_cmdq_empty(&smmu->cmdq));
+}
+
+__maybe_unused
+static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
+ struct arm_smmu_cmdq_ent *cmd)
+{
+ int ret = smmu_add_cmd(smmu, cmd);
+
+ if (ret)
+ return ret;
+
+ return smmu_sync_cmd(smmu);
+}
+
/* Put the device in a state that can be probed by the host driver. */
static void smmu_deinit_device(struct hyp_arm_smmu_v3_device *smmu)
{
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 22/28] iommu/arm-smmu-v3-kvm: Emulate CMDQ for host
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (20 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 21/28] iommu/arm-smmu-v3-kvm: Add CMDQ functions Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 23/28] iommu/arm-smmu-v3-kvm: Shadow stream table Mostafa Saleh
` (5 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Don't allow direct access to the command queue from the host:
- ARM_SMMU_CMDQ_BASE: Only allowed to be written when the CMDQ is disabled;
we use it to keep track of the host command queue base.
Reads return the saved value.
- ARM_SMMU_CMDQ_PROD: Writes trigger command queue emulation, which sanitises
and filters the whole range. Reads return the host copy.
- ARM_SMMU_CMDQ_CONS: Writes move the SW copy of the cons, but the host can't
skip commands once submitted. Reads return the emulated value plus the error
bits from the actual cons (see the sketch below).
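A condensed sketch of that CONS read emulation:
    /* Sketch: host reads CMDQ_CONS. */
    u32 hw_cons = readl_relaxed(smmu->base + ARM_SMMU_CMDQ_CONS);

    /* Emulated index, plus any error code latched by the real HW. */
    regs->regs[rd] = smmu->cmdq_host.llq.cons | (hw_cons & CMDQ_CONS_ERR);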
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 108 +++++++++++++++++-
1 file changed, 105 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index 554229e466f3..10c6461bbf12 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -325,6 +325,88 @@ static bool is_cmdq_enabled(struct hyp_arm_smmu_v3_device *smmu)
return FIELD_GET(CR0_CMDQEN, smmu->cr0);
}
+static bool smmu_filter_command(struct hyp_arm_smmu_v3_device *smmu, u64 *command)
+{
+ u64 type = FIELD_GET(CMDQ_0_OP, command[0]);
+
+ switch (type) {
+ case CMDQ_OP_CFGI_STE:
+ /* TBD: SHADOW_STE*/
+ break;
+ case CMDQ_OP_CFGI_ALL:
+ {
+ /*
+ * Linux doesn't use ranged STE invalidation, and only uses this
+ * for CFGI_ALL, which is done on reset and not when a new STE
+ * is used.
+ * Although this is not architectural, we rely on the current Linux
+ * implementation.
+ */
+ WARN_ON((FIELD_GET(CMDQ_CFGI_1_RANGE, command[1]) != 31));
+ break;
+ }
+ case CMDQ_OP_TLBI_NH_ASID:
+ case CMDQ_OP_TLBI_NH_VA:
+ case 0x13: /* CMD_TLBI_NH_VAA: Not used by Linux */
+ {
+ /* Only allow VMID = 0 (the host). */
+ if (FIELD_GET(CMDQ_TLBI_0_VMID, command[0]) == 0)
+ break;
+ /* Malicious host using a guest VMID. */
+ return WARN_ON(true);
+ }
+ case 0x10: /* CMD_TLBI_NH_ALL: Not used by Linux */
+ case CMDQ_OP_TLBI_EL2_ALL:
+ case CMDQ_OP_TLBI_EL2_VA:
+ case CMDQ_OP_TLBI_EL2_ASID:
+ case CMDQ_OP_TLBI_S12_VMALL:
+ case 0x23: /* CMD_TLBI_EL2_VAA: Not used by Linux */
+ /* Malicious host */
+ return WARN_ON(true);
+ case CMDQ_OP_CMD_SYNC:
+ if (FIELD_GET(CMDQ_SYNC_0_CS, command[0]) == CMDQ_SYNC_0_CS_IRQ) {
+ /* Allow it, but let the host time out, as this should never happen. */
+ command[0] &= ~CMDQ_SYNC_0_CS;
+ command[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
+ command[1] &= ~CMDQ_SYNC_1_MSIADDR_MASK;
+ }
+ break;
+ }
+
+ return false;
+}
+
+static void smmu_emulate_cmdq_insert(struct hyp_arm_smmu_v3_device *smmu)
+{
+ u64 *host_cmdq = hyp_phys_to_virt(smmu->cmdq_host.q_base & Q_BASE_ADDR_MASK);
+ int idx;
+ u64 cmd[CMDQ_ENT_DWORDS];
+ bool skip;
+
+ if (!is_cmdq_enabled(smmu))
+ return;
+
+ while (!queue_empty(&smmu->cmdq_host.llq)) {
+ /* Wait for the command queue to have some space. */
+ WARN_ON(smmu_wait_event(smmu, !smmu_cmdq_full(&smmu->cmdq)));
+
+ idx = Q_IDX(&smmu->cmdq_host.llq, smmu->cmdq_host.llq.cons);
+ /* Avoid TOCTOU */
+ memcpy(cmd, &host_cmdq[idx * CMDQ_ENT_DWORDS], CMDQ_ENT_DWORDS << 3);
+ skip = smmu_filter_command(smmu, cmd);
+ if (!skip)
+ smmu_add_cmd_raw(smmu, cmd);
+ queue_inc_cons(&smmu->cmdq_host.llq);
+ }
+
+ /*
+ * Wait until consumed. This can be improved a bit by returning to the host
+ * while tracking the current offset in the command queue shared with the
+ * host; that would be updated when the hyp enters a command or when the
+ * host issues another read of cons.
+ */
+ WARN_ON(smmu_wait_event(smmu, smmu_cmdq_empty(&smmu->cmdq)));
+}
+
static void smmu_emulate_cmdq_enable(struct hyp_arm_smmu_v3_device *smmu)
{
size_t cmdq_size;
@@ -360,17 +442,37 @@ static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
mask = read_only & ~(IDR0_S2P | IDR0_VMID16 | IDR0_MSI | IDR0_HYP);
WARN_ON(len != sizeof(u32));
break;
- /* Passthrough the register access for bisectability, handled later */
case ARM_SMMU_CMDQ_BASE:
/* Not allowed by the architecture */
WARN_ON(is_cmdq_enabled(smmu));
if (is_write)
smmu->cmdq_host.q_base = val;
- mask = read_write;
- break;
+ else
+ regs->regs[rd] = smmu->cmdq_host.q_base;
+ goto out_ret;
case ARM_SMMU_CMDQ_PROD:
+ if (is_write) {
+ smmu->cmdq_host.llq.prod = val;
+ smmu_emulate_cmdq_insert(smmu);
+ } else {
+ regs->regs[rd] = smmu->cmdq_host.llq.prod;
+ }
+ goto out_ret;
case ARM_SMMU_CMDQ_CONS:
+ if (is_write) {
+ /* Not allowed by the architecture */
+ WARN_ON(is_cmdq_enabled(smmu));
+ smmu->cmdq_host.llq.cons = val;
+ } else {
+ /* Propagate errors back to the host.*/
+ u32 cons = readl_relaxed(smmu->base + ARM_SMMU_CMDQ_CONS);
+ u32 err = CMDQ_CONS_ERR & cons;
+
+ regs->regs[rd] = smmu->cmdq_host.llq.cons | err;
+ }
+ goto out_ret;
+ /* Passthrough the register access for bisectability, handled later */
case ARM_SMMU_STRTAB_BASE:
case ARM_SMMU_STRTAB_BASE_CFG:
case ARM_SMMU_GBPA:
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 23/28] iommu/arm-smmu-v3-kvm: Shadow stream table
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (21 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 22/28] iommu/arm-smmu-v3-kvm: Emulate CMDQ for host Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 24/28] iommu/arm-smmu-v3-kvm: Shadow STEs Mostafa Saleh
` (4 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
This patch allocates the shadow stream table per SMMU.
We choose the size of that table to be 1MB, which is the
max size used by the host in the 2-level case.
In this patch all the host writes are still passed through for
bisectability; that is changed next, where CFGI commands will be
trapped and used to update the hypervisor's shadow copy that
will be used by HW.
Similar to the command queue, the host stream table is
shared/unshared each time the SMMU is enabled/disabled.
Handling of L2 tables is also done in the next patch when
the shadowing is added.
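The 1MB figure, for reference (4K pages assumed):
    /*
     * STRTAB_MAX_L1_ENTRIES (1 << 17) L1 descriptors * 8 bytes each = 1MB,
     * so SMMU_KVM_STRTAB_ORDER = get_order(SZ_1M) = 8 (256 pages of 4K).
     */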
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c | 13 +-
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 111 ++++++++++++++++++
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 10 ++
3 files changed, 133 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
index 86e6c68aad4e..821190abac5a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
@@ -14,6 +14,8 @@
#include "pkvm/arm_smmu_v3.h"
#define SMMU_KVM_CMDQ_ORDER 4
+#define SMMU_KVM_STRTAB_ORDER (get_order(STRTAB_MAX_L1_ENTRIES * \
+ sizeof(struct arm_smmu_strtab_l1)))
extern struct kvm_iommu_ops kvm_nvhe_sym(smmu_ops);
@@ -60,7 +62,7 @@ static int kvm_arm_smmu_array_alloc(void)
/* Basic device tree parsing. */
for_each_compatible_node(np, NULL, "arm,smmu-v3") {
struct resource res;
- void *cmdq_base;
+ void *cmdq_base, *strtab;
ret = of_address_to_resource(np, 0, &res);
if (ret)
@@ -94,6 +96,15 @@ static int kvm_arm_smmu_array_alloc(void)
kvm_arm_smmu_array[i].cmdq.base_dma = virt_to_phys(cmdq_base);
kvm_arm_smmu_array[i].cmdq.llq.max_n_shift = SMMU_KVM_CMDQ_ORDER + PAGE_SHIFT -
CMDQ_ENT_SZ_SHIFT;
+
+ strtab = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, SMMU_KVM_STRTAB_ORDER);
+ if (!strtab) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+ kvm_arm_smmu_array[i].strtab_dma = virt_to_phys(strtab);
+ kvm_arm_smmu_array[i].strtab_size = PAGE_SIZE << SMMU_KVM_STRTAB_ORDER;
+
i++;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index 10c6461bbf12..d722f8ce0635 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -15,6 +15,14 @@
size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
+/* strtab accessors */
+#define strtab_log2size(smmu) (FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, (smmu)->host_ste_cfg))
+#define strtab_size(smmu) ((1 << strtab_log2size(smmu)) * STRTAB_STE_DWORDS * 8)
+#define strtab_host_base(smmu) ((smmu)->host_ste_base & STRTAB_BASE_ADDR_MASK)
+#define strtab_split(smmu) (FIELD_GET(STRTAB_BASE_CFG_SPLIT, (smmu)->host_ste_cfg))
+#define strtab_l1_size(smmu) ((1 << (strtab_log2size(smmu) - strtab_split(smmu))) * \
+ (sizeof(struct arm_smmu_strtab_l1)))
+
#define for_each_smmu(smmu) \
for ((smmu) = kvm_hyp_arm_smmu_v3_smmus; \
(smmu) != &kvm_hyp_arm_smmu_v3_smmus[kvm_hyp_arm_smmu_v3_count]; \
@@ -255,6 +263,48 @@ static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
return 0;
}
+static int smmu_init_strtab(struct hyp_arm_smmu_v3_device *smmu)
+{
+ int ret;
+ u32 reg;
+ enum kvm_pgtable_prot prot = PAGE_HYP;
+ struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+
+ if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
+ prot |= KVM_PGTABLE_PROT_NORMAL_NC;
+
+ ret = ___pkvm_host_donate_hyp(hyp_phys_to_pfn(smmu->strtab_dma),
+ smmu->strtab_size >> PAGE_SHIFT, prot);
+ if (ret)
+ return ret;
+ if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+ unsigned int last_sid_idx =
+ arm_smmu_strtab_l1_idx((1ULL << smmu->sid_bits) - 1);
+
+ cfg->l2.l1tab = hyp_phys_to_virt(smmu->strtab_dma);
+ cfg->l2.l1_dma = smmu->strtab_dma;
+ cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
+
+ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT,
+ STRTAB_BASE_CFG_FMT_2LVL) |
+ FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE,
+ ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT) |
+ FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
+ } else {
+ cfg->linear.table = hyp_phys_to_virt(smmu->strtab_dma);
+ cfg->linear.ste_dma = smmu->strtab_dma;
+ cfg->linear.num_ents = 1UL << smmu->sid_bits;
+ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT,
+ STRTAB_BASE_CFG_FMT_LINEAR) |
+ FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
+ }
+
+ writeq_relaxed((smmu->strtab_dma & STRTAB_BASE_ADDR_MASK) | STRTAB_BASE_RA,
+ smmu->base + ARM_SMMU_STRTAB_BASE);
+ writel_relaxed(reg, smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+ return 0;
+}
+
static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
{
int i, ret;
@@ -282,6 +332,10 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
if (ret)
goto out_ret;
+ ret = smmu_init_strtab(smmu);
+ if (ret)
+ goto out_ret;
+
return 0;
out_ret:
@@ -320,6 +374,11 @@ static int smmu_init(void)
return ret;
}
+static bool is_smmu_enabled(struct hyp_arm_smmu_v3_device *smmu)
+{
+ return FIELD_GET(CR0_SMMUEN, smmu->cr0);
+}
+
static bool is_cmdq_enabled(struct hyp_arm_smmu_v3_device *smmu)
{
return FIELD_GET(CR0_CMDQEN, smmu->cr0);
@@ -407,6 +466,39 @@ static void smmu_emulate_cmdq_insert(struct hyp_arm_smmu_v3_device *smmu)
WARN_ON(smmu_wait_event(smmu, smmu_cmdq_empty(&smmu->cmdq)));
}
+static void smmu_update_ste_shadow(struct hyp_arm_smmu_v3_device *smmu, bool enabled)
+{
+ size_t strtab_size;
+ u32 fmt = FIELD_GET(STRTAB_BASE_CFG_FMT, smmu->host_ste_cfg);
+
+ /* Linux doesn't change the fmt or the size of the strtab at run time. */
+ if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+ strtab_size = strtab_l1_size(smmu);
+ WARN_ON(fmt != STRTAB_BASE_CFG_FMT_2LVL);
+ WARN_ON((strtab_split(smmu) != STRTAB_SPLIT));
+ } else {
+ strtab_size = strtab_size(smmu);
+ WARN_ON(fmt != STRTAB_BASE_CFG_FMT_LINEAR);
+ WARN_ON(FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, smmu->host_ste_cfg) >
+ smmu->sid_bits);
+ }
+
+ if (enabled)
+ WARN_ON(smmu_share_pages(strtab_host_base(smmu), strtab_size));
+ else
+ WARN_ON(smmu_unshare_pages(strtab_host_base(smmu), strtab_size));
+}
+
+static void smmu_emulate_enable(struct hyp_arm_smmu_v3_device *smmu)
+{
+ smmu_update_ste_shadow(smmu, true);
+}
+
+static void smmu_emulate_disable(struct hyp_arm_smmu_v3_device *smmu)
+{
+ smmu_update_ste_shadow(smmu, false);
+}
+
static void smmu_emulate_cmdq_enable(struct hyp_arm_smmu_v3_device *smmu)
{
size_t cmdq_size;
@@ -474,19 +566,38 @@ static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
goto out_ret;
/* Passthrough the register access for bisectability, handled later */
case ARM_SMMU_STRTAB_BASE:
+ if (is_write) {
+ /* Must only be written when SMMU_CR0.SMMUEN == 0.*/
+ WARN_ON(is_smmu_enabled(smmu));
+ smmu->host_ste_base = val;
+ }
+ mask = read_write;
+ break;
case ARM_SMMU_STRTAB_BASE_CFG:
+ if (is_write) {
+ /* Must only be written when SMMU_CR0.SMMUEN == 0.*/
+ WARN_ON(is_smmu_enabled(smmu));
+ smmu->host_ste_cfg = val;
+ }
+ mask = read_write;
+ break;
case ARM_SMMU_GBPA:
mask = read_write;
break;
case ARM_SMMU_CR0:
if (is_write) {
bool last_cmdq_en = is_cmdq_enabled(smmu);
+ bool last_smmu_en = is_smmu_enabled(smmu);
smmu->cr0 = val;
if (!last_cmdq_en && is_cmdq_enabled(smmu))
smmu_emulate_cmdq_enable(smmu);
else if (last_cmdq_en && !is_cmdq_enabled(smmu))
smmu_emulate_cmdq_disable(smmu);
+ if (!last_smmu_en && is_smmu_enabled(smmu))
+ smmu_emulate_enable(smmu);
+ else if (last_smmu_en && !is_smmu_enabled(smmu))
+ smmu_emulate_disable(smmu);
}
mask = read_write;
WARN_ON(len != sizeof(u32));
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
index 330da53f80d0..cf85e5efdd9e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
@@ -15,6 +15,8 @@
* @mmio_addr base address of the SMMU registers
* @mmio_size size of the registers resource
* @features Features of SMMUv3, subset of the main driver
+ * @strtab_dma Phys address of stream table
+ * @strtab_size Stream table size
*
* Other members are filled and used at runtime by the SMMU driver.
* @base Virtual address of SMMU registers
@@ -26,6 +28,9 @@
* @cmdq CMDQ as observed by HW
* @cmdq_host Host view of the command queue
* @cr0 Last value of CR0
+ * @host_ste_cfg Host stream table config
+ * @host_ste_base Host stream table base
+ * @strtab_cfg Stream table as seen by HW
*/
struct hyp_arm_smmu_v3_device {
phys_addr_t mmio_addr;
@@ -44,6 +49,11 @@ struct hyp_arm_smmu_v3_device {
struct arm_smmu_queue cmdq;
struct arm_smmu_queue cmdq_host;
u32 cr0;
+ dma_addr_t strtab_dma;
+ size_t strtab_size;
+ u64 host_ste_cfg;
+ u64 host_ste_base;
+ struct arm_smmu_strtab_cfg strtab_cfg;
};
extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 24/28] iommu/arm-smmu-v3-kvm: Shadow STEs
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (22 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 23/28] iommu/arm-smmu-v3-kvm: Shadow stream table Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 25/28] iommu/arm-smmu-v3-kvm: Emulate GBPA Mostafa Saleh
` (3 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
This patch adds STE emulation, which is done when the host sends the
CFGI_STE command.
In this patch we copy the STE as-is to the shadow owned by the hypervisor;
in the next patch, the stage-2 page table will be attached.
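For reference, how a StreamID indexes the 2-level shadow (STRTAB_SPLIT = 8):
    /*
     * Example: sid = 0x1234
     *   L1 index = sid >> STRTAB_SPLIT          = 0x12
     *   L2 index = sid & (BIT(STRTAB_SPLIT) - 1) = 0x34
     * so the STE is l1tab[0x12].l2ptr->stes[0x34] (256 STEs per L2 table).
     */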
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 95 +++++++++++++++++--
1 file changed, 89 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index d722f8ce0635..0f890a7d8db3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -22,6 +22,9 @@ struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
#define strtab_split(smmu) (FIELD_GET(STRTAB_BASE_CFG_SPLIT, (smmu)->host_ste_cfg))
#define strtab_l1_size(smmu) ((1 << (strtab_log2size(smmu) - strtab_split(smmu))) * \
(sizeof(struct arm_smmu_strtab_l1)))
+#define strtab_hyp_base(smmu) ((smmu)->features & ARM_SMMU_FEAT_2_LVL_STRTAB ? \
+ (u64 *)(smmu)->strtab_cfg.l2.l1tab :\
+ (u64 *)(smmu)->strtab_cfg.linear.table)
#define for_each_smmu(smmu) \
for ((smmu) = kvm_hyp_arm_smmu_v3_smmus; \
@@ -263,6 +266,83 @@ static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
return 0;
}
+/* Get an STE for a stream table base. */
+static struct arm_smmu_ste *smmu_get_ste_ptr(struct hyp_arm_smmu_v3_device *smmu,
+ u32 sid, u64 *strtab)
+{
+ struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+ struct arm_smmu_ste *table = (struct arm_smmu_ste *)strtab;
+
+ if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+ struct arm_smmu_strtab_l1 *l1tab = (struct arm_smmu_strtab_l1 *)strtab;
+ u32 l1_idx = arm_smmu_strtab_l1_idx(sid);
+ struct arm_smmu_strtab_l2 *l2ptr;
+
+ if (WARN_ON(l1_idx >= cfg->l2.num_l1_ents) ||
+ !(l1tab[l1_idx].l2ptr & STRTAB_L1_DESC_SPAN))
+ return NULL;
+
+ l2ptr = hyp_phys_to_virt(l1tab[l1_idx].l2ptr & STRTAB_L1_DESC_L2PTR_MASK);
+ /* Two-level walk */
+ return &l2ptr->stes[arm_smmu_strtab_l2_idx(sid)];
+ }
+ if (WARN_ON(sid >= cfg->linear.num_ents))
+ return NULL;
+ return &table[sid];
+}
+
+static int smmu_shadow_l2_strtab(struct hyp_arm_smmu_v3_device *smmu, u32 sid)
+{
+ struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+ struct arm_smmu_strtab_l2 *l2table;
+ u32 idx = arm_smmu_strtab_l1_idx(sid);
+ u64 *host_ste_base = hyp_phys_to_virt(strtab_host_base(smmu));
+ u64 l1_desc_host = host_ste_base[idx];
+ struct arm_smmu_strtab_l1 *l1_desc = &cfg->l2.l1tab[idx];
+
+ l2table = kvm_iommu_donate_pages(get_order(sizeof(*l2table)));
+ if (!l2table)
+ return -ENOMEM;
+ arm_smmu_write_strtab_l1_desc(l1_desc, hyp_virt_to_phys(l2table));
+ if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY))
+ kvm_flush_dcache_to_poc(l1_desc, sizeof(*l1_desc));
+
+ /*
+ * Now put the host L2 table in a shared state.
+ * As mentioned in smmu_reshadow_ste(), Linux never clears L1 ptrs,
+ * so there is no need to handle that case. Otherwise, we'd need to
+ * unshare the tables and emulate the STE clear.
+ */
+ smmu_share_pages(l1_desc_host & STRTAB_L1_DESC_L2PTR_MASK, sizeof(*l2table));
+ return 0;
+}
+
+static void smmu_reshadow_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid, bool leaf)
+{
+ u64 *host_ste_base = hyp_phys_to_virt(strtab_host_base(smmu));
+ u64 *hyp_ste_base = strtab_hyp_base(smmu);
+ struct arm_smmu_ste *host_ste_ptr = smmu_get_ste_ptr(smmu, sid, host_ste_base);
+ struct arm_smmu_ste *hyp_ste_ptr = smmu_get_ste_ptr(smmu, sid, hyp_ste_base);
+ int i;
+
+ /*
+ * Linux only uses leaf = 1; if leaf were 0, we would need to verify that
+ * this is a 2-level table and reshadow the l2.
+ * Also, Linux never clears the l1 ptr; that would require freeing the old
+ * shadow.
+ */
+ if (WARN_ON(!leaf || !host_ste_ptr))
+ return;
+
+ /* If host is valid and hyp is not, means a new L1 installed. */
+ if (!hyp_ste_ptr) {
+ WARN_ON(smmu_shadow_l2_strtab(smmu, sid));
+ hyp_ste_ptr = smmu_get_ste_ptr(smmu, sid, hyp_ste_base);
+ }
+
+ for (i = 0; i < STRTAB_STE_DWORDS; i++)
+ WRITE_ONCE(hyp_ste_ptr->data[i], host_ste_ptr->data[i]);
+}
+
static int smmu_init_strtab(struct hyp_arm_smmu_v3_device *smmu)
{
int ret;
@@ -390,8 +470,13 @@ static bool smmu_filter_command(struct hyp_arm_smmu_v3_device *smmu, u64 *comman
switch (type) {
case CMDQ_OP_CFGI_STE:
- /* TBD: SHADOW_STE*/
+ {
+ u32 sid = FIELD_GET(CMDQ_CFGI_0_SID, command[0]);
+ u32 leaf = FIELD_GET(CMDQ_CFGI_1_LEAF, command[1]);
+
+ smmu_reshadow_ste(smmu, sid, leaf);
break;
+ }
case CMDQ_OP_CFGI_ALL:
{
/*
@@ -564,23 +649,21 @@ static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
regs->regs[rd] = smmu->cmdq_host.llq.cons | err;
}
goto out_ret;
- /* Passthrough the register access for bisectability, handled later */
case ARM_SMMU_STRTAB_BASE:
if (is_write) {
/* Must only be written when SMMU_CR0.SMMUEN == 0.*/
WARN_ON(is_smmu_enabled(smmu));
smmu->host_ste_base = val;
}
- mask = read_write;
- break;
+ goto out_ret;
case ARM_SMMU_STRTAB_BASE_CFG:
if (is_write) {
/* Must only be written when SMMU_CR0.SMMUEN == 0.*/
WARN_ON(is_smmu_enabled(smmu));
smmu->host_ste_cfg = val;
}
- mask = read_write;
- break;
+ goto out_ret;
+ /* Passthrough the register access for bisectability, handled later */
case ARM_SMMU_GBPA:
mask = read_write;
break;
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 25/28] iommu/arm-smmu-v3-kvm: Emulate GBPA
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (23 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 24/28] iommu/arm-smmu-v3-kvm: Shadow STEs Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 26/28] iommu/arm-smmu-v3-kvm: Support io-pgtable Mostafa Saleh
` (2 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
The last bit of emulation is GBPA. It must always be set to ABORT, as
the host is not allowed to bypass the SMMU while the SMMU is disabled.
That is done by setting GBPA to ABORT at init time; then, when the host:
- writes, we ignore the write and save the value without the UPDATE bit.
- reads, we return the saved value.
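From the host's point of view the UPDATE handshake still appears to
complete; a sketch, with smmu_base and val as hypothetical host-side names:
    writel_relaxed(GBPA_UPDATE | val, smmu_base + ARM_SMMU_GBPA); /* trapped */
    /* The trap saved val without GBPA_UPDATE, so this poll exits
     * immediately, while the real register stays at GBPA_ABORT. */
    while (readl_relaxed(smmu_base + ARM_SMMU_GBPA) & GBPA_UPDATE)
            cpu_relax();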
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 21 ++++++++++++++++---
.../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h | 2 ++
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index 0f890a7d8db3..db9d9caaca2c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -100,6 +100,13 @@ static int smmu_unshare_pages(phys_addr_t addr, size_t size)
return 0;
}
+static int smmu_abort_gbpa(struct hyp_arm_smmu_v3_device *smmu)
+{
+ writel_relaxed(GBPA_UPDATE | GBPA_ABORT, smmu->base + ARM_SMMU_GBPA);
+ /* Wait till UPDATE is cleared. */
+ return smmu_wait(readl_relaxed(smmu->base + ARM_SMMU_GBPA) == GBPA_ABORT);
+}
+
static bool smmu_cmdq_full(struct arm_smmu_queue *cmdq)
{
struct arm_smmu_ll_queue *llq = &cmdq->llq;
@@ -416,6 +423,10 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
if (ret)
goto out_ret;
+ ret = smmu_abort_gbpa(smmu);
+ if (ret)
+ goto out_ret;
+
return 0;
out_ret:
@@ -663,10 +674,14 @@ static bool smmu_dabt_device(struct hyp_arm_smmu_v3_device *smmu,
smmu->host_ste_cfg = val;
}
goto out_ret;
- /* Passthrough the register access for bisectiblity, handled later */
case ARM_SMMU_GBPA:
- mask = read_write;
- break;
+ if (is_write)
+ smmu->gbpa = val & ~GBPA_UPDATE;
+ else
+ regs->regs[rd] = smmu->gbpa;
+
+ WARN_ON(len != sizeof(u32));
+ goto out_ret;
case ARM_SMMU_CR0:
if (is_write) {
bool last_cmdq_en = is_cmdq_enabled(smmu);
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
index cf85e5efdd9e..aab585dd9fd8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
@@ -31,6 +31,7 @@
* @host_ste_cfg Host stream table config
* @host_ste_base Host stream table base
* @strtab_cfg Stream table as seen by HW
+ * @gbpa Last value of GBPA from the host
*/
struct hyp_arm_smmu_v3_device {
phys_addr_t mmio_addr;
@@ -54,6 +55,7 @@ struct hyp_arm_smmu_v3_device {
u64 host_ste_cfg;
u64 host_ste_base;
struct arm_smmu_strtab_cfg strtab_cfg;
+ u32 gbpa;
};
extern size_t kvm_nvhe_sym(kvm_hyp_arm_smmu_v3_count);
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 26/28] iommu/arm-smmu-v3-kvm: Support io-pgtable
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (24 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 25/28] iommu/arm-smmu-v3-kvm: Emulate GBPA Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 27/28] iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 28/28] iommu/arm-smmu-v3-kvm: Enable nesting Mostafa Saleh
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Add the hooks needed to support io-pgtable-arm from the hypervisor,
which are mostly about memory allocation.
Also add a function to allocate a 64-bit stage-2 page table.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
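As a usage sketch (mirroring how a later patch in this series consumes
this; the cfg values here are placeholders):

	struct io_pgtable_cfg cfg = {
		/* ias/oas/pgsize_bitmap come from the probed SMMU IDRs. */
		.coherent_walk	= true,
	};
	struct io_pgtable_ops *ops;

	/* Only ARM_64_LPAE_S2 is accepted; other formats return NULL. */
	ops = kvm_alloc_io_pgtable_ops(ARM_64_LPAE_S2, &cfg, NULL);
	if (!ops)
		return -ENOMEM;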
arch/arm64/kvm/hyp/nvhe/Makefile | 4 +-
.../arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c | 64 +++++++++++++++++++
drivers/iommu/io-pgtable-arm.c | 2 +-
drivers/iommu/io-pgtable-arm.h | 11 ++++
4 files changed, 79 insertions(+), 2 deletions(-)
create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index c71c96262378..10090be6b067 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -34,7 +34,9 @@ hyp-obj-y += $(lib-objs)
HYP_SMMU_V3_DRV_PATH = ../../../../../drivers/iommu/arm/arm-smmu-v3
hyp-obj-$(CONFIG_ARM_SMMU_V3_PKVM) += $(HYP_SMMU_V3_DRV_PATH)/pkvm/arm-smmu-v3.o \
- $(HYP_SMMU_V3_DRV_PATH)/arm-smmu-v3-common-hyp.o
+ $(HYP_SMMU_V3_DRV_PATH)/arm-smmu-v3-common-hyp.o \
+ $(HYP_SMMU_V3_DRV_PATH)/pkvm/io-pgtable-arm-hyp.o \
+ $(HYP_SMMU_V3_DRV_PATH)/../../io-pgtable-arm.o
##
## Build rules for compiling nVHE hyp code
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c
new file mode 100644
index 000000000000..6cf9e5bb76e7
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Arm Ltd.
+ */
+#include <nvhe/iommu.h>
+
+#include "../../../io-pgtable-arm.h"
+
+struct io_pgtable_ops *kvm_alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
+ struct io_pgtable_cfg *cfg,
+ void *cookie)
+{
+ struct io_pgtable *iop;
+
+ if (fmt != ARM_64_LPAE_S2)
+ return NULL;
+ iop = arm_64_lpae_alloc_pgtable_s2(cfg, cookie);
+ if (!iop)
+ return NULL;
+ iop->fmt = fmt;
+ iop->cookie = cookie;
+ iop->cfg = *cfg;
+ return &iop->ops;
+}
+
+void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
+ struct io_pgtable_cfg *cfg, void *cookie)
+{
+ void *addr;
+
+ addr = kvm_iommu_donate_pages(get_order(size));
+
+ if (addr && !cfg->coherent_walk)
+ kvm_flush_dcache_to_poc(addr, size);
+
+ return addr;
+}
+
+void __arm_lpae_free_pages(void *addr, size_t size, struct io_pgtable_cfg *cfg,
+ void *cookie)
+{
+ if (!cfg->coherent_walk)
+ kvm_flush_dcache_to_poc(addr, size);
+
+ kvm_iommu_reclaim_pages(addr);
+}
+
+void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+ struct io_pgtable_cfg *cfg)
+{
+ if (!cfg->coherent_walk)
+ kvm_flush_dcache_to_poc(ptep, sizeof(*ptep) * num_entries);
+}
+
+/* At the moment this is only used once, so rounding up to a page is not really a problem. */
+void *__arm_lpae_alloc_data(size_t size, gfp_t gfp)
+{
+ return kvm_iommu_donate_pages(get_order(size));
+}
+
+void __arm_lpae_free_data(void *p)
+{
+ kvm_iommu_reclaim_pages(p);
+}
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 2ca09081c3b0..211f6d54b902 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -947,7 +947,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
return NULL;
}
-static struct io_pgtable *
+struct io_pgtable *
arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
{
u64 sl;
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
index 7d9f0b759275..194c3e975288 100644
--- a/drivers/iommu/io-pgtable-arm.h
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -78,8 +78,19 @@ void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
void *cookie);
void *__arm_lpae_alloc_data(size_t size, gfp_t gfp);
void __arm_lpae_free_data(void *p);
+struct io_pgtable *
+arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie);
#ifndef __KVM_NVHE_HYPERVISOR__
#define __arm_lpae_virt_to_phys __pa
#define __arm_lpae_phys_to_virt __va
+#else
+#include <nvhe/memory.h>
+#define __arm_lpae_virt_to_phys hyp_virt_to_phys
+#define __arm_lpae_phys_to_virt hyp_phys_to_virt
+#undef WARN_ONCE
+#define WARN_ONCE(condition, format...) WARN_ON(1)
+struct io_pgtable_ops *kvm_alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
+ struct io_pgtable_cfg *cfg,
+ void *cookie);
#endif /* !__KVM_NVHE_HYPERVISOR__ */
#endif /* IO_PGTABLE_ARM_H_ */
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 27/28] iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (25 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 26/28] iommu/arm-smmu-v3-kvm: Support io-pgtable Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 28/28] iommu/arm-smmu-v3-kvm: Enable nesting Mostafa Saleh
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Based on the host stage-2 callbacks from the hypervisor core code,
update the SMMUv3 identity-mapped page table accordingly.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
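For illustration, a worked example of the page-size selection done by
smmu_pgsize_idmap() below (hypothetical values, SZ_* from linux/sizes.h):

	/* 2G region starting at a 1G-aligned address, 4K/2M/1G supported. */
	pgsize = smmu_pgsize_idmap(SZ_2G, SZ_1G, SZ_4K | SZ_2M | SZ_1G);
	/* -> SZ_1G: the largest size that both fits and is aligned. */

	/* Same region, but starting at an address only 2M-aligned. */
	pgsize = smmu_pgsize_idmap(SZ_2G, SZ_1G + SZ_2M, SZ_4K | SZ_2M | SZ_1G);
	/* -> SZ_2M: the 1G size is dropped by the alignment mask. */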
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 171 +++++++++++++++++-
1 file changed, 169 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index db9d9caaca2c..2d4ff21f83f9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -11,6 +11,7 @@
#include <nvhe/trap_handler.h>
#include "arm_smmu_v3.h"
+#include "../../../io-pgtable-arm.h"
size_t __ro_after_init kvm_hyp_arm_smmu_v3_count;
struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
@@ -58,6 +59,9 @@ struct hyp_arm_smmu_v3_device *kvm_hyp_arm_smmu_v3_smmus;
smmu_wait(_cond); \
})
+/* Protected by host_mmu.lock from core code. */
+static struct io_pgtable *idmap_pgtable;
+
/* Transfer ownership of memory */
static int smmu_take_pages(u64 phys, size_t size)
{
@@ -166,7 +170,6 @@ static int smmu_sync_cmd(struct hyp_arm_smmu_v3_device *smmu)
return smmu_wait_event(smmu, smmu_cmdq_empty(&smmu->cmdq));
}
-__maybe_unused
static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
struct arm_smmu_cmdq_ent *cmd)
{
@@ -178,6 +181,66 @@ static int smmu_send_cmd(struct hyp_arm_smmu_v3_device *smmu,
return smmu_sync_cmd(smmu);
}
+static void __smmu_add_cmd(struct hyp_arm_smmu_v3_device *smmu, void *unused,
+ struct arm_smmu_cmdq_ent *cmd)
+{
+ WARN_ON(smmu_add_cmd(smmu, cmd));
+}
+
+static int smmu_tlb_inv_range_smmu(struct hyp_arm_smmu_v3_device *smmu,
+ struct arm_smmu_cmdq_ent *cmd,
+ unsigned long iova, size_t size, size_t granule)
+{
+ arm_smmu_tlb_inv_build(cmd, iova, size, granule,
+ idmap_pgtable->cfg.pgsize_bitmap, smmu,
+ __smmu_add_cmd, NULL);
+ return smmu_sync_cmd(smmu);
+}
+
+static void smmu_tlb_inv_range(unsigned long iova, size_t size, size_t granule,
+ bool leaf)
+{
+ struct arm_smmu_cmdq_ent cmd = {
+ .opcode = CMDQ_OP_TLBI_S2_IPA,
+ .tlbi = {
+ .leaf = leaf,
+ .vmid = 0,
+ },
+ };
+ struct arm_smmu_cmdq_ent cmd_s1 = {
+ .opcode = CMDQ_OP_TLBI_NH_ALL,
+ .tlbi = {
+ .vmid = 0,
+ },
+ };
+ struct hyp_arm_smmu_v3_device *smmu;
+
+ for_each_smmu(smmu) {
+ hyp_spin_lock(&smmu->lock);
+ WARN_ON(smmu_tlb_inv_range_smmu(smmu, &cmd, iova, size, granule));
+ WARN_ON(smmu_send_cmd(smmu, &cmd_s1));
+ hyp_spin_unlock(&smmu->lock);
+ }
+}
+
+static void smmu_tlb_flush_walk(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+ smmu_tlb_inv_range(iova, size, granule, false);
+}
+
+static void smmu_tlb_add_page(struct iommu_iotlb_gather *gather,
+ unsigned long iova, size_t granule,
+ void *cookie)
+{
+ smmu_tlb_inv_range(iova, granule, granule, true);
+}
+
+static const struct iommu_flush_ops smmu_tlb_ops = {
+ .tlb_flush_walk = smmu_tlb_flush_walk,
+ .tlb_add_page = smmu_tlb_add_page,
+};
+
/* Put the device in a state that can be probed by the host driver. */
static void smmu_deinit_device(struct hyp_arm_smmu_v3_device *smmu)
{
@@ -434,6 +497,37 @@ static int smmu_init_device(struct hyp_arm_smmu_v3_device *smmu)
return ret;
}
+static int smmu_init_pgt(void)
+{
+ /* Default values, narrowed below based on the SMMUs' common features. */
+ struct io_pgtable_cfg cfg = (struct io_pgtable_cfg) {
+ .tlb = &smmu_tlb_ops,
+ .pgsize_bitmap = -1,
+ .ias = 48,
+ .oas = 48,
+ .coherent_walk = true,
+ };
+ struct hyp_arm_smmu_v3_device *smmu;
+ struct io_pgtable_ops *ops;
+
+ for_each_smmu(smmu) {
+ cfg.ias = min(cfg.ias, smmu->ias);
+ cfg.oas = min(cfg.oas, smmu->oas);
+ cfg.pgsize_bitmap &= smmu->pgsize_bitmap;
+ cfg.coherent_walk &= !!(smmu->features & ARM_SMMU_FEAT_COHERENCY);
+ }
+
+ /* At least PAGE_SIZE must be supported by all SMMUs. */
+ if ((cfg.pgsize_bitmap & PAGE_SIZE) == 0)
+ return -EINVAL;
+
+ ops = kvm_alloc_io_pgtable_ops(ARM_64_LPAE_S2, &cfg, NULL);
+ if (!ops)
+ return -ENOMEM;
+ idmap_pgtable = io_pgtable_ops_to_pgtable(ops);
+ return 0;
+}
+
static int smmu_init(void)
{
int ret;
@@ -455,7 +549,7 @@ static int smmu_init(void)
BUILD_BUG_ON(sizeof(hyp_spinlock_t) != sizeof(u32));
- return 0;
+ return smmu_init_pgt();
out_reclaim_smmu:
while (smmu != kvm_hyp_arm_smmu_v3_smmus)
@@ -789,8 +883,81 @@ static bool smmu_dabt_handler(struct user_pt_regs *regs, u64 esr, u64 addr)
return false;
}
+static size_t smmu_pgsize_idmap(size_t size, u64 paddr, size_t pgsize_bitmap)
+{
+ size_t pgsizes;
+
+ /* Remove page sizes that are larger than the current size */
+ pgsizes = pgsize_bitmap & GENMASK_ULL(__fls(size), 0);
+
+ /* Remove page sizes that the address is not aligned to. */
+ if (likely(paddr))
+ pgsizes &= GENMASK_ULL(__ffs(paddr), 0);
+
+ WARN_ON(!pgsizes);
+
+ /* Return the largest page size that fits. */
+ return BIT(__fls(pgsizes));
+}
+
static void smmu_host_stage2_idmap(phys_addr_t start, phys_addr_t end, int prot)
{
+ size_t size, pgsize = PAGE_SIZE, pgcount;
+ size_t mapped, unmapped;
+ int ret;
+ struct io_pgtable *pgtable = idmap_pgtable;
+
+ end = min(end, BIT(pgtable->cfg.oas));
+ if (start >= end)
+ return;
+ size = end - start;
+
+ if (prot) {
+ if (!(prot & IOMMU_MMIO))
+ prot |= IOMMU_CACHE;
+
+ while (size) {
+ mapped = 0;
+ /*
+ * Page sizes are handled differently for memory and MMIO:
+ * - memory: Map everything with PAGE_SIZE. That is guaranteed to
+ * succeed, as enough pages were allocated to cover all of
+ * memory. PAGE_SIZE is used because io-pgtable-arm no longer
+ * supports split_blk_unmap, so blocks can't be broken into
+ * tables once mapped.
+ * - MMIO: Unlike memory, pKVM only allocates 1G to cover all MMIO,
+ * while the MMIO space can be large, as it is assumed to cover
+ * the whole IAS that is not memory. So block mappings must be
+ * used, which is fine for MMIO as it is never donated at the
+ * moment; MMIO therefore never has to be unmapped at run time,
+ * which would trigger the split-block logic.
+ */
+ if (prot & IOMMU_MMIO)
+ pgsize = smmu_pgsize_idmap(size, start, pgtable->cfg.pgsize_bitmap);
+
+ pgcount = size / pgsize;
+ ret = pgtable->ops.map_pages(&pgtable->ops, start, start,
+ pgsize, pgcount, prot, 0, &mapped);
+ size -= mapped;
+ start += mapped;
+ if (!mapped || ret)
+ return;
+ }
+ } else {
+ /* Shouldn't happen. */
+ WARN_ON(prot & IOMMU_MMIO);
+ while (size) {
+ pgcount = size / pgsize;
+ unmapped = pgtable->ops.unmap_pages(&pgtable->ops, start,
+ pgsize, pgcount, NULL);
+ size -= unmapped;
+ start += unmapped;
+ if (!unmapped)
+ break;
+ }
+ /* Some memory was not unmapped. */
+ WARN_ON(size);
+ }
}
/* Shared with the kernel driver in EL1 */
--
2.51.0.rc1.167.g924127e9c0-goog
* [PATCH v4 28/28] iommu/arm-smmu-v3-kvm: Enable nesting
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
` (26 preceding siblings ...)
2025-08-19 21:51 ` [PATCH v4 27/28] iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table Mostafa Saleh
@ 2025-08-19 21:51 ` Mostafa Saleh
27 siblings, 0 replies; 29+ messages in thread
From: Mostafa Saleh @ 2025-08-19 21:51 UTC (permalink / raw)
To: linux-kernel, kvmarm, linux-arm-kernel, iommu
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, robin.murphy, jean-philippe, qperret,
tabba, jgg, mark.rutland, praan, Mostafa Saleh
Now that the hypervisor controls the command queue and the stream
table, and shadows the CPU stage-2 page table, enable SMMU stage-2
translation whenever the host puts an STE in bypass or stage-1.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
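For reference, the conversion relies on the STRTAB_STE_0_CFG encodings
(sketch of the existing values from arm-smmu-v3.h):

	/*
	 * BYPASS = 0b100, S1 = 0b101, S2 = 0b110, NESTED = 0b111.
	 * OR-ing in BIT(1) therefore converts:
	 *   bypass (0b100) -> S2-only (0b110)
	 *   S1     (0b101) -> nested  (0b111)
	 * which is what smmu_attach_stage_2() does below.
	 */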
.../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c | 72 ++++++++++++++++++-
1 file changed, 70 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
index 2d4ff21f83f9..5be44a37d581 100644
--- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
@@ -336,6 +336,46 @@ static int smmu_init_cmdq(struct hyp_arm_smmu_v3_device *smmu)
return 0;
}
+static void smmu_attach_stage_2(struct hyp_arm_smmu_v3_device *smmu, struct arm_smmu_ste *ste)
+{
+ unsigned long vttbr;
+ unsigned long ts, sl, ic, oc, sh, tg, ps;
+ unsigned long cfg;
+ struct io_pgtable_cfg *pgt_cfg = &idmap_pgtable->cfg;
+
+ cfg = FIELD_GET(STRTAB_STE_0_CFG, ste->data[0]);
+ if (!FIELD_GET(STRTAB_STE_0_V, ste->data[0]) ||
+ (cfg == STRTAB_STE_0_CFG_ABORT))
+ return;
+ /* S2 is not advertised to the host, so this should never be attempted. */
+ if (WARN_ON(cfg == STRTAB_STE_0_CFG_NESTED))
+ return;
+ vttbr = pgt_cfg->arm_lpae_s2_cfg.vttbr;
+ ps = pgt_cfg->arm_lpae_s2_cfg.vtcr.ps;
+ tg = pgt_cfg->arm_lpae_s2_cfg.vtcr.tg;
+ sh = pgt_cfg->arm_lpae_s2_cfg.vtcr.sh;
+ oc = pgt_cfg->arm_lpae_s2_cfg.vtcr.orgn;
+ ic = pgt_cfg->arm_lpae_s2_cfg.vtcr.irgn;
+ sl = pgt_cfg->arm_lpae_s2_cfg.vtcr.sl;
+ ts = pgt_cfg->arm_lpae_s2_cfg.vtcr.tsz;
+
+ ste->data[1] |= FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING);
+ /* The host shouldn't write dwords 2 and 3, overwrite them. */
+ ste->data[2] = FIELD_PREP(STRTAB_STE_2_VTCR,
+ FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, ps) |
+ FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, tg) |
+ FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, sh) |
+ FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, oc) |
+ FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, ic) |
+ FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, sl) |
+ FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, ts)) |
+ FIELD_PREP(STRTAB_STE_2_S2VMID, 0) |
+ STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2R;
+ ste->data[3] = vttbr & STRTAB_STE_3_S2TTB_MASK;
+ /* Convert S1 => nested and bypass => S2 */
+ ste->data[0] |= FIELD_PREP(STRTAB_STE_0_CFG, cfg | BIT(1));
+}
+
/* Get an STE for a stream table base. */
static struct arm_smmu_ste *smmu_get_ste_ptr(struct hyp_arm_smmu_v3_device *smmu,
u32 sid, u64 *strtab)
@@ -394,6 +434,10 @@ static void smmu_reshadow_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid, bool
struct arm_smmu_ste *host_ste_ptr = smmu_get_ste_ptr(smmu, sid, host_ste_base);
struct arm_smmu_ste *hyp_ste_ptr = smmu_get_ste_ptr(smmu, sid, hyp_ste_base);
int i;
+ struct arm_smmu_ste target = {};
+ struct arm_smmu_cmdq_ent cfgi_cmd = {
+ .opcode = CMDQ_OP_CFGI_ALL,
+ };
/*
* Linux only uses leaf = 1, when leaf is 0, we need to verify that this
@@ -409,8 +453,32 @@ static void smmu_reshadow_ste(struct hyp_arm_smmu_v3_device *smmu, u32 sid, bool
hyp_ste_ptr = smmu_get_ste_ptr(smmu, sid, hyp_ste_base);
}
- for (i = 0; i < STRTAB_STE_DWORDS; i++)
- WRITE_ONCE(hyp_ste_ptr->data[i], host_ste_ptr->data[i]);
+ memcpy(target.data, host_ste_ptr->data, STRTAB_STE_DWORDS << 3);
+
+ /*
+ * Typically, an STE update is done as follows:
+ * 1- Write the last 7 dwords, while the STE is invalid
+ * 2- CFGI
+ * 3- Write the first dword, making the STE valid
+ * 4- CFGI
+ * As the SMMU has to load at least 64 bits atomically,
+ * that guarantees there is no race between writing the
+ * STE and the CFGI where the SMMU observes a partially
+ * written STE.
+ * In the shadow, the STE is updated to enable nested
+ * translation, which requires updating the first 3 dwords.
+ * That is only done if the STE is valid and not in abort,
+ * which means it happens at step 4.
+ * So the last 7 dwords must also be written, and a CFGI
+ * sent, before writing the first dword.
+ * There is no need for a final CFGI as the host will do that.
+ */
+ smmu_attach_stage_2(smmu, &target);
+ for (i = 1; i < STRTAB_STE_DWORDS; i++)
+ WRITE_ONCE(hyp_ste_ptr->data[i], target.data[i]);
+
+ WARN_ON(smmu_send_cmd(smmu, &cfgi_cmd));
+ WRITE_ONCE(hyp_ste_ptr->data[0], target.data[0]);
}
static int smmu_init_strtab(struct hyp_arm_smmu_v3_device *smmu)
--
2.51.0.rc1.167.g924127e9c0-goog