linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate)
@ 2025-08-19 21:51 Mostafa Saleh
From: Mostafa Saleh @ 2025-08-19 21:51 UTC
  To: linux-kernel, kvmarm, linux-arm-kernel, iommu
  Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, robin.murphy, jean-philippe, qperret,
	tabba, jgg, mark.rutland, praan, Mostafa Saleh

This is v4 of pKVM SMMUv3 support. This version is quite different from
the previous ones, as it implements nested SMMUv3 using trap and emulate.
Previous versions:
v1: Implements a full-fledged PV interface
https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/

v2: Implements a full-fledged PV interface (plus more features such as evtq and s1)
https://lore.kernel.org/kvmarm/20241212180423.1578358-1-smostafa@google.com/

v3: DMA isolation only (using PV)
https://lore.kernel.org/kvmarm/20250728175316.3706196-1-smostafa@google.com/

Based on the feedback on v3, having a separate driver was too complicated
to maintain. So the alternatives were either to integrate the KVM
implementation into the current driver and rely on impl ops (I have a PoC for it):
https://android-kvm.googlesource.com/linux/+log/refs/heads/pkvm-smmu-implops-poc

or to go straight for the final goal, which is nested translation using trap
and emulate, as implemented in this series.

Another major change is that io-pgtable-arm is no longer split into a common
file; instead, kernel-specific code (mostly memory allocation and selftests)
was factored out, based on Robin's feedback.

Design:
=======

Assumptions:
------------

As mentioned, this is a completely different approach, which traps accesses
to the SMMUv3 MMIO space and emulates some of them.

One important point is that this doesn't emulate the full SMMUv3
architecture, only the parts used by the Linux kernel. That's why enabling
this (ARM_SMMU_V3_PKVM) depends on ARM_SMMU_V3=y, so we can be sure of the
driver's behaviour.

Any new change in the driver's behaviour will likely trigger a WARN_ON, ending in a panic.

Most notable assumptions:
- Changing the stream table format/size or L2 pointers is not allowed after
initialization.
- leaf=0 CFGI is not allowed
- CFGI_ALL with any value but 31 is not allowed
- Some commands which are not used are not allowed (e.g. CMD_TLBI_NH_ALL)
- Values set in ARM_SMMU_CR1 are hardcoded and don't change.
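The command restrictions above amount to a per-command allowlist. The following is a minimal sketch, assuming a simple opcode switch; it is not the series' sanitiser. The opcode values and the CFGI LEAF/RANGE fields follow the SMMUv3 command encodings (as defined in the Linux driver's arm-smmu-v3.h), while `cmd_allowed()` and the exact set of accepted commands are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcodes live in bits [7:0] of the first command dword. */
#define CMDQ_OP_CFGI_STE      0x03
#define CMDQ_OP_CFGI_ALL      0x04  /* CFGI_STE_RANGE with RANGE = 31 */
#define CMDQ_OP_TLBI_NH_ALL   0x10
#define CMDQ_OP_TLBI_NH_ASID  0x11
#define CMDQ_OP_TLBI_NH_VA    0x12
#define CMDQ_OP_CMD_SYNC      0x46

#define CMDQ_0_OP(c)          ((c)[0] & 0xffULL)
#define CMDQ_CFGI_1_LEAF      (1ULL << 0)          /* dword 1, bit 0 */
#define CMDQ_CFGI_1_RANGE(c)  ((c)[1] & 0x1fULL)   /* dword 1, bits [4:0] */

/* Hypothetical sanitiser: true if the host may issue @cmd. */
static bool cmd_allowed(const uint64_t cmd[2])
{
	switch (CMDQ_0_OP(cmd)) {
	case CMDQ_OP_CFGI_STE:
		/* leaf=0 also invalidates the cached L1 descriptor: refuse. */
		return (cmd[1] & CMDQ_CFGI_1_LEAF) != 0;
	case CMDQ_OP_CFGI_ALL:
		/* Only the "invalidate everything" range is expected. */
		return CMDQ_CFGI_1_RANGE(cmd) == 31;
	case CMDQ_OP_TLBI_NH_ASID:
	case CMDQ_OP_TLBI_NH_VA:
	case CMDQ_OP_CMD_SYNC:
		return true;
	default:
		/* Anything unexpected, e.g. CMD_TLBI_NH_ALL, is refused. */
		return false;
	}
}
```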


Emulation logic mainly targets:

1) Command Queue
----------------
At boot time, the hypervisor allocates a shadow command queue (which doesn't need
to match the host's size) and sets it up in HW. It then traps accesses to:

i) ARM_SMMU_CMDQ_BASE
This can only be written while the CMDQ is disabled. When the CMDQ is enabled, the
hypervisor puts the host command queue in a shared state so that it can't be
transitioned to the hypervisor or to VMs. It is unshared when the CMDQ is disabled.
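A minimal sketch of the CMDQ_BASE rule and of the share/unshare transitions driven by CR0.CMDQEN. Only the CMDQEN bit position (bit 3 of SMMU_CR0) comes from the SMMUv3 register layout; `struct emu_smmu`, `share_host_queue()` and `unshare_host_queue()` are hypothetical stand-ins that here only count calls.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_CMDQEN (1U << 3)  /* SMMU_CR0.CMDQEN */

/* Hypothetical per-SMMU emulation state. */
struct emu_smmu {
	uint32_t cr0;             /* last CR0 value written by the host */
	uint64_t host_cmdq_base;  /* host's CMDQ_BASE */
	bool host_cmdq_shared;    /* host queue currently in a shared state */
};

/* Hypothetical stand-ins for the hypervisor's share/unshare helpers. */
static int share_count, unshare_count;
static void share_host_queue(uint64_t base)   { (void)base; share_count++; }
static void unshare_host_queue(uint64_t base) { (void)base; unshare_count++; }

/* CMDQ_BASE may only be written while the command queue is disabled. */
static bool emulate_cmdq_base_write(struct emu_smmu *s, uint64_t val)
{
	if (s->cr0 & CR0_CMDQEN)
		return false;  /* refuse the write */
	s->host_cmdq_base = val;
	return true;
}

/* Share the host queue on enable, unshare it on disable. */
static void emulate_cr0_write(struct emu_smmu *s, uint32_t val)
{
	bool was_on = s->cr0 & CR0_CMDQEN;
	bool now_on = val & CR0_CMDQEN;

	if (!was_on && now_on) {
		share_host_queue(s->host_cmdq_base);
		s->host_cmdq_shared = true;
	} else if (was_on && !now_on) {
		unshare_host_queue(s->host_cmdq_base);
		s->host_cmdq_shared = false;
	}
	s->cr0 = val;
}
```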

ii) ARM_SMMU_CMDQ_PROD
Writes trigger the emulation code: the hypervisor copies the commands between cons
and prod of the host queue, sanitises them (mostly WARNs if the host is malicious
and issues commands it shouldn't), then eagerly consumes them, updating the host cons.
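The PROD emulation can be pictured as a copy between two SMMUv3-style rings, where each pointer holds an index plus a wrap bit. This is a simplified model, assuming a hypothetical `struct queue`, a pluggable sanitiser (`demo_allowed()` refuses only CMD_TLBI_NH_ALL), and `hw_drain()` standing in for writing the shadow PROD to the hardware and polling CONS.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMD_DWORDS 2  /* each SMMUv3 command is 16 bytes */

/* Minimal queue model: prod/cons hold {wrap bit, index}. */
struct queue {
	uint64_t *base;      /* (1 << log2) commands */
	unsigned int log2;   /* log2 of the number of entries */
	uint32_t prod, cons;
};

static uint32_t q_idx(const struct queue *q, uint32_t p)
{
	return p & ((1U << q->log2) - 1);
}

/* Advance a pointer; the wrap bit flips when the index overflows. */
static uint32_t q_inc(const struct queue *q, uint32_t p)
{
	return (p + 1) & ((1U << (q->log2 + 1)) - 1);
}

/* Full: same index, different wrap bit. */
static bool q_full(const struct queue *q)
{
	return q_idx(q, q->prod) == q_idx(q, q->cons) && q->prod != q->cons;
}

/* Stand-in for writing the shadow PROD to HW and polling CONS. */
static void hw_drain(struct queue *shadow)
{
	shadow->cons = shadow->prod;
}

/* Example sanitiser: refuse only CMD_TLBI_NH_ALL (opcode 0x10). */
static bool demo_allowed(const uint64_t *cmd)
{
	return (cmd[0] & 0xff) != 0x10;
}

/*
 * Host wrote @new_prod to CMDQ_PROD: copy [cons, prod) into the shadow
 * queue, consume it eagerly and advance the host's cons. Returns the
 * number of commands forwarded.
 */
static int emulate_prod_write(struct queue *host, struct queue *shadow,
			      uint32_t new_prod,
			      bool (*allowed)(const uint64_t *cmd))
{
	int n = 0;

	host->prod = new_prod;
	while (host->cons != host->prod) {
		const uint64_t *cmd =
			&host->base[q_idx(host, host->cons) * CMD_DWORDS];

		if (!allowed(cmd))
			break;  /* cons stays on the offending command */
		if (q_full(shadow))
			hw_drain(shadow);
		memcpy(&shadow->base[q_idx(shadow, shadow->prod) * CMD_DWORDS],
		       cmd, CMD_DWORDS * sizeof(*cmd));
		shadow->prod = q_inc(shadow, shadow->prod);
		host->cons = q_inc(host, host->cons);
		n++;
	}
	hw_drain(shadow);
	return n;
}
```

Because the shadow queue is drained whenever it fills up, it can be smaller than the host queue, which is why its size doesn't need to match.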

iii) ARM_SMMU_CMDQ_CONS
Not much logic: it just returns the emulated cons plus error bits.
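For the CONS read, the returned value combines the emulated read pointer with the emulated error code in the ERR field (bits [30:24] of SMMU_CMDQ_CONS). A minimal sketch, with `emulate_cons_read()` as a hypothetical helper:

```c
#include <stdint.h>

/* SMMU_CMDQ_CONS: ERR field in bits [30:24]. */
#define CMDQ_CONS_ERR_SHIFT  24
#define CMDQ_CONS_ERR_MASK   (0x7fU << CMDQ_CONS_ERR_SHIFT)
#define CMDQ_ERR_CERROR_NONE 0x0U
#define CMDQ_ERR_CERROR_ILL  0x1U  /* illegal command */

/* Hypothetical: value returned to the host for a CMDQ_CONS read. */
static uint32_t emulate_cons_read(uint32_t emu_cons, uint32_t emu_err)
{
	return (emu_cons & ~CMDQ_CONS_ERR_MASK) |
	       ((emu_err << CMDQ_CONS_ERR_SHIFT) & CMDQ_CONS_ERR_MASK);
}
```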

2) Stream table
---------------
As with the command queue, the first level is allocated at boot with the maximum
possible size, then the hypervisor traps accesses to:
i) ARM_SMMU_STRTAB_BASE/ARM_SMMU_STRTAB_BASE_CFG: keeps track of the host stream
table so it can be put in a shared state.
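To shadow the stream table, the hypervisor has to decode the host's SMMU_STRTAB_BASE_CFG (FMT, SPLIT and LOG2SIZE fields) and map a StreamID to its L1 descriptor and L2 STE. A minimal sketch; `struct strtab_layout` and the helpers are hypothetical, and `strtab_cfg_write_ok()` illustrates the assumption that the stream table format/size can't change after initialization.

```c
#include <stdbool.h>
#include <stdint.h>

/* SMMU_STRTAB_BASE_CFG fields. */
#define STRTAB_BASE_CFG_LOG2SIZE(v)  ((v) & 0x3fU)          /* [5:0]   */
#define STRTAB_BASE_CFG_SPLIT(v)     (((v) >> 6) & 0x1fU)   /* [10:6]  */
#define STRTAB_BASE_CFG_FMT(v)       (((v) >> 16) & 0x3U)   /* [17:16] */
#define STRTAB_BASE_CFG_FMT_2LVL     1U

/* Hypothetical decoded view of the host's stream table layout. */
struct strtab_layout {
	bool two_level;
	unsigned int log2size;  /* log2 of the number of StreamIDs */
	unsigned int split;     /* StreamID bits resolved by each L2 table */
};

static struct strtab_layout decode_strtab_cfg(uint32_t cfg)
{
	struct strtab_layout l = {
		.two_level = STRTAB_BASE_CFG_FMT(cfg) == STRTAB_BASE_CFG_FMT_2LVL,
		.log2size = STRTAB_BASE_CFG_LOG2SIZE(cfg),
		.split = STRTAB_BASE_CFG_SPLIT(cfg),
	};
	return l;
}

/* Two-level format: number of L1 descriptors to shadow. */
static uint32_t l1_entries(const struct strtab_layout *l)
{
	return 1U << (l->log2size - l->split);
}

/* Two-level format: L1 descriptor and STE within its L2 table for @sid. */
static void sid_to_indices(const struct strtab_layout *l, uint32_t sid,
			   uint32_t *l1_idx, uint32_t *l2_idx)
{
	*l1_idx = sid >> l->split;
	*l2_idx = sid & ((1U << l->split) - 1);
}

/* A later write must not change the layout the hypervisor shadowed. */
static bool strtab_cfg_write_ok(const struct strtab_layout *cur, uint32_t cfg)
{
	struct strtab_layout n = decode_strtab_cfg(cfg);

	return n.two_level == cur->two_level &&
	       n.log2size == cur->log2size && n.split == cur->split;
}
```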

On CFGI_STE, the hypervisor reads the STE in scope from the host copy, shadows the
L2 pointers if needed and attaches stage-2.
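Attaching stage-2 underneath the host's configuration can be pictured as rewriting the STE Config field (bits [3:1] of dword 0). This sketch assumes the host only uses ABORT, BYPASS or S1_TRANS; `shadow_ste_dword0()` is hypothetical, and filling in the stage-2 fields (S2TTB, S2VMID, VTCR) in the other dwords is omitted.

```c
#include <stdint.h>

/* STE dword 0: V is bit 0, Config is bits [3:1]. */
#define STE_0_V           (1ULL << 0)
#define STE_0_CFG_SHIFT   1
#define STE_0_CFG_MASK    (0x7ULL << STE_0_CFG_SHIFT)
#define STE_CFG_ABORT     0x0ULL
#define STE_CFG_BYPASS    0x4ULL  /* S1 bypass,    S2 bypass    */
#define STE_CFG_S1_TRANS  0x5ULL  /* S1 translate, S2 bypass    */
#define STE_CFG_S2_TRANS  0x6ULL  /* S1 bypass,    S2 translate */
#define STE_CFG_NESTED    0x7ULL  /* S1 translate, S2 translate */

/* Hypothetical: dword 0 of the shadow STE, derived from the host STE. */
static uint64_t shadow_ste_dword0(uint64_t host_dw0)
{
	uint64_t cfg = (host_dw0 & STE_0_CFG_MASK) >> STE_0_CFG_SHIFT;

	if (!(host_dw0 & STE_0_V))
		return host_dw0;  /* invalid STE: nothing to attach */

	switch (cfg) {
	case STE_CFG_ABORT:
		break;
	case STE_CFG_BYPASS:
		cfg = STE_CFG_S2_TRANS;  /* host bypass -> stage-2 only */
		break;
	case STE_CFG_S1_TRANS:
		cfg = STE_CFG_NESTED;    /* host stage-1 -> nested */
		break;
	default:
		cfg = STE_CFG_ABORT;     /* host may not configure stage-2 */
		break;
	}
	return (host_dw0 & ~STE_0_CFG_MASK) | (cfg << STE_0_CFG_SHIFT);
}
```

Turning any host-provided stage-2 configuration into ABORT keeps stage-2 under the hypervisor's exclusive control.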

3) GBPA
-------
The hypervisor sets GBPA to abort at boot; after that, any host access to GBPA
returns the value set by the host. The host only ever sets ABORT, so that is fine.
Alternatively, we could always report ABORT as set, even when it isn't, but that
would make it look as if the HW is not responding to updates.
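A minimal sketch of the GBPA emulation. The bit positions (UPDATE is bit 31, ABORT is bit 20) come from the SMMUv3 register layout; `struct emu_gbpa` and the helpers are hypothetical. Clearing UPDATE on write is an assumption, made so that a driver polling for the update to complete (as the Linux driver does) sees it finish immediately.

```c
#include <stdint.h>

#define GBPA_UPDATE (1U << 31)
#define GBPA_ABORT  (1U << 20)

/* Hypothetical emulated GBPA: the real register stays at ABORT. */
struct emu_gbpa {
	uint32_t host_val;  /* last value written by the host */
};

static void emulate_gbpa_write(struct emu_gbpa *g, uint32_t val)
{
	/*
	 * Nothing is forwarded to HW, which was set to abort at boot.
	 * UPDATE is dropped so the host's poll for completion ends at once.
	 */
	g->host_val = val & ~GBPA_UPDATE;
}

static uint32_t emulate_gbpa_read(const struct emu_gbpa *g)
{
	return g->host_val;
}
```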


Bisectibility:
==============
I wrote the patches so that most of them are bisectable at run time (so we can run
with a prefix of the series up to MMIO emulation, CMDQ emulation, STE shadowing or
full nesting). That was very helpful while debugging, and I kept it that way to make
debugging easier.

Constraints:
============
1) Discovery:
-------------
Only device tree is supported at the moment.
I don't usually use ACPI, but I can look into adding it later (so as not to make
this series bigger).

2) Errata:
----------
Some HW supports both stage-1 and stage-2 but can't run nested translation due to
errata, which makes the driver remove nesting for MMU_700. I believe this is too
restrictive. At the moment, KVM uses nesting if it is advertised (otherwise we need
another mechanism to exclude only the affected HW).

3) Shadow page table
--------------------
Memory is mapped at page granularity (leaf entries only) because of the lack of
split_block_unmap() logic. I am currently looking into the possibility of sharing
page tables; if that turns out to be complicated (as expected), it might be worth
re-adding this logic.
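The cost of leaf-only mapping is easy to see in a sketch: mapping a range page by page creates one leaf entry per page, so unmapping any single page never requires splitting a block. `map_range_by_page()` and `demo_map_page()` are hypothetical; a real implementation would install page-table entries instead of just counting them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page mapper; a real one would install a leaf PTE. */
static int demo_map_page(uint64_t iova, uint64_t pa)
{
	(void)iova;
	(void)pa;
	return 0;  /* success */
}

/*
 * Map [iova, iova + size) using only page-sized leaf entries, so any
 * later unmap of a single page never has to split a block.
 * Returns the number of leaf entries created.
 */
static size_t map_range_by_page(uint64_t iova, uint64_t pa, size_t size,
				size_t page_size,
				int (*map_page)(uint64_t iova, uint64_t pa))
{
	size_t mapped = 0;

	for (size_t off = 0; off < size; off += page_size) {
		if (map_page(iova + off, pa + off))
			break;
		mapped++;
	}
	return mapped;
}
```

For example, a 2 MiB region needs 512 entries with 4K pages (or 32 with 64K pages), where, with the 4K granule, a single 2 MiB block entry would do.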

Boot flow:
==========
The hypervisor initialises at “module_init”.
Before that, at "core_initcall", the SMMUv3 KVM code will:
- Register the hypervisor ops with the hypervisor
- Parse the device tree and populate an array describing the SMMUs for the hypervisor.

At "module_init", the hypervisor init will run, where the SMMU driver will:
- Take over the SMMUs' description
- Probe the SMMUs (from the IDRs); I tried to make most of this code common using macros.
- Take over the SMMUs' MMIO space so that it is trapped.
- Take over and set up the shadow command queue and stream table.

With "ARM_SMMU_V3_PKVM" enabled, the existing SMMU driver registers at
"device_initcall_sync" so that it runs after the kernel has de-privileged itself
and the hypervisor is set up.

Future work
===========
1) Sharing page tables would be an interesting optimization, but it requires dealing with
stage-2 page faults (which are handled by the kernel), BBM (break-before-make) and
possibly more complexity.

2) There is ongoing work to enable RPM (runtime PM), which may enable/disable the
SMMU frequently; we might need some optimizations to avoid re-shadowing the
CMDQ/STEs unnecessarily.

3) Look into ACPI support.

Patches overview
=================
The patches are split as follows:

Patches 01-03: Core hypervisor: Add donation for NC, dealing with MMIO,
               and arch timer abstraction.
Patches 04-08: Refactoring of io-pgtable-arm and SMMUv3 driver
Patches 09-12: Hypervisor IOMMU core: IOMMU pagetable management, dabts…
Patches 13-28: KVM SMMUv3 code


Tested on QEMU (S1 only, S2 only and nested) and on the Morello board.
Also tested with PAGE_SIZE 4K, 16K and 64K.

A development branch can be found at:
https://android-kvm.googlesource.com/linux/+log/refs/heads/pkvm-smmu-v4


Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3-kvm: Add SMMUv3 driver

Mostafa Saleh (27):
  KVM: arm64: Add a new function to donate memory with prot
  KVM: arm64: Donate MMIO to the hypervisor
  KVM: arm64: pkvm: Add pkvm_time_get()
  iommu/io-pgtable-arm: Move selftests to a separate file
  iommu/io-pgtable-arm: Factor kernel specific code out
  iommu/arm-smmu-v3: Split code with hyp
  iommu/arm-smmu-v3: Move TLB range invalidation into a macro
  iommu/arm-smmu-v3: Move IDR parsing to common functions
  KVM: arm64: iommu: Introduce IOMMU driver infrastructure
  KVM: arm64: iommu: Shadow host stage-2 page table
  KVM: arm64: iommu: Add memory pool
  KVM: arm64: iommu: Support DABT for IOMMU
  iommu/arm-smmu-v3: Add KVM mode in the driver
  iommu/arm-smmu-v3: Load the driver later in KVM mode
  iommu/arm-smmu-v3-kvm: Create array for hyp SMMUv3
  iommu/arm-smmu-v3-kvm: Take over SMMUs
  iommu/arm-smmu-v3-kvm: Probe SMMU HW
  iommu/arm-smmu-v3-kvm: Add MMIO emulation
  iommu/arm-smmu-v3-kvm: Shadow the command queue
  iommu/arm-smmu-v3-kvm: Add CMDQ functions
  iommu/arm-smmu-v3-kvm: Emulate CMDQ for host
  iommu/arm-smmu-v3-kvm: Shadow stream table
  iommu/arm-smmu-v3-kvm: Shadow STEs
  iommu/arm-smmu-v3-kvm: Emulate GBPA
  iommu/arm-smmu-v3-kvm: Support io-pgtable
  iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table
  iommu/arm-smmu-v3-kvm: Enable nesting

 arch/arm64/include/asm/kvm_arm.h              |    2 +
 arch/arm64/include/asm/kvm_host.h             |    7 +
 arch/arm64/kvm/Makefile                       |    3 +-
 arch/arm64/kvm/hyp/include/nvhe/iommu.h       |   21 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |    3 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |    2 +
 arch/arm64/kvm/hyp/nvhe/Makefile              |   10 +-
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c         |  130 +++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |   90 +-
 arch/arm64/kvm/hyp/nvhe/setup.c               |   17 +
 arch/arm64/kvm/hyp/nvhe/timer-sr.c            |   33 +
 arch/arm64/kvm/hyp/pgtable.c                  |    9 +-
 arch/arm64/kvm/iommu.c                        |   32 +
 arch/arm64/kvm/pkvm.c                         |    1 +
 drivers/iommu/Makefile                        |    2 +-
 drivers/iommu/arm/Kconfig                     |    9 +
 drivers/iommu/arm/arm-smmu-v3/Makefile        |    3 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c  |  114 ++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   |  158 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  342 +-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  220 ++++
 .../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c  | 1036 +++++++++++++++++
 .../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h  |   67 ++
 .../arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c |   64 +
 drivers/iommu/io-pgtable-arm-kernel.c         |  305 +++++
 drivers/iommu/io-pgtable-arm.c                |  346 +-----
 drivers/iommu/io-pgtable-arm.h                |   66 ++
 27 files changed, 2439 insertions(+), 653 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/iommu.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
 create mode 100644 arch/arm64/kvm/iommu.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common-hyp.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/io-pgtable-arm-hyp.c
 create mode 100644 drivers/iommu/io-pgtable-arm-kernel.c

-- 
2.51.0.rc1.167.g924127e9c0-goog



Thread overview: 82+ messages
2025-08-19 21:51 [PATCH v4 00/28] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 01/28] KVM: arm64: Add a new function to donate memory with prot Mostafa Saleh
2025-09-09 13:46   ` Will Deacon
2025-09-14 19:23     ` Pranjal Shrivastava
2025-09-16 11:58       ` Mostafa Saleh
2025-09-16 11:56     ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 02/28] KVM: arm64: Donate MMIO to the hypervisor Mostafa Saleh
2025-09-09 14:12   ` Will Deacon
2025-09-16 13:27     ` Mostafa Saleh
2025-09-26 14:33       ` Will Deacon
2025-09-29 10:57         ` Mostafa Saleh
2025-09-14 20:41   ` Pranjal Shrivastava
2025-09-16 13:43     ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 03/28] KVM: arm64: pkvm: Add pkvm_time_get() Mostafa Saleh
2025-09-09 14:16   ` Will Deacon
2025-09-09 15:56     ` Marc Zyngier
2025-09-15 11:10       ` Pranjal Shrivastava
2025-09-16 14:04       ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 04/28] iommu/io-pgtable-arm: Move selftests to a separate file Mostafa Saleh
2025-09-15 14:37   ` Pranjal Shrivastava
2025-09-16 14:07     ` Mostafa Saleh
2025-09-15 16:45   ` Jason Gunthorpe
2025-09-16 14:09     ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 05/28] iommu/io-pgtable-arm: Factor kernel specific code out Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 06/28] iommu/arm-smmu-v3: Split code with hyp Mostafa Saleh
2025-09-09 14:23   ` Will Deacon
2025-09-16 14:10     ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 07/28] iommu/arm-smmu-v3: Move TLB range invalidation into a macro Mostafa Saleh
2025-09-09 14:25   ` Will Deacon
2025-08-19 21:51 ` [PATCH v4 08/28] iommu/arm-smmu-v3: Move IDR parsing to common functions Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 09/28] KVM: arm64: iommu: Introduce IOMMU driver infrastructure Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table Mostafa Saleh
2025-09-09 14:42   ` Will Deacon
2025-09-16 14:24     ` Mostafa Saleh
2025-09-26 14:42       ` Will Deacon
2025-09-29 11:01         ` Mostafa Saleh
2025-09-30 12:38           ` Jason Gunthorpe
2025-09-30 12:55             ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 11/28] KVM: arm64: iommu: Add memory pool Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 12/28] KVM: arm64: iommu: Support DABT for IOMMU Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 13/28] iommu/arm-smmu-v3-kvm: Add SMMUv3 driver Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 14/28] iommu/arm-smmu-v3: Add KVM mode in the driver Mostafa Saleh
2025-09-12 13:52   ` Will Deacon
2025-09-16 14:30     ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 15/28] iommu/arm-smmu-v3: Load the driver later in KVM mode Mostafa Saleh
2025-09-12 13:54   ` Will Deacon
2025-09-23 14:35     ` Mostafa Saleh
2025-09-23 17:38       ` Jason Gunthorpe
2025-09-29 11:10         ` Mostafa Saleh
2025-10-02 15:13           ` Jason Gunthorpe
2025-11-05 16:40             ` Mostafa Saleh
2025-11-05 17:12               ` Jason Gunthorpe
2025-11-06 11:06                 ` Mostafa Saleh
2025-11-06 13:23                   ` Jason Gunthorpe
2025-11-06 16:54                     ` Mostafa Saleh
2025-11-06 17:16                       ` Jason Gunthorpe
2025-08-19 21:51 ` [PATCH v4 16/28] iommu/arm-smmu-v3-kvm: Create array for hyp SMMUv3 Mostafa Saleh
2025-09-09 18:30   ` Daniel Mentz
2025-09-16 14:35     ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 17/28] iommu/arm-smmu-v3-kvm: Take over SMMUs Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 18/28] iommu/arm-smmu-v3-kvm: Probe SMMU HW Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 19/28] iommu/arm-smmu-v3-kvm: Add MMIO emulation Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 20/28] iommu/arm-smmu-v3-kvm: Shadow the command queue Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 21/28] iommu/arm-smmu-v3-kvm: Add CMDQ functions Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 22/28] iommu/arm-smmu-v3-kvm: Emulate CMDQ for host Mostafa Saleh
2025-09-12 14:18   ` Will Deacon
2025-09-15 16:38     ` Jason Gunthorpe
2025-09-16 15:19       ` Mostafa Saleh
2025-09-17 12:36         ` Jason Gunthorpe
2025-09-17 15:01           ` Will Deacon
2025-09-17 15:16             ` Jason Gunthorpe
2025-09-17 15:25               ` Will Deacon
2025-09-17 15:59                 ` Jason Gunthorpe
2025-09-18 10:26                   ` Will Deacon
2025-09-18 14:36                     ` Jason Gunthorpe
2025-09-16 14:50     ` Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 23/28] iommu/arm-smmu-v3-kvm: Shadow stream table Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 24/28] iommu/arm-smmu-v3-kvm: Shadow STEs Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 25/28] iommu/arm-smmu-v3-kvm: Emulate GBPA Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 26/28] iommu/arm-smmu-v3-kvm: Support io-pgtable Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 27/28] iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table Mostafa Saleh
2025-08-19 21:51 ` [PATCH v4 28/28] iommu/arm-smmu-v3-kvm: Enable nesting Mostafa Saleh
