public inbox for linux-kernel@vger.kernel.org
* [PATCH v6 00/25] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate)
From: Mostafa Saleh @ 2026-05-01 11:19 UTC
  To: linux-arm-kernel, linux-kernel, kvmarm, iommu
  Cc: catalin.marinas, will, maz, oliver.upton, joey.gouly,
	suzuki.poulose, yuzenghui, joro, jean-philippe, jgg, mark.rutland,
	qperret, tabba, vdonnefort, sebastianene, keirf, Mostafa Saleh

This is v6 of the pKVM SMMUv3 support, using trap and emulate.

v1: Implements a full-fledged pv interface
https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/

v2: Implements a full-fledged pv interface (plus more features such as evtq and s1)
https://lore.kernel.org/kvmarm/20241212180423.1578358-1-smostafa@google.com/

v3: Only DMA isolation (using pv)
https://lore.kernel.org/kvmarm/20250728175316.3706196-1-smostafa@google.com/

v4: Trap and emulate
https://lore.kernel.org/all/20250819215156.2494305-1-smostafa@google.com/

v5: Trap and emulate
https://lore.kernel.org/all/20251117184815.1027271-1-smostafa@google.com/


This series is based on the review feedback on v5 + improvements,
most notably:
- Rebase on ToT which includes the newly merged protected VM support!
- Drop non-coherent support to make the patches smaller; this can be
  added in a later series.
- Re-work the io-pgtable-arm split to rely on iommu-pages [Jason]
- Use the newly added clock for tracing instead of adding new
  functions in the hypervisor.
- Align nesting support with the upstream driver in terms of
  supported IPs [Jason]
- Keep STE updates hitless when possible [Jason]
- Move some of the refactored code to the C file [Jason]
- Add support for evtq and priq tracking
- Add extra hardening checks, handle failures, massively reduce the
  number of WARN_ONs, and other cleanups.
- Don’t enforce DMA isolation, so as not to regress pKVM booting.

Notes about Sashiko
===================
I ran Sashiko locally and it was helpful in discovering problems in
the series. However, it still reports a large number of critical and
high severity issues. I went through them and I believe they are false
positives, mainly because (ordered by how frequently they are reported):
- It doesn’t understand de-privilege and which data is trusted because
  the driver populated it at boot (it keeps complaining about missing
  checks for zero MMIO size).
- It doesn’t understand WARNs are fatal in the hypervisor.
- It doesn’t understand that a malicious host can DoS the system and
  pKVM doesn’t guarantee availability.
- It doesn’t understand the SMMUv3 spec and makes stuff up (e.g. for
  the CMD_SYNC CS field it makes up a non-existent encoding, or wrong
  semantics for the GBPA register).
- It seems to look at one patch at a time and not the whole series, and
  as the series is written in a way to be bisectable that confuses it.
- Sometimes it complains about code which is not related to the change.

Fuad is currently working on updating the review prompts to make it work
better with protected KVM [1].

Design:
=======
Assumptions:
------------
One important point is that this doesn’t emulate the full SMMUv3
architecture, only the parts used by the Linux kernel. That’s why
enabling this (ARM_SMMU_V3_PKVM) depends on (ARM_SMMU_V3=y), so we
are sure of the driver behaviour.

Any new change in the driver will likely trigger a WARN_ON and end in
a panic, so it will also need to be supported in the hypervisor.

Most notable assumptions:
- Changing of stream table format/size or l2 pointers is not allowed
  after initialization.
- leaf=0 CFGI is not allowed.
- CFGI_ALL with any value but 31 is not allowed.
- Some commands which are not used are not allowed.
- Values set in ARM_SMMU_CR1 are hardcoded and don't change.
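
As a rough illustration of how the two CFGI assumptions above could be
checked when sanitising a host command, here is a minimal userspace
sketch (not code from this series). The opcode and field values are
assumed to mirror the upstream driver's CMDQ_OP_CFGI_STE,
CMDQ_OP_CFGI_ALL, CMDQ_CFGI_1_LEAF and CMDQ_CFGI_1_RANGE definitions.
hyp_cfgi_is_allowed() is a hypothetical name, the leaf check is applied
to CFGI_STE only, and the list of unused commands that are rejected is
not modelled because it is not given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the upstream arm-smmu-v3.h definitions. */
#define CMDQ_0_OP_MASK		0xffULL		/* opcode: word 0, bits [7:0] */
#define CMDQ_OP_CFGI_STE	0x03ULL
#define CMDQ_OP_CFGI_ALL	0x04ULL
#define CMDQ_CFGI_1_LEAF	(1ULL << 0)	/* word 1, bit 0 */
#define CMDQ_CFGI_1_RANGE_MASK	0x1fULL		/* word 1, bits [4:0] */

/*
 * Hypothetical helper: returns true if a host CFGI command respects the
 * assumptions listed above. Commands other than CFGI_STE/CFGI_ALL are
 * passed through here; the real filtering of unused commands is not
 * modelled.
 */
static bool hyp_cfgi_is_allowed(const uint64_t cmd[2])
{
	switch (cmd[0] & CMDQ_0_OP_MASK) {
	case CMDQ_OP_CFGI_STE:
		/* leaf=0 CFGI is not allowed. */
		return (cmd[1] & CMDQ_CFGI_1_LEAF) != 0;
	case CMDQ_OP_CFGI_ALL:
		/* CFGI_ALL with any range value but 31 is not allowed. */
		return (cmd[1] & CMDQ_CFGI_1_RANGE_MASK) == 31;
	default:
		return true;
	}
}
```

In the hypervisor a failed check would presumably end in a WARN rather
than a boolean return; the sketch only shows the field decoding.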

Emulation logic mainly targets:

1) Command Queue
----------------
At boot time, the hypervisor will allocate a shadow command queue
(it doesn’t need to match the host size), which it then sets up in HW.
It will then trap access to:

i) ARM_SMMU_CMDQ_BASE
That can only be written when the cmdq is disabled. Then on enable,
the hypervisor will put the host command queue in a shared state to
avoid it transitioning to the hypervisor or VMs. It will be unshared
when the cmdq is disabled.

ii) ARM_SMMU_CMDQ_PROD
Triggers the emulation code: the hypervisor copies the commands
between cons and prod of the host queue, sanitises them (mostly
WARNs if the host is malicious and issuing commands it shouldn’t),
then eagerly consumes them, updating the host cons.

iii) ARM_SMMU_CMDQ_CONS
Not much logic, just return the emulated cons + error bits.
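
To make the CMDQ_PROD flow above more concrete, here is a minimal
userspace sketch (not code from this series) of handling a trapped PROD
write: walk the host ring from the emulated cons to the new prod, copy
each command to private memory before checking it, and advance cons.
All names are hypothetical. The shadow queue is modelled as a flat
output buffer rather than a real HW ring, and cmd_opcode_nonzero() is
only a stand-in for the real sanitiser.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMD_DWORDS 2 /* an SMMUv3 command is two 64-bit words */

struct host_cmdq {
	const uint64_t *base;	/* host-owned ring, shared with the hypervisor */
	unsigned int shift;	/* log2(number of entries) */
	uint32_t cons;		/* emulated CONS: index plus wrap bit */
};

typedef bool (*cmd_check_fn)(const uint64_t cmd[CMD_DWORDS]);

/* Stand-in sanitiser: accept any command whose opcode byte is non-zero. */
static bool cmd_opcode_nonzero(const uint64_t cmd[CMD_DWORDS])
{
	return (cmd[0] & 0xff) != 0;
}

/*
 * Model of a trapped CMDQ_PROD write. Returns the number of commands
 * consumed, or -1 if PROD is inconsistent, a command is rejected, or
 * the shadow buffer is too small.
 */
static int emulate_cmdq_prod_write(struct host_cmdq *q, uint32_t prod,
				   cmd_check_fn check,
				   uint64_t *shadow, unsigned int shadow_entries)
{
	uint32_t ptr_mask = (1u << (q->shift + 1)) - 1;	/* index + wrap bit */
	uint32_t idx_mask = (1u << q->shift) - 1;
	int done = 0;

	prod &= ptr_mask;
	/* More pending entries than the ring can hold means a bogus PROD. */
	if (((prod - q->cons) & ptr_mask) > (1u << q->shift))
		return -1;

	while (q->cons != prod) {
		uint64_t cmd[CMD_DWORDS];

		/* Copy first, then check the private copy. */
		memcpy(cmd, &q->base[(q->cons & idx_mask) * CMD_DWORDS],
		       sizeof(cmd));
		if (!check(cmd) || (unsigned int)done == shadow_entries)
			return -1;
		memcpy(&shadow[done * CMD_DWORDS], cmd, sizeof(cmd));
		done++;
		q->cons = (q->cons + 1) & ptr_mask;
	}
	return done;
}
```

Checking the private copy rather than the shared ring means the host
cannot modify a command between the check and its use. A trapped
CMDQ_CONS read would then simply return q->cons (plus error bits,
which are not modelled here).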

2) Stream table
---------------
Similar to the command queue, the first level is allocated at boot
with the maximum possible size, then the hypervisor will trap access to:
i) ARM_SMMU_STRTAB_BASE/ARM_SMMU_STRTAB_BASE_CFG: Keep track of
   the stream table to put it in a shared state.

On CFGI_STE, the hypervisor will read the STE in scope from the host
copy, shadow L2 pointers if needed and attach stage-2.

3) GBPA
-------
The hypervisor will set GBPA to abort at boot, then any read from the
host will return ABORT and writes are ignored.
If the host tries to clear GBPA, it will look like GBPA is refusing
to update and time out.
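
A minimal userspace sketch of the read/write behaviour described above
(not code from this series): the hypervisor programs ABORT once, host
reads always report ABORT, and host writes are dropped. The names are
hypothetical, GBPA_ABORT is assumed to match the upstream driver's
definition (bit 20), and how the UPDATE bit appears to the host while
it polls is not modelled.

```c
#include <stdint.h>

#define GBPA_ABORT	(1u << 20)	/* assumed to match upstream GBPA_ABORT */

/* Hypothetical per-SMMU state: the value the hypervisor programmed. */
struct gbpa_state {
	uint32_t hw_value;
};

/* Boot: the hypervisor sets GBPA to abort. */
static void gbpa_init(struct gbpa_state *s)
{
	s->hw_value = GBPA_ABORT;
}

/* Trapped host read: always reports ABORT. */
static uint32_t gbpa_host_read(const struct gbpa_state *s)
{
	return s->hw_value & GBPA_ABORT;
}

/* Trapped host write: ignored, the programmed value never changes. */
static void gbpa_host_write(struct gbpa_state *s, uint32_t val)
{
	(void)s;
	(void)val;
}
```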

4) EVTQ and PRIQ
----------------
No shadowing needed for those queues, but the hypervisor needs to keep
track of them to put them in a shared state so they can’t be used by
the host or the hypervisor.

Bisectability:
==============
I wrote the patches so that most of them are bisectable at run time
(so we can run with a prefix of the series up to MMIO emulation, cmdq
emulation, STE or full nesting). That was very helpful in debugging,
and I kept it like this to make debugging easier.

Constraints:
============
1) Discovery:
-------------
Only device trees are supported at the moment.
I don’t usually use ACPI, but I can look into adding that later
(to avoid making this series bigger).

2) Shadow page table
--------------------
Uses page granularity (leaf) for memory, because of the lack of
split_block_unmap() logic. I am currently looking into the
possibility of sharing page tables; if that turns out to be
complicated (as expected), it might be worth re-adding this logic.

Boot and Probe ordering:
========================
The main SMMUv3 driver MUST only be bound/probed after KVM fully
initialises so it can set up the MMIO emulation.

The KVM SMMUv3 driver is loaded early, before KVM init, so it can
register itself. At that point it will probe all the SMMUs from the
platform bus and bind them to the driver.

Then at a later init call it will create an auxiliary device per SMMU,
that the main driver will probe. The main driver still relies on this
device (parent) for all driver activity. (Check the comment in patch 14.)

Future work
===========
1) Sharing page tables will be an interesting optimization, but
   requires dealing with stage-2 page faults (which are handled
   by the kernel), BBM and possibly more complexity.

2) There is currently ongoing work to enable RPM, which may
   enable/disable the SMMU frequently; we might need some optimizations
   to avoid re-shadowing the CMDQ/STE unnecessarily.

3) Add support for non-coherent SMMUs

4) Optimizations (such as using block mappings for memory)

Patches overview
================
The patches are split as follows:

Patches 01-02: Core hypervisor: Dealing with MMIO and timers.
Patches 03-06: Refactoring of io-pgtable-arm and SMMUv3 driver.
Patches 07-10: Hypervisor IOMMU core: pagetable management, dabts.
Patches 11-25: KVM SMMUv3 code.

Tested on Lenovo IdeaCentre mini X and QEMU.

A development branch can be found at [2]

[1] https://github.com/ftabba/review-prompts/commits/local-arm64-kvm/
[2] https://android-kvm.googlesource.com/linux/+/refs/heads/pkvm-smmu-v6


Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3-kvm: Add SMMUv3 driver

Mostafa Saleh (24):
  KVM: arm64: Generalize trace clock
  KVM: arm64: Donate MMIO to the hypervisor
  iommu/arm-smmu-v3: Split code with hyp
  iommu/arm-smmu-v3: Move TLB range invalidation into common code
  iommu/arm-smmu-v3: Move IDR parsing to common functions
  iommu/io-pgtable-arm: Rework to use the iommu-pages API
  KVM: arm64: iommu: Introduce IOMMU driver infrastructure
  KVM: arm64: iommu: Shadow host stage-2 page table
  KVM: arm64: iommu: Add memory pool
  KVM: arm64: iommu: Support DABT for IOMMU
  iommu/arm-smmu-v3-kvm: Add the kernel driver
  iommu/arm-smmu-v3-kvm: Probe SMMU HW
  iommu/arm-smmu-v3-kvm: Add MMIO emulation
  iommu/arm-smmu-v3-kvm: Shadow the command queue
  iommu/arm-smmu-v3-kvm: Add CMDQ functions
  iommu/arm-smmu-v3-kvm: Emulate CMDQ for host
  iommu/arm-smmu-v3-kvm: Shadow stream table
  iommu/arm-smmu-v3-kvm: Shadow STEs
  iommu/arm-smmu-v3-kvm: Share other queues
  iommu/arm-smmu-v3-kvm: Emulate GBPA
  iommu/io-pgtable-arm: Support io-pgtable-arm in the hypervisor
  iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table
  iommu/arm-smmu-v3-kvm: Enable nesting
  KVM: arm64: Add documentation for pKVM DMA isolation

 .../admin-guide/kernel-parameters.txt         |    4 +
 Documentation/virt/kvm/arm/pkvm.rst           |   19 +-
 arch/arm64/include/asm/kvm_host.h             |    6 +
 arch/arm64/kvm/Makefile                       |    2 +-
 arch/arm64/kvm/hyp/include/nvhe/clock.h       |   11 +-
 arch/arm64/kvm/hyp/include/nvhe/iommu.h       |   23 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |    4 +
 arch/arm64/kvm/hyp/nvhe/Makefile              |   13 +-
 arch/arm64/kvm/hyp/nvhe/clock.c               |   44 +-
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c         |  156 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  169 ++-
 arch/arm64/kvm/hyp/nvhe/setup.c               |   20 +
 arch/arm64/kvm/hyp/nvhe/trace.c               |    4 +-
 arch/arm64/kvm/hyp/pgtable.c                  |    9 +-
 arch/arm64/kvm/iommu.c                        |   57 +
 arch/arm64/kvm/pkvm.c                         |    1 +
 drivers/iommu/arm/Kconfig                     |    9 +
 drivers/iommu/arm/arm-smmu-v3/Makefile        |    3 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-common-lib.c  |  224 +++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c   |  232 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  387 +----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  150 ++
 .../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c  | 1250 +++++++++++++++++
 .../iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h  |   67 +
 drivers/iommu/io-pgtable-arm.c                |   68 +-
 drivers/iommu/io-pgtable-arm.h                |    6 +
 drivers/iommu/iommu-pages.h                   |   99 ++
 27 files changed, 2668 insertions(+), 369 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/iommu.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
 create mode 100644 arch/arm64/kvm/iommu.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-common-lib.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-kvm.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/pkvm/arm_smmu_v3.h

-- 
2.54.0.545.g6539524ca2-goog


Thread overview: 38+ messages
2026-05-01 11:19 [PATCH v6 00/25] KVM: arm64: SMMUv3 driver for pKVM (trap and emulate) Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 01/25] KVM: arm64: Generalize trace clock Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 02/25] KVM: arm64: Donate MMIO to the hypervisor Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 03/25] iommu/arm-smmu-v3: Split code with hyp Mostafa Saleh
2026-05-01 12:44   ` Jason Gunthorpe
2026-05-04 12:13     ` Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code Mostafa Saleh
2026-05-01 12:41   ` Jason Gunthorpe
2026-05-04 12:15     ` Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions Mostafa Saleh
2026-05-01 12:47   ` Jason Gunthorpe
2026-05-04 12:16     ` Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 06/25] iommu/io-pgtable-arm: Rework to use the iommu-pages API Mostafa Saleh
2026-05-01 12:24   ` Jason Gunthorpe
2026-05-04 12:19     ` Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 07/25] KVM: arm64: iommu: Introduce IOMMU driver infrastructure Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table Mostafa Saleh
2026-05-01 13:00   ` Jason Gunthorpe
2026-05-04 12:28     ` Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 09/25] KVM: arm64: iommu: Add memory pool Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 10/25] KVM: arm64: iommu: Support DABT for IOMMU Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 11/25] iommu/arm-smmu-v3-kvm: Add SMMUv3 driver Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 12/25] iommu/arm-smmu-v3-kvm: Add the kernel driver Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 13/25] iommu/arm-smmu-v3-kvm: Probe SMMU HW Mostafa Saleh
2026-05-01 12:51   ` Jason Gunthorpe
2026-05-04 12:30     ` Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 14/25] iommu/arm-smmu-v3-kvm: Add MMIO emulation Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 15/25] iommu/arm-smmu-v3-kvm: Shadow the command queue Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 16/25] iommu/arm-smmu-v3-kvm: Add CMDQ functions Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 17/25] iommu/arm-smmu-v3-kvm: Emulate CMDQ for host Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 18/25] iommu/arm-smmu-v3-kvm: Shadow stream table Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 19/25] iommu/arm-smmu-v3-kvm: Shadow STEs Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 20/25] iommu/arm-smmu-v3-kvm: Share other queues Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 21/25] iommu/arm-smmu-v3-kvm: Emulate GBPA Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 22/25] iommu/io-pgtable-arm: Support io-pgtable-arm in the hypervisor Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 23/25] iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 24/25] iommu/arm-smmu-v3-kvm: Enable nesting Mostafa Saleh
2026-05-01 11:19 ` [PATCH v6 25/25] KVM: arm64: Add documentation for pKVM DMA isolation Mostafa Saleh
