From: Jason Gunthorpe <jgg@ziepe.ca>
To: Mostafa Saleh <smostafa@google.com>
Cc: iommu@lists.linux.dev, kvmarm@lists.linux.dev,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com,
will@kernel.org, maz@kernel.org, oliver.upton@linux.dev,
joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
robdclark@gmail.com, joro@8bytes.org, robin.murphy@arm.com,
jean-philippe@linaro.org, nicolinc@nvidia.com,
vdonnefort@google.com, qperret@google.com, tabba@google.com,
danielmentz@google.com, tzukui@google.com
Subject: Re: [RFC PATCH v2 00/58] KVM: Arm SMMUv3 driver for pKVM
Date: Thu, 12 Dec 2024 15:41:19 -0400 [thread overview]
Message-ID: <20241212194119.GA4679@ziepe.ca> (raw)
In-Reply-To: <20241212180423.1578358-1-smostafa@google.com>
On Thu, Dec 12, 2024 at 06:03:24PM +0000, Mostafa Saleh wrote:
> This series adds a hypervisor driver for the Arm SMMUv3 IOMMU. Since the
> hypervisor part of pKVM (called nVHE here) is minimal, moving the whole
> host SMMU driver into nVHE isn't really an option. It is too large and
> complex and requires infrastructure from all over the kernel. We add a
> reduced nVHE driver that deals with populating the SMMU tables and the
> command queue, and the host driver still deals with probing and some
> initialization.
The cover letter doesn't explain why someone needs page tables in the
guest at all?
If you are able to implement nested support then you can boot the
guest with no-iommu and an effective identity translation through a
hypervisor controlled S2. ie no guest map/unmap. Great DMA
performance.
I thought the point of doing the paravirt here was to allow dynamic
pinning of the guest memory? This is the primary downside with nested.
The entire guest memory has to be pinned down at guest boot.
> 1. Paravirtual I/O page tables
> This is the solution implemented in this series. The host creates
> IOVA->HPA mappings with two hypercalls map_pages() and unmap_pages(), and
> the hypervisor populates the page tables. Page tables are abstracted into
> IOMMU domains, which allow multiple devices to share the same address
> space. Another four hypercalls, alloc_domain(), attach_dev(), detach_dev()
> and free_domain(), manage the domains, the semantics of those hypercalls
> are almost identical to the IOMMU ops which make the kernel driver part
> simpler.
That is re-inventing virtio-iommu. I don't really understand why this
series is hacking up arm-smmuv3 so much, that is not, and should not,
be a paravirt driver. Why not create a clean new pkvm specific driver
for the paravirt?? Or find a way to re-use parts of virtio-iommu?
Shouldn't other arch versions of pkvm be able to re-use the same guest
iommu driver?
> b- Locking: The io-pgtable-arm is lockless under some guarantees of how
> the IOMMU code behaves. However with pKVM, the kernel is not trusted
> and a malicious kernel can issue concurrent requests causing memory
> corruption or UAF, so that it has to be locked in the hypervisor.
? I don't get it, the hypervisor page table has to be private to the
hypervisor. It is not that io-pgtable-arm is lockless, it is that it
relies on a particular kind of caller supplied locking. pkvm's calls
into its private io-pgtable-arm would need pkvm specific locking that
makes sense for it. Where does a malicious guest kernel get into this?
> 2. Nested SMMUv3 translation (with emulation)
> Another approach is to rely on nested translation support which is
> optional in SMMUv3, that requires an architecturally accurate emulation
> of SMMUv3 which can be complicated including cmdq emulation.
The confidential compute folks are going in this direction.
> The trade off between the 2 approaches can be roughly summarised as:
> Paravirtualization:
> - Compatible with more HW (and IOMMUs).
> - Better DMA performance due to shorter table walks/less TLB pressure
> - Needs extra complexity to squeeze the last bit of optimization (around
> unmap, and map_sg).
It has better straight line DMA performance if the DMAs are all
static. Generally much, much worse performance if the DMAs are
dynamically mapped as you have to trap so much stuff.
The other negative is there is no way to get SVA support with
para-virtualization.
The positive is you don't have to pin the VM's memory.
> Nested Emulation
> - Faster map_pages (not sure about unmap because it requires cmdq
> emulation for TLB invalidation if DVM not used).
If you can do nested then you can run in identity mode and then you
don't have any performance down side. It is a complete win.
Even if you do non-idenity nested is still likely faster for changing
translation than paravirt approaches. A single cmdq range invalidate
should be about the same broad overhead as a single paravirt call to
unmap except they can be batched under load.
Things like vCMDQ eliminate this overhead entirely, to my mind that is
the future direction of this HW as you obviously need to HW optimize
invalidation...
> - Needs extra complexity for architecturally emulating SMMUv3.
Lots of people have now done this, it is not really so bad. In
exchange you get a full architected feature set, better performance,
and are ready for HW optimizations.
> - Add IDENTITY_DOMAIN support, I already have some patches for that, but
> didn’t want to complicate this series, I can send them separately.
This seems kind of pointless to me. If you can tolerate identity (ie
pin all memory) then do nested, and maybe don't even bother with a
guest iommu.
If you want most of the guest memory to be swappable/movable/whatever
then paravirt is the only choice, and you really don't want the guest
to have any identiy support at all.
Really, I think you'd want to have both options, there is no "best"
here. It depends what people want to use the VM for.
My advice for merging would be to start with the pkvm side setting up
a fully pinned S2 and do not have a guest driver. Nesting without
emulating smmuv3. Basically you get protected identity DMA support. I
think that would be a much less sprawling patch series. From there it
would be well positioned to add both smmuv3 emulation and a paravirt
iommu flow.
Jason
next prev parent reply other threads:[~2024-12-12 19:41 UTC|newest]
Thread overview: 97+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-12 18:03 [RFC PATCH v2 00/58] KVM: Arm SMMUv3 driver for pKVM Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 01/58] iommu/io-pgtable-arm: Split the page table driver Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 02/58] iommu/io-pgtable-arm: Split initialization Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 03/58] iommu/io-pgtable: Add configure() operation Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 04/58] iommu/arm-smmu-v3: Move some definitions to arm64 include/ Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 05/58] iommu/arm-smmu-v3: Extract driver-specific bits from probe function Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 06/58] iommu/arm-smmu-v3: Move some functions to arm-smmu-v3-common.c Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 07/58] iommu/arm-smmu-v3: Move queue and table allocation " Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 08/58] iommu/arm-smmu-v3: Move firmware probe to arm-smmu-v3-common Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 09/58] iommu/arm-smmu-v3: Move IOMMU registration to arm-smmu-v3-common.c Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 10/58] iommu/arm-smmu-v3: Move common irq code to common file Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 11/58] KVM: arm64: pkvm: Add pkvm_udelay() Mostafa Saleh
2024-12-19 11:14 ` Quentin Perret
2024-12-19 11:21 ` Mostafa Saleh
2024-12-19 11:28 ` Quentin Perret
2024-12-12 18:03 ` [RFC PATCH v2 12/58] KVM: arm64: Add __pkvm_{use, unuse}_dma() Mostafa Saleh
2024-12-19 11:23 ` Quentin Perret
2024-12-12 18:03 ` [RFC PATCH v2 13/58] KVM: arm64: Introduce IOMMU driver infrastructure Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 14/58] KVM: arm64: pkvm: Add IOMMU hypercalls Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 15/58] KVM: arm64: iommu: Add a memory pool for the IOMMU Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 16/58] KVM: arm64: iommu: Add domains Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 17/58] KVM: arm64: iommu: Add {attach, detach}_dev Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 18/58] KVM: arm64: iommu: Add map/unmap() operations Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 19/58] KVM: arm64: iommu: support iommu_iotlb_gather Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 20/58] KVM: arm64: Support power domains Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 21/58] KVM: arm64: pkvm: Add __pkvm_host_add_remove_page() Mostafa Saleh
2024-12-19 11:10 ` Quentin Perret
2024-12-19 11:19 ` Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 22/58] KVM: arm64: pkvm: Support SCMI power domain Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 23/58] KVM: arm64: iommu: Support power management Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 24/58] KVM: arm64: iommu: Support DABT for IOMMU Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 25/58] KVM: arm64: iommu: Add SMMUv3 driver Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 26/58] KVM: arm64: smmu-v3: Initialize registers Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 27/58] KVM: arm64: smmu-v3: Setup command queue Mostafa Saleh
2025-01-23 13:01 ` Robin Murphy
2025-01-29 11:15 ` Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 28/58] KVM: arm64: smmu-v3: Setup stream table Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 29/58] KVM: arm64: smmu-v3: Setup event queue Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 30/58] KVM: arm64: smmu-v3: Reset the device Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 31/58] KVM: arm64: smmu-v3: Support io-pgtable Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 32/58] KVM: arm64: smmu-v3: Add {alloc/free}_domain Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 33/58] KVM: arm64: smmu-v3: Add TLB ops Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 34/58] KVM: arm64: smmu-v3: Add context descriptor functions Mostafa Saleh
2024-12-12 18:03 ` [RFC PATCH v2 35/58] KVM: arm64: smmu-v3: Add attach_dev Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 36/58] KVM: arm64: smmu-v3: Add detach_dev Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 37/58] iommu/io-pgtable: Generalize walker interface Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 38/58] iommu/io-pgtable-arm: Add post table walker callback Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 39/58] drivers/iommu: io-pgtable: Add IO_PGTABLE_QUIRK_UNMAP_INVAL Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 40/58] KVM: arm64: smmu-v3: Add map/unmap pages and iova_to_phys Mostafa Saleh
2024-12-12 19:44 ` Jason Gunthorpe
2024-12-13 19:48 ` Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 41/58] KVM: arm64: smmu-v3: Add DABT handler Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 42/58] iommu/arm-smmu-v3-kvm: Add host driver for pKVM Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 43/58] iommu/arm-smmu-v3-kvm: Pass a list of SMMU devices to the hypervisor Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 44/58] iommu/arm-smmu-v3-kvm: Validate device features Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 45/58] iommu/arm-smmu-v3-kvm: Allocate structures and reset device Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 46/58] KVM: arm64: Add function to topup generic allocator Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 47/58] KVM: arm64: Add macro for SMCCC call with all returns Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 48/58] iommu/arm-smmu-v3-kvm: Add function to topup IOMMU allocator Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 49/58] iommu/arm-smmu-v3-kvm: Add IOMMU ops Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 50/58] iommu/arm-smmu-v3-kvm: Add map, unmap and iova_to_phys operations Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 51/58] iommu/arm-smmu-v3-kvm: Support PASID operations Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 52/58] iommu/arm-smmu-v3-kvm: Add IRQs for the driver Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 53/58] iommu/arm-smmu-v3-kvm: Probe power domains Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 54/58] iommu/arm-smmu-v3-kvm: Enable runtime PM Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 55/58] drivers/iommu: Add deferred map_sg operations Mostafa Saleh
2024-12-19 12:48 ` Robin Murphy
2024-12-19 14:24 ` Mostafa Saleh
2025-01-02 20:18 ` Jason Gunthorpe
2025-01-03 15:35 ` Mostafa Saleh
2025-01-03 15:47 ` Jason Gunthorpe
2025-01-08 12:13 ` Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 56/58] KVM: arm64: iommu: Add hypercall for map_sg Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 57/58] iommu/arm-smmu-v3-kvm: Implement sg operations Mostafa Saleh
2024-12-12 18:04 ` [RFC PATCH v2 58/58] iommu/arm-smmu-v3-kvm: Support command queue batching Mostafa Saleh
2024-12-12 19:41 ` Jason Gunthorpe [this message]
2024-12-13 19:39 ` [RFC PATCH v2 00/58] KVM: Arm SMMUv3 driver for pKVM Mostafa Saleh
2025-01-02 20:16 ` Jason Gunthorpe
2025-01-08 12:09 ` Mostafa Saleh
2025-01-16 6:39 ` Tian, Kevin
2025-01-16 19:14 ` Jason Gunthorpe
2025-01-17 6:57 ` Tian, Kevin
2025-01-22 11:04 ` Mostafa Saleh
2025-01-22 16:20 ` Jason Gunthorpe
2025-01-22 17:17 ` Mostafa Saleh
2025-01-22 19:16 ` Jason Gunthorpe
2025-01-23 8:13 ` Tian, Kevin
2025-01-29 12:16 ` Mostafa Saleh
2025-01-16 8:51 ` Tian, Kevin
2025-01-22 11:28 ` Mostafa Saleh
2025-01-23 8:25 ` Tian, Kevin
2025-01-29 12:21 ` Mostafa Saleh
2025-01-29 13:50 ` Jason Gunthorpe
2025-01-29 14:08 ` Mostafa Saleh
2025-02-18 9:52 ` Tian, Kevin
2025-01-16 19:19 ` Jason Gunthorpe
2025-01-22 11:46 ` Mostafa Saleh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241212194119.GA4679@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=catalin.marinas@arm.com \
--cc=danielmentz@google.com \
--cc=iommu@lists.linux.dev \
--cc=jean-philippe@linaro.org \
--cc=joey.gouly@arm.com \
--cc=joro@8bytes.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=nicolinc@nvidia.com \
--cc=oliver.upton@linux.dev \
--cc=qperret@google.com \
--cc=robdclark@gmail.com \
--cc=robin.murphy@arm.com \
--cc=smostafa@google.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=tzukui@google.com \
--cc=vdonnefort@google.com \
--cc=will@kernel.org \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox