All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mostafa Saleh <smostafa@google.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	maz@kernel.org, oliver.upton@linux.dev, joey.gouly@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com,
	catalin.marinas@arm.com, will@kernel.org, robin.murphy@arm.com,
	jean-philippe@linaro.org, qperret@google.com, tabba@google.com,
	mark.rutland@arm.com, praan@google.com
Subject: Re: [PATCH v3 29/29] iommu/arm-smmu-v3-kvm: Add IOMMU ops
Date: Wed, 6 Aug 2025 14:10:35 +0000	[thread overview]
Message-ID: <aJNiW48DdXIAFz8r@google.com> (raw)
In-Reply-To: <20250805175753.GY26511@ziepe.ca>

On Tue, Aug 05, 2025 at 02:57:53PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 04, 2025 at 02:41:31PM +0000, Mostafa Saleh wrote:
> > > Are you saying the event queue is left behind for the kernel? How does
> > > that work if it doesn't have access to the registers?
> > 
> > The evtq itself will be owned by the kernel, However, MMIO access would be
> > trapped and emulated, here the PoC for part-2 of this series (as mentioned in
> > the cover letter) This is very close to how nesting will work.
> > https://android-kvm.googlesource.com/linux/+/refs/heads/for-upstream/pkvm-smmu-v3-part-2/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c#744
> 
> Oh weird, but Ok.
> 
> > > In other words you have two cleanly seperate concerns here, an "pkvm
> > > iommu subsystem" that lets pkvm control iommu HW - and the current
> > > "iommu subsystem" that lets the kernel control iommu HW. The same
> > > driver should not register to both.
> > > 
> > 
> > I am not sure how that would work exactly, for example how would probe_device
> > work, xlate... in a generic way?
> 
> Well, I think it is not so bad actually.
> 
> You just need to call iommu_device_register
> 
> 	ret = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
> 
> Where 'dev' is always the smmu struct device, even if your current
> probing driver is not the smmu device. That will capture all the iommu
> activity the ACPI/DT says is linked to that dev.
> 
> From there you just make a normal small iommu driver with no
> connection back to the SMMUv3.
> 
> Eg you could spawn an aux device from smmuv3 to do this with a far
> cleaner separation.
> 
> xlate is just calling iommu_fwspec_add_ids() it is one line.

I am not sure I understand, the SMMU driver will register its IOMMU
ops to probe the devices, then faux devices would also register
IOMMU ops for the KVM HVCs?
But that means that all DMA operations would still go through the
SMMU one?

> 
> > same for other ops. We can make some of these
> > functions (hypercalls wrappers) in a separate file. 
> 
> What other ops are you worried about?
> 
> static struct iommu_ops arm_smmu_ops = {
> 	.identity_domain	= &arm_smmu_identity_domain,
> 	.blocked_domain		= &arm_smmu_blocked_domain,
> 	.release_device		= arm_smmu_release_device,
> 	.domain_alloc_paging_flags = arm_smmu_domain_alloc_paging_flags,
> 
> 	^^ Those are all your new domain type that does the hypercalls
> 
> 	.probe_device		= arm_smmu_probe_device,
> 	.get_resv_regions	= arm_smmu_get_resv_regions,
> 
>   	^^ These are pretty empty
> 
> 	.device_group		= arm_smmu_device_group,
> 	.of_xlate		= arm_smmu_of_xlate,
> 
>         ^^ Common code
> 
> Don't need these:
> 
> 	.capable		= arm_smmu_capable,
> 	.hw_info		= arm_smmu_hw_info,
> 	.domain_alloc_sva       = arm_smmu_sva_domain_alloc,
> 	.page_response		= arm_smmu_page_response,
> 	.def_domain_type	= arm_smmu_def_domain_type,
> 	.get_viommu_size	= arm_smmu_get_viommu_size,
> 	.viommu_init		= arm_vsmmu_init,
> 	.user_pasid_table	= 1,
> 	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
> 	.owner			= THIS_MODULE,

Makes sense, thanks for the detailed explanation.

> 
> > Also I am not sure how that
> > looks from the kernel perspective (do we have 2 struct devices per SMMU?)
> 
> I think you'd want to have pkvm bound to the physical struct device
> and then spawn new faux, aux or something devices for the virtualized
> IOMMUs that probes the new paravirt driver. This driver would be fully
> self contained.

I think it’s hard to reason about this as 2 devices, from my pov it seems
that the pKVM HVCs are a library that can be part of separate common file,
then called from drivers. (with common ops)
Instead of having extra complexity of 2 drivers (KVM and IOMMU PV).
However, I can see the value of that as it totally abstracts the iommu ops
outside the device specific code, I will give it more thought.
But it feels that might be more suitable for a full fledged PV
implementation (as in RFC v1 and v2).

> 
> > I think we are on the same page about how that will look at the end.
> > For nesting there will be a pKVM driver (as mentioned in the cover letter)
> > to probe register the SMMUs, then it will unbind itself to let the
> > current
> 
> That sounds a little freaky to me. I guess it can be made to work, and
> is maybe the best option, but if this is the longterm plan then
> starting out with a non-iommu subsystem pkvm driver seems like a good
> idea.
> 
> Today the pkvm driver would do enough to boot up pkvm in this limited
> mode, and maybe you have some small code duplications with the iommu
> driver. It forks off a aux device to create the para-virt pkvm iommu
> subsystem driver.
> 
> Tomorrow you further shrink down the pkvm driver and remove the
> para-virt driver, then just somehow bind/unbind the pkvm one at early
> boot.
> 
> Minimal changes to the existing smmu driver in either case.
> 
> > Then the hypervisor driver will use trap and emulate to handle SMMU access
> > to the MMIO, providing an architecturally accurate SMMUv3 emualation, and
> > it will not register iommu_ops.
> 
> Well, it registers normal smmuv3 ops only, I think you mean.

I mean when nesting is added, the arm-smmu-v3-kvm, will not call
“iommu_device_register”

> 
> > So, I will happily drop the hypercalls and the iommu_ops from this series,
> > if there is a better way to enlighten the hypervisor about the SIDs to be
> > in identity.
> 
> The iommu ops are a reasonable and appropriate way to do this.
> 
> But since pkvm is all so special anyhow maybe you could hook
> pci_enable_device() and do your identity hypercall there? Do you
> already trap the config write to catch Bus Master Enable? Can pkvm
> just set identity when BME=1 and ABORT when BME=0?

pKVM doesn’t trap actual device access, also we have to support
platform devices, so it would be better to rely on existing
abstractions as “probe_device”


I had an offline discussion with Will and Robin and they believe it might
be better if we get rid of the kernel KVM SMMUv3 driver at all, and just
rely on ARM_SMMU_V3 + extra hooks, so there is a single driver managing
the SMMUs in the system.
This way we don’t need to split current SMMUv3 or have different IOMMU ops,
and reduces some of the duplication, also that avoids the need for a fake device.

Then we have an extra file for KVM with some of the hooks (similar to the
hooks in arm_smmu_impl_ops we have for tegra)

And that might be more suitable for nesting also, to avoid the bind/unbind flow.

I will investigate that and if feasible I will send v4 (hopefully
shortly) based on this idea, otherwise I will see if we can separate
KVM code and SMMU bootstrap code.


Thanks,
Mostafa

> 
> Jason

  reply	other threads:[~2025-08-06 14:10 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-28 17:52 [PATCH v3 00/29] KVM: arm64: SMMUv3 driver for pKVM Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 01/29] KVM: arm64: Add a new function to donate memory with prot Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 02/29] KVM: arm64: Donate MMIO to the hypervisor Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 03/29] KVM: arm64: pkvm: Add pkvm_time_get() Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 04/29] iommu/io-pgtable-arm: Split the page table driver Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 05/29] iommu/io-pgtable-arm: Split initialization Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 06/29] iommu/arm-smmu-v3: Move some definitions to a new common file Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 07/29] iommu/arm-smmu-v3: Extract driver-specific bits from probe function Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 08/29] iommu/arm-smmu-v3: Move some functions to arm-smmu-v3-common.c Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 09/29] iommu/arm-smmu-v3: Move queue and table allocation " Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 10/29] iommu/arm-smmu-v3: Move firmware probe to arm-smmu-v3-common Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 11/29] iommu/arm-smmu-v3: Move IOMMU registration to arm-smmu-v3-common.c Mostafa Saleh
2025-07-28 17:52 ` [PATCH v3 12/29] iommu/arm-smmu-v3: Split cmdq code with hyp Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 13/29] iommu/arm-smmu-v3: Move TLB range invalidation into a macro Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 14/29] KVM: arm64: iommu: Introduce IOMMU driver infrastructure Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 15/29] KVM: arm64: iommu: Shadow host stage-2 page table Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 16/29] KVM: arm64: iommu: Add a memory pool Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 17/29] KVM: arm64: iommu: Add enable/disable hypercalls Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 18/29] iommu/arm-smmu-v3-kvm: Add SMMUv3 driver Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 19/29] iommu/arm-smmu-v3-kvm: Initialize registers Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 20/29] iommu/arm-smmu-v3-kvm: Setup command queue Mostafa Saleh
2025-07-29  6:44   ` Krzysztof Kozlowski
2025-07-29  9:55     ` Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 21/29] iommu/arm-smmu-v3-kvm: Setup stream table Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 22/29] iommu/arm-smmu-v3-kvm: Reset the device Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 23/29] iommu/arm-smmu-v3-kvm: Support io-pgtable Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 24/29] iommu/arm-smmu-v3-kvm: Shadow the CPU stage-2 page table Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 25/29] iommu/arm-smmu-v3-kvm: Add enable/disable device HVCs Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 26/29] iommu/arm-smmu-v3-kvm: Add host driver for pKVM Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 27/29] iommu/arm-smmu-v3-kvm: Pass a list of SMMU devices to the hypervisor Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 28/29] iommu/arm-smmu-v3-kvm: Allocate structures and reset device Mostafa Saleh
2025-07-28 17:53 ` [PATCH v3 29/29] iommu/arm-smmu-v3-kvm: Add IOMMU ops Mostafa Saleh
2025-07-30 14:42   ` Jason Gunthorpe
2025-07-30 15:07     ` Mostafa Saleh
2025-07-30 16:47       ` Jason Gunthorpe
2025-07-31 14:17         ` Mostafa Saleh
2025-07-31 16:57           ` Jason Gunthorpe
2025-07-31 17:44             ` Mostafa Saleh
2025-08-01 18:59               ` Jason Gunthorpe
2025-08-04 14:41                 ` Mostafa Saleh
2025-08-05 17:57                   ` Jason Gunthorpe
2025-08-06 14:10                     ` Mostafa Saleh [this message]
2025-08-11 18:55                       ` Jason Gunthorpe
2025-08-12 10:29                         ` Mostafa Saleh
2025-08-12 12:10                           ` Jason Gunthorpe
2025-08-12 12:37                             ` Mostafa Saleh
2025-08-12 13:48                               ` Jason Gunthorpe
2025-08-13 13:52                                 ` Mostafa Saleh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJNiW48DdXIAFz8r@google.com \
    --to=smostafa@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jean-philippe@linaro.org \
    --cc=jgg@ziepe.ca \
    --cc=joey.gouly@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=praan@google.com \
    --cc=qperret@google.com \
    --cc=robin.murphy@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.