* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code [not found] ` <20260501124143.GB6912@ziepe.ca> @ 2026-05-04 12:15 ` Mostafa Saleh 2026-05-05 16:17 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-04 12:15 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Fri, May 01, 2026 at 09:41:43AM -0300, Jason Gunthorpe wrote: > On Fri, May 01, 2026 at 11:19:06AM +0000, Mostafa Saleh wrote: > > Range TLB invalidation has a very specific algorithm. Instead of > > re-writing it for the hypervisor, move it to a function that can > > be re-used. > > I think this is too narrow. > > You should start at __arm_smmu_domain_inv_range() and shove all of > that callchain into a new file "arm-smmuv3-tlbi.c" which you can then > double compile for pkvm. > > pkvm would have to present the tlbi description and the invs array > which shouldn't be hard for it. Then it will enjoy all the same > hypervisor optimizations we are working on for the normal driver. > > I am about to send a patch series here for iommupt that significantly > alters this. I think it will help your pkvm effort as the invalidation > entry point becomes significantly decoupled from the > iommu subsystem: > > static void arm_smmu_domain_tlbi_inv(struct arm_smmu_tlbi *tlbi, > struct arm_smmu_invs *invs) > > struct arm_smmu_tlbi { > struct arm_smmu_domain *smmu_domain; // Can be removed > unsigned long start; > unsigned long last; > u8 leaf_levels_bitmap; > u8 table_levels_bitmap; > }; > I am not sure if it’s worth it, the hypervisor is much simpler, there is a single page table, it’s locked (also identity mapped), it’s updated on VM boot/teardown only, we don’t even use iotlb_gather at the moment, although possible but I wanted to keep this series as simple as I can then we can add more features later. So this patch is the least intrusive change, as whatever the main SMMUv3 driver does, the range tlb invalidation logic is the same. But I am happy to experiment with that when posted. Thanks, Mostafa > Which pkvm should have no trouble invoking. It has to build an invs, > but I guess that is pretty simple and done once at boot for pkvm? > > Once done all the fiddly bits about building the commands would be > shared. There is really no reason this should differ anyhow. > > https://github.com/jgunthorpe/linux/commits/iommu_pt_arm64/ > > cover-letter: Organize SMMUv3 the invalidation flow so iommupt can use it > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code 2026-05-04 12:15 ` [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code Mostafa Saleh @ 2026-05-05 16:17 ` Jason Gunthorpe 2026-05-05 16:43 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-05 16:17 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Mon, May 04, 2026 at 12:15:17PM +0000, Mostafa Saleh wrote: > I am not sure if it’s worth it, the hypervisor is much simpler, there > is a single page table, it’s locked (also identity mapped), it’s > updated on VM boot/teardown only, we don’t even use iotlb_gather at > the moment, although possible but I wanted to keep this series as > simple as I can then we can add more features later. > So this patch is the least intrusive change, as whatever the main SMMUv3 > driver does, the range tlb invalidation logic is the same. > But I am happy to experiment with that when posted. Okay, then maybe just always push a full invalidation? I didn't like seeing the range invalidation lifted out, especially how ugly that will turn into after my next series. So, don't use range or use the proper full gather flow seem like the right two options. I'm not so interested in minimal change, but maximum forward maintainability. It is much easier to manage if you double compile more of the driver exactly the same, and call functions in a fairly normal way vs make special cases and special functions.. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code 2026-05-05 16:17 ` Jason Gunthorpe @ 2026-05-05 16:43 ` Mostafa Saleh 2026-05-06 9:53 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-05 16:43 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Tue, May 05, 2026 at 01:17:09PM -0300, Jason Gunthorpe wrote: > On Mon, May 04, 2026 at 12:15:17PM +0000, Mostafa Saleh wrote: > > > I am not sure if it’s worth it, the hypervisor is much simpler, there > > is a single page table, it’s locked (also identity mapped), it’s > > updated on VM boot/teardown only, we don’t even use iotlb_gather at > > the moment, although possible but I wanted to keep this series as > > simple as I can then we can add more features later. > > So this patch is the least intrusive change, as whatever the main SMMUv3 > > driver does, the range tlb invalidation logic is the same. > > But I am happy to experiment with that when posted. > > Okay, then maybe just always push a full invalidation? Like full address space invalidation? that will not be optimal and will affect every device on the system, why would we do that if we know the address? > > I didn't like seeing the range invalidation lifted out, especially how > ugly that will turn into after my next series. So, don't use range or > use the proper full gather flow seem like the right two options. > > I'm not so interested in minimal change, but maximum forward > maintainability. It is much easier to manage if you double compile > more of the driver exactly the same, and call functions in a fairly > normal way vs make special cases and special functions.. Let me try with your series, but the core range invalidation shouldn’t be changed often as it is tied to the hardware and wrapping it in a macro is reasonable (just like the CPU tlb invalidation) The last time this code was changed was in 2023. I am happy to use higher level functions to improve the driver maintainability, but I don’t see what is the problem at the moment. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code 2026-05-05 16:43 ` Mostafa Saleh @ 2026-05-06 9:53 ` Jason Gunthorpe 2026-05-07 9:40 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-06 9:53 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Tue, May 05, 2026 at 04:43:13PM +0000, Mostafa Saleh wrote: > On Tue, May 05, 2026 at 01:17:09PM -0300, Jason Gunthorpe wrote: > > On Mon, May 04, 2026 at 12:15:17PM +0000, Mostafa Saleh wrote: > > > > > I am not sure if it’s worth it, the hypervisor is much simpler, there > > > is a single page table, it’s locked (also identity mapped), it’s > > > updated on VM boot/teardown only, we don’t even use iotlb_gather at > > > the moment, although possible but I wanted to keep this series as > > > simple as I can then we can add more features later. > > > So this patch is the least intrusive change, as whatever the main SMMUv3 > > > driver does, the range tlb invalidation logic is the same. > > > But I am happy to experiment with that when posted. > > > > Okay, then maybe just always push a full invalidation? > > Like full address space invalidation? that will not be optimal and > will affect every device on the system, why would we do that if we > know the address? Not every device, just wipe the VMID. If you say it is rare then keep it simple. If you need to be narrow then use the proper infastructure with a gather. No reason to make something boutique for pkvm here. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code 2026-05-06 9:53 ` Jason Gunthorpe @ 2026-05-07 9:40 ` Mostafa Saleh 2026-05-09 23:29 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-07 9:40 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Wed, May 06, 2026 at 06:53:00AM -0300, Jason Gunthorpe wrote: > On Tue, May 05, 2026 at 04:43:13PM +0000, Mostafa Saleh wrote: > > On Tue, May 05, 2026 at 01:17:09PM -0300, Jason Gunthorpe wrote: > > > On Mon, May 04, 2026 at 12:15:17PM +0000, Mostafa Saleh wrote: > > > > > > > I am not sure if it’s worth it, the hypervisor is much simpler, there > > > > is a single page table, it’s locked (also identity mapped), it’s > > > > updated on VM boot/teardown only, we don’t even use iotlb_gather at > > > > the moment, although possible but I wanted to keep this series as > > > > simple as I can then we can add more features later. > > > > So this patch is the least intrusive change, as whatever the main SMMUv3 > > > > driver does, the range tlb invalidation logic is the same. > > > > But I am happy to experiment with that when posted. > > > > > > Okay, then maybe just always push a full invalidation? > > > > Like full address space invalidation? that will not be optimal and > > will affect every device on the system, why would we do that if we > > know the address? > > Not every device, just wipe the VMID. If you say it is rare then keep > it simple. If you need to be narrow then use the proper infastructure > with a gather. No reason to make something boutique for pkvm here. But all devices share the same VMID, which impacts all devices on the system, and is not that rare to ignore. I can add gather support, it’s not that complicated, it is currently supported in Android. But that doesn’t solve the problem, which is: At some point, whether eagerly from the page table code, through gather sync or a fancy invalidation array, the driver will need to populate a range invalidation command (tg, ttl, scale...) and this logic is better shared with the main driver which is this patch does. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code 2026-05-07 9:40 ` Mostafa Saleh @ 2026-05-09 23:29 ` Jason Gunthorpe 2026-05-11 11:45 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-09 23:29 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Thu, May 07, 2026 at 09:40:00AM +0000, Mostafa Saleh wrote: > But that doesn’t solve the problem, which is: At some point, whether > eagerly from the page table code, through gather sync or a fancy > invalidation array, the driver will need to populate a range > invalidation command (tg, ttl, scale...) and this logic is better > shared with the main driver which is this patch does. My point is this patch doesn't share enough. If you do need to issue invalidations then share everything below the top level tlbi entry point and don't try to make a pkvm version of the entire logic just by ripping out the range logic. There is no reason for pkvm to need a different algorithm here. Especially when you get to supporting ATS and multiple devices and smmus you may as well just use the whole thing. Which is why I suggested to copy the entire call chain into a shared file Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code 2026-05-09 23:29 ` Jason Gunthorpe @ 2026-05-11 11:45 ` Mostafa Saleh 2026-05-11 14:24 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-11 11:45 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Sat, May 09, 2026 at 08:29:31PM -0300, Jason Gunthorpe wrote: > On Thu, May 07, 2026 at 09:40:00AM +0000, Mostafa Saleh wrote: > > But that doesn’t solve the problem, which is: At some point, whether > > eagerly from the page table code, through gather sync or a fancy > > invalidation array, the driver will need to populate a range > > invalidation command (tg, ttl, scale...) and this logic is better > > shared with the main driver which is this patch does. > > My point is this patch doesn't share enough. If you do need to issue > invalidations then share everything below the top level tlbi entry > point and don't try to make a pkvm version of the entire logic just by > ripping out the range logic. > > There is no reason for pkvm to need a different algorithm > here. Especially when you get to supporting ATS and multiple devices > and smmus you may as well just use the whole thing. > > Which is why I suggested to copy the entire call chain into a shared > file Agh, actually this seires doesn't deal with ATS, which I think is wrong, propably we have to issue CMDQ_OP_ATC_INV for the whole space on every S2 invalidation which has to be done per-SID and as it can't be done by VMID :/ or just hide ATS support from host for now, I will look into more for v7. But anyway, we don’t have to share any logic, the kernel driver is quite complicated as it is designed for a different use-case. Doing that makes the hypervisor unnecessary complicated and oversharing this logic makes the kernel driver less maintainable IMO. The hypervisor only needs to share the architecture spec of populating commands. If that’s a problem. I am happy to re-write the RIL part in the hypervisor. It's not that complicated, but I thought it makes sense to share as it is a HW contract. Otherwise, we can break this part into some macros that are called by this function and the hypervisor but that's maybe over-engineered. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code 2026-05-11 11:45 ` Mostafa Saleh @ 2026-05-11 14:24 ` Jason Gunthorpe 0 siblings, 0 replies; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-11 14:24 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Mon, May 11, 2026 at 11:45:51AM +0000, Mostafa Saleh wrote: > On Sat, May 09, 2026 at 08:29:31PM -0300, Jason Gunthorpe wrote: > > On Thu, May 07, 2026 at 09:40:00AM +0000, Mostafa Saleh wrote: > > > But that doesn’t solve the problem, which is: At some point, whether > > > eagerly from the page table code, through gather sync or a fancy > > > invalidation array, the driver will need to populate a range > > > invalidation command (tg, ttl, scale...) and this logic is better > > > shared with the main driver which is this patch does. > > > > My point is this patch doesn't share enough. If you do need to issue > > invalidations then share everything below the top level tlbi entry > > point and don't try to make a pkvm version of the entire logic just by > > ripping out the range logic. > > > > There is no reason for pkvm to need a different algorithm > > here. Especially when you get to supporting ATS and multiple devices > > and smmus you may as well just use the whole thing. > > > > Which is why I suggested to copy the entire call chain into a shared > > file > > Agh, actually this seires doesn't deal with ATS, which I think is > wrong, propably we have to issue CMDQ_OP_ATC_INV for the whole > space on every S2 invalidation which has to be done per-SID and > as it can't be done by VMID :/ or just hide ATS support from host for > now, I will look into more for v7. Hiding from the host is a fine solution to start with. > But anyway, we don’t have to share any logic, the kernel driver > is quite complicated as it is designed for a different use-case. > Doing that makes the hypervisor unnecessary complicated and > oversharing this logic makes the kernel driver less maintainable IMO. invalidation is complicated, you should not try to open code your own version. You really cannot make it any simpler than what is in the driver, just use the code it is already decently modular. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20260501111928.259252-6-smostafa@google.com>]
[parent not found: <20260501124716.GD6912@ziepe.ca>]
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions [not found] ` <20260501124716.GD6912@ziepe.ca> @ 2026-05-04 12:16 ` Mostafa Saleh 2026-05-05 16:27 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-04 12:16 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Fri, May 01, 2026 at 09:47:16AM -0300, Jason Gunthorpe wrote: > On Fri, May 01, 2026 at 11:19:07AM +0000, Mostafa Saleh wrote: > > Move parsing of IDRs to functions so that it can be re-used > > +unsigned long smmu_idr5_to_pgsize(u32 reg) > > +{ > > + unsigned long pgsize_bitmap = 0; > > + > > + if (reg & IDR5_GRAN64K) > > + pgsize_bitmap |= SZ_64K | SZ_512M; > > + if (reg & IDR5_GRAN16K) > > + pgsize_bitmap |= SZ_16K | SZ_32M; > > + if (reg & IDR5_GRAN4K) > > + pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; > > + return pgsize_bitmap; > > +} > > I think this should include: > > > + smmu->oas = smmu_idr5_to_oas(reg); > > + if (smmu->oas == 52) > > smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */ > > - break; > > ie it should return the supported page sizes by inspecting all the > idrs and don't leave this tricky bit to be open coded.. This way was easier as each function only returns one thing, otherwise we have to pass stuff by address as we can’t pass the smmu struct as it is not shared between the drivers. But no strong opinion, I can change that. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions 2026-05-04 12:16 ` [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions Mostafa Saleh @ 2026-05-05 16:27 ` Jason Gunthorpe 2026-05-05 16:48 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-05 16:27 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Mon, May 04, 2026 at 12:16:13PM +0000, Mostafa Saleh wrote: > On Fri, May 01, 2026 at 09:47:16AM -0300, Jason Gunthorpe wrote: > > On Fri, May 01, 2026 at 11:19:07AM +0000, Mostafa Saleh wrote: > > > Move parsing of IDRs to functions so that it can be re-used > > > +unsigned long smmu_idr5_to_pgsize(u32 reg) > > > +{ > > > + unsigned long pgsize_bitmap = 0; > > > + > > > + if (reg & IDR5_GRAN64K) > > > + pgsize_bitmap |= SZ_64K | SZ_512M; > > > + if (reg & IDR5_GRAN16K) > > > + pgsize_bitmap |= SZ_16K | SZ_32M; > > > + if (reg & IDR5_GRAN4K) > > > + pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; > > > + return pgsize_bitmap; > > > +} > > > > I think this should include: > > > > > + smmu->oas = smmu_idr5_to_oas(reg); > > > + if (smmu->oas == 52) > > > smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */ > > > - break; > > > > ie it should return the supported page sizes by inspecting all the > > idrs and don't leave this tricky bit to be open coded.. > > This way was easier as each function only returns one thing, otherwise > we have to pass stuff by address as we can’t pass the smmu struct as it > is not shared between the drivers. > But no strong opinion, I can change that. Could you share the struct? With multi-compilation you do get two structs with the same name and different layout, it is OK? That would give a cleaner setup too because you can just shove all this into a function and have it setup the same struct which is more a copy and paste than break up into little functions? Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions 2026-05-05 16:27 ` Jason Gunthorpe @ 2026-05-05 16:48 ` Mostafa Saleh 2026-05-06 9:56 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-05 16:48 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Tue, May 05, 2026 at 01:27:45PM -0300, Jason Gunthorpe wrote: > On Mon, May 04, 2026 at 12:16:13PM +0000, Mostafa Saleh wrote: > > On Fri, May 01, 2026 at 09:47:16AM -0300, Jason Gunthorpe wrote: > > > On Fri, May 01, 2026 at 11:19:07AM +0000, Mostafa Saleh wrote: > > > > Move parsing of IDRs to functions so that it can be re-used > > > > +unsigned long smmu_idr5_to_pgsize(u32 reg) > > > > +{ > > > > + unsigned long pgsize_bitmap = 0; > > > > + > > > > + if (reg & IDR5_GRAN64K) > > > > + pgsize_bitmap |= SZ_64K | SZ_512M; > > > > + if (reg & IDR5_GRAN16K) > > > > + pgsize_bitmap |= SZ_16K | SZ_32M; > > > > + if (reg & IDR5_GRAN4K) > > > > + pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; > > > > + return pgsize_bitmap; > > > > +} > > > > > > I think this should include: > > > > > > > + smmu->oas = smmu_idr5_to_oas(reg); > > > > + if (smmu->oas == 52) > > > > smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */ > > > > - break; > > > > > > ie it should return the supported page sizes by inspecting all the > > > idrs and don't leave this tricky bit to be open coded.. > > > > This way was easier as each function only returns one thing, otherwise > > we have to pass stuff by address as we can’t pass the smmu struct as it > > is not shared between the drivers. > > But no strong opinion, I can change that. > > Could you share the struct? With multi-compilation you do get two > structs with the same name and different layout, it is OK? > > That would give a cleaner setup too because you can just shove all > this into a function and have it setup the same struct which is more a > copy and paste than break up into little functions? Yes, that’s possible, but I didn't want to contaminate the main struct, also we can’t compile the main one for KVM because of things such as “struct device, struct mutex...” May we can do: - struct arm_smmu_device_common: Basic stuff (ioaddress, features, oas...) - arm_smmu_device: For the main driver which inherits arm_smmu_device_common - arm_smmu_nested_device for KVM which also inherits arm_smmu_device_common What do you think? Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions 2026-05-05 16:48 ` Mostafa Saleh @ 2026-05-06 9:56 ` Jason Gunthorpe 2026-05-07 10:13 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-06 9:56 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Tue, May 05, 2026 at 04:48:08PM +0000, Mostafa Saleh wrote: > On Tue, May 05, 2026 at 01:27:45PM -0300, Jason Gunthorpe wrote: > > On Mon, May 04, 2026 at 12:16:13PM +0000, Mostafa Saleh wrote: > > > On Fri, May 01, 2026 at 09:47:16AM -0300, Jason Gunthorpe wrote: > > > > On Fri, May 01, 2026 at 11:19:07AM +0000, Mostafa Saleh wrote: > > > > > Move parsing of IDRs to functions so that it can be re-used > > > > > +unsigned long smmu_idr5_to_pgsize(u32 reg) > > > > > +{ > > > > > + unsigned long pgsize_bitmap = 0; > > > > > + > > > > > + if (reg & IDR5_GRAN64K) > > > > > + pgsize_bitmap |= SZ_64K | SZ_512M; > > > > > + if (reg & IDR5_GRAN16K) > > > > > + pgsize_bitmap |= SZ_16K | SZ_32M; > > > > > + if (reg & IDR5_GRAN4K) > > > > > + pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; > > > > > + return pgsize_bitmap; > > > > > +} > > > > > > > > I think this should include: > > > > > > > > > + smmu->oas = smmu_idr5_to_oas(reg); > > > > > + if (smmu->oas == 52) > > > > > smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */ > > > > > - break; > > > > > > > > ie it should return the supported page sizes by inspecting all the > > > > idrs and don't leave this tricky bit to be open coded.. > > > > > > This way was easier as each function only returns one thing, otherwise > > > we have to pass stuff by address as we can’t pass the smmu struct as it > > > is not shared between the drivers. > > > But no strong opinion, I can change that. > > > > Could you share the struct? With multi-compilation you do get two > > structs with the same name and different layout, it is OK? > > > > That would give a cleaner setup too because you can just shove all > > this into a function and have it setup the same struct which is more a > > copy and paste than break up into little functions? > > Yes, that’s possible, but I didn't want to contaminate the main struct, > also we can’t compile the main one for KVM because of things such as > “struct device, struct mutex...” > May we can do: > - struct arm_smmu_device_common: Basic stuff (ioaddress, features, > oas...) > - arm_smmu_device: For the main driver which inherits arm_smmu_device_common > - arm_smmu_nested_device for KVM which also inherits arm_smmu_device_common Less keen on this, it would need more usages to justify it > What do you think? I was thinking more like #define arm_smmu arm_pkvm_smmu Before the pkvm compile of the shared code. How does pkvm compilation work anyhow? Is it all rolled into a unique link, or do you have to worry about symbol conflicts with the main kernel? Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions 2026-05-06 9:56 ` Jason Gunthorpe @ 2026-05-07 10:13 ` Mostafa Saleh 2026-05-09 23:34 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-07 10:13 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Wed, May 06, 2026 at 06:56:31AM -0300, Jason Gunthorpe wrote: > On Tue, May 05, 2026 at 04:48:08PM +0000, Mostafa Saleh wrote: > > On Tue, May 05, 2026 at 01:27:45PM -0300, Jason Gunthorpe wrote: > > > On Mon, May 04, 2026 at 12:16:13PM +0000, Mostafa Saleh wrote: > > > > On Fri, May 01, 2026 at 09:47:16AM -0300, Jason Gunthorpe wrote: > > > > > On Fri, May 01, 2026 at 11:19:07AM +0000, Mostafa Saleh wrote: > > > > > > Move parsing of IDRs to functions so that it can be re-used > > > > > > +unsigned long smmu_idr5_to_pgsize(u32 reg) > > > > > > +{ > > > > > > + unsigned long pgsize_bitmap = 0; > > > > > > + > > > > > > + if (reg & IDR5_GRAN64K) > > > > > > + pgsize_bitmap |= SZ_64K | SZ_512M; > > > > > > + if (reg & IDR5_GRAN16K) > > > > > > + pgsize_bitmap |= SZ_16K | SZ_32M; > > > > > > + if (reg & IDR5_GRAN4K) > > > > > > + pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; > > > > > > + return pgsize_bitmap; > > > > > > +} > > > > > > > > > > I think this should include: > > > > > > > > > > > + smmu->oas = smmu_idr5_to_oas(reg); > > > > > > + if (smmu->oas == 52) > > > > > > smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */ > > > > > > - break; > > > > > > > > > > ie it should return the supported page sizes by inspecting all the > > > > > idrs and don't leave this tricky bit to be open coded.. > > > > > > > > This way was easier as each function only returns one thing, otherwise > > > > we have to pass stuff by address as we can’t pass the smmu struct as it > > > > is not shared between the drivers. > > > > But no strong opinion, I can change that. > > > > > > Could you share the struct? With multi-compilation you do get two > > > structs with the same name and different layout, it is OK? > > > > > > That would give a cleaner setup too because you can just shove all > > > this into a function and have it setup the same struct which is more a > > > copy and paste than break up into little functions? > > > > Yes, that’s possible, but I didn't want to contaminate the main struct, > > also we can’t compile the main one for KVM because of things such as > > “struct device, struct mutex...” > > May we can do: > > - struct arm_smmu_device_common: Basic stuff (ioaddress, features, > > oas...) > > - arm_smmu_device: For the main driver which inherits arm_smmu_device_common > > - arm_smmu_nested_device for KVM which also inherits arm_smmu_device_common > > Less keen on this, it would need more usages to justify it > > > What do you think? > > I was thinking more like > > #define arm_smmu arm_pkvm_smmu > > Before the pkvm compile of the shared code. Oh, so we rely on arm_pkvm_smmu and arm_smmu having the same field names and types? I don’t think that’s better for maintainability, I can look into it if Robin and Will are OK with that. > > How does pkvm compilation work anyhow? Is it all rolled into a unique > link, or do you have to worry about symbol conflicts with the main > kernel? All pkvm symbols are prefixed with __kvm_nvhe_ in the main kernel. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions 2026-05-07 10:13 ` Mostafa Saleh @ 2026-05-09 23:34 ` Jason Gunthorpe 2026-05-11 11:53 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-09 23:34 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Thu, May 07, 2026 at 10:13:13AM +0000, Mostafa Saleh wrote: > > I was thinking more like > > > > #define arm_smmu arm_pkvm_smmu > > > > Before the pkvm compile of the shared code. > > Oh, so we rely on arm_pkvm_smmu and arm_smmu having the same field > names and types? Just for the shared code, yes. For instance this patch would require the features field be present in both, and then all you do is directly copy all the smmu->features logic into the shared file and that's it. > I don’t think that’s better for maintainability, I can look into it > if Robin and Will are OK with that. I think maintainability here means pkvm minimizes how much it changes the code of the main driver. Copying a bunch of functions into a shared .c file exactly as it is, then compiling the shared file with some #ifdef'ery at is going to be long term better than trying to mangle the whole thing to avoid using any of the core types and not directly share the code, IMHO. Like my tlbi point for instance, it is not a good idea to rip out a little bit of the invalidation logic then open code a pkvm version. Use the exact driver code. But to do this you need to pass an arm_smmu through to to the pkvm specific queue management functions. > > How does pkvm compilation work anyhow? Is it all rolled into a unique > > link, or do you have to worry about symbol conflicts with the main > > kernel? > > All pkvm symbols are prefixed with __kvm_nvhe_ in the main kernel. So automatically with objcopy or something like that during the build? Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions 2026-05-09 23:34 ` Jason Gunthorpe @ 2026-05-11 11:53 ` Mostafa Saleh 2026-05-11 14:30 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-11 11:53 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Sun, May 10, 2026 at 12:34 AM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Thu, May 07, 2026 at 10:13:13AM +0000, Mostafa Saleh wrote: > > > I was thinking more like > > > > > > #define arm_smmu arm_pkvm_smmu > > > > > > Before the pkvm compile of the shared code. > > > > Oh, so we rely on arm_pkvm_smmu and arm_smmu having the same field > > names and types? > > Just for the shared code, yes. > > For instance this patch would require the features field be present in > both, and then all you do is directly copy all the smmu->features > logic into the shared file and that's it. > > > I don’t think that’s better for maintainability, I can look into it > > if Robin and Will are OK with that. > > I think maintainability here means pkvm minimizes how much it changes > the code of the main driver. > > Copying a bunch of functions into a shared .c file exactly as it is, > then compiling the shared file with some #ifdef'ery at is going to be > long term better than trying to mangle the whole thing to avoid using > any of the core types and not directly share the code, IMHO. > There isn't #ifdef'ery at the moment, that's what I was trying to avoid by introducing shared code. I believe that can be treated as a library as most of the shared code is about HW description (commands, invalidation, IDR probing...) > Like my tlbi point for instance, it is not a good idea to rip out a > little bit of the invalidation logic then open code a pkvm > version. Use the exact driver code. But to do this you need to pass an > arm_smmu through to to the pkvm specific queue management functions. > What concerns me is how fragile that is, any change in the main struct can easily break the hypervisor, unlike if we have a clear shared code and defined API that is used by 2 entities. I will think more about this before v7 and see how intrusive it is. > > > How does pkvm compilation work anyhow? Is it all rolled into a unique > > > link, or do you have to worry about symbol conflicts with the main > > > kernel? > > > > All pkvm symbols are prefixed with __kvm_nvhe_ in the main kernel. > > So automatically with objcopy or something like that during the build? > Exactly: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kvm/hyp/nvhe/Makefile#n96 Thanks, Mostafa > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions 2026-05-11 11:53 ` Mostafa Saleh @ 2026-05-11 14:30 ` Jason Gunthorpe 0 siblings, 0 replies; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-11 14:30 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf > > Copying a bunch of functions into a shared .c file exactly as it is, > > then compiling the shared file with some #ifdef'ery at is going to be > > long term better than trying to mangle the whole thing to avoid using > > any of the core types and not directly share the code, IMHO. > > > > There isn't #ifdef'ery at the moment, that's what I was trying to > avoid by introducing shared code. Yeah, I think that may be a bad direction. Ultimately it feels like it will be more burden to maintain this careful split, while some small list of carefully selected #defines will let you reuse alot more with no code changes and I think that is ultimately going to be better. > What concerns me is how fragile that is, any change in the main struct > can easily break the hypervisor, unlike if we have a clear shared code > and defined API that is used by 2 entities. > I will think more about this before v7 and see how intrusive it is. IMHO so long as it is easy to include pkvm in the compilation I see no issue with build testing the pkvm driver when working on smmuv3 driver. So I'm not worried about this, any breaks will be compile breaks and can be delt with. What I'd like is to minimize logic changes and maximimize re-use so you don't have to make bad re-implementations. Like pkvm shouldn't be building a weaker tlbi, it shouldn't have different logic for errata and FEAT, it shouldn't be doing STE changes without the hitless logic, etc, etc. All these things are easier to solve with greater direct code re-use.. Thus I feel the trade of off 'use the code with no changed via #define' is better than 'try to carefully cut away and avoid #define' Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20260501111928.259252-7-smostafa@google.com>]
[parent not found: <20260501122424.GA6912@ziepe.ca>]
* Re: [PATCH v6 06/25] iommu/io-pgtable-arm: Rework to use the iommu-pages API [not found] ` <20260501122424.GA6912@ziepe.ca> @ 2026-05-04 12:19 ` Mostafa Saleh 2026-05-09 23:21 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-04 12:19 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Fri, May 01, 2026 at 09:24:24AM -0300, Jason Gunthorpe wrote: > On Fri, May 01, 2026 at 11:19:08AM +0000, Mostafa Saleh wrote: > > To prepare for supporting io-pgtable-arm in the pKVM hypervisor, > > we need to abstract away standard kernel allocations, frees, virt/phys > > conversions, and DMA API mapping. > > > > This patch introduces a set of generic wrappers in iommu-pages.h: > > - iommu_alloc_data > > - iommu_free_data > > - iommu_virt_to_phys > > - iommu_phys_to_virt > > - iommu_pages_dma_map > > - iommu_pages_dma_mapping_error > > - iommu_pages_dma_unmap > > Wah? This has nothing to do with iommu pages? This just leaking > everything iommu pages abstracted out :( > > When I said to use iommu-pages, I meant to use the existing API, not a > completely different one. > > From an iommu-pages perspective the issue is this code open codes > dma_map_single()/etc instead of using the API surface > iommu_pages_start_incoherent() > > This is annoying to fix beacuse the external allocator messes it up, > but I think with some #ifdef you can probably fix it up. > > So.. I suggest you update it to use the iommu_pages API, #ifdef out > the allocator so the pkvm pkvm doesn't need to deal with it. Then > compile a special iommu-pages for the pkvm side presenting the same > API. I see, we still need to leave the DMA-API calls for the custom config, as I am not sure if it can use pages not backed by the vmemmap, I pushed that into a separate function so it’s easily compiled out. Without this patch, now it looks like: diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 0208e5897c29..1583b9916b09 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -248,26 +248,15 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages) return (dma_addr_t)virt_to_phys(pages); } -static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, - struct io_pgtable_cfg *cfg, - void *cookie) +static void *__arm_lpae_cfg_alloc(size_t size, gfp_t gfp, + struct io_pgtable_cfg *cfg, + void *cookie) { struct device *dev = cfg->iommu_dev; - size_t alloc_size; dma_addr_t dma; void *pages; - /* - * For very small starting-level translation tables the HW requires a - * minimum alignment of at least 64 to cover all cases. - */ - alloc_size = max(size, 64); - if (cfg->alloc) - pages = cfg->alloc(cookie, alloc_size, gfp); - else - pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, - alloc_size); - + pages = cfg->alloc(cookie, size, gfp); if (!pages) return NULL; @@ -289,26 +278,67 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, out_unmap: dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n"); dma_unmap_single(dev, dma, size, DMA_TO_DEVICE); - out_free: - if (cfg->free) - cfg->free(cookie, pages, size); - else - iommu_free_pages(pages); - + cfg->free(cookie, pages, size); return NULL; } -static void __arm_lpae_free_pages(void *pages, size_t size, - struct io_pgtable_cfg *cfg, - void *cookie) +static void __arm_lpae_cfg_free(void *pages, size_t size, + struct io_pgtable_cfg *cfg, + void *cookie) { if (!cfg->coherent_walk) dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages), size, DMA_TO_DEVICE); - if (cfg->free) - cfg->free(cookie, pages, size); + cfg->free(cookie, pages, size); +} + +static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, + struct io_pgtable_cfg *cfg, + void *cookie) +{ + struct device *dev = cfg->iommu_dev; + size_t alloc_size; + void *pages; + + /* + * For very small starting-level translation tables the HW requires a + * minimum alignment of at least 64 to cover all cases. + */ + alloc_size = max(size, 64); + if (cfg->alloc) + return __arm_lpae_cfg_alloc(alloc_size, gfp, cfg, cookie); + + pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, alloc_size); + if (!pages) + return NULL; + + if (!cfg->coherent_walk) { + int ret = iommu_pages_start_incoherent(pages, dev); + + if (ret) { + if (ret == -EOPNOTSUPP) + dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n"); + iommu_free_pages(pages); + return NULL; + } + } + + return pages; +} + +static void __arm_lpae_free_pages(void *pages, size_t size, + struct io_pgtable_cfg *cfg, + void *cookie) +{ + if (cfg->free) { + __arm_lpae_cfg_free(pages, size, cfg, cookie); + return; + } + + if (!cfg->coherent_walk) + iommu_pages_free_incoherent(pages, cfg->iommu_dev); else iommu_free_pages(pages); } Thanks, Mostafa > > You should have a pkvm shim header that provides > kmalloc/kfree/virt_to_phys in the normal way and just #include that in > io-pgtable when doing a pkvm build instead of hacking up all the code. Ok, I can do that in another change, but I believe it's better to change the usage in this file to arm_lpae_*(virt_to_phys...) so it's clear which parts are intended for that. Thanks, Mostafa > > Jason ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v6 06/25] iommu/io-pgtable-arm: Rework to use the iommu-pages API 2026-05-04 12:19 ` [PATCH v6 06/25] iommu/io-pgtable-arm: Rework to use the iommu-pages API Mostafa Saleh @ 2026-05-09 23:21 ` Jason Gunthorpe 2026-05-11 11:16 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-09 23:21 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Mon, May 04, 2026 at 12:19:37PM +0000, Mostafa Saleh wrote: > > So.. I suggest you update it to use the iommu_pages API, #ifdef out > > the allocator so the pkvm pkvm doesn't need to deal with it. Then > > compile a special iommu-pages for the pkvm side presenting the same > > API. > > I see, we still need to leave the DMA-API calls for the custom config, > as I am not sure if it can use pages not backed by the vmemmap, I > pushed that into a separate function so it’s easily compiled out. Yeah.. > Without this patch, now it looks like: Seems reasonable, and then i'd probably just put something like #define dma_map(...) #define dma_umap(...) To effectively take it out of the pkvm build Then have a pkvm compile of iommu-pages to provide the functions in a pkvm compatible way. I guess you can't actually fully do this, but can do enough to support this file at least. > > You should have a pkvm shim header that provides > > kmalloc/kfree/virt_to_phys in the normal way and just #include that in > > io-pgtable when doing a pkvm build instead of hacking up all the code. > > Ok, I can do that in another change, but I believe it's better to > change the usage in this file to arm_lpae_*(virt_to_phys...) so it's > clear which parts are intended for that. IDK, why? virt_to_phys() is part of the iommu-pages API, I'd just leave it.. If you want to narrow it then #define it for pkvm when compiling this file.. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 06/25] iommu/io-pgtable-arm: Rework to use the iommu-pages API 2026-05-09 23:21 ` Jason Gunthorpe @ 2026-05-11 11:16 ` Mostafa Saleh 2026-05-11 14:18 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-11 11:16 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Sat, May 09, 2026 at 08:21:55PM -0300, Jason Gunthorpe wrote: > On Mon, May 04, 2026 at 12:19:37PM +0000, Mostafa Saleh wrote: > > > So.. I suggest you update it to use the iommu_pages API, #ifdef out > > > the allocator so the pkvm pkvm doesn't need to deal with it. Then > > > compile a special iommu-pages for the pkvm side presenting the same > > > API. > > > > I see, we still need to leave the DMA-API calls for the custom config, > > as I am not sure if it can use pages not backed by the vmemmap, I > > pushed that into a separate function so it’s easily compiled out. > > Yeah.. > > > Without this patch, now it looks like: > > Seems reasonable, and then i'd probably just put something like > #define dma_map(...) > #define dma_umap(...) > > To effectively take it out of the pkvm build > > Then have a pkvm compile of iommu-pages to provide the functions in a > pkvm compatible way. I guess you can't actually fully do this, but > can do enough to support this file at least. > > > > You should have a pkvm shim header that provides > > > kmalloc/kfree/virt_to_phys in the normal way and just #include that in > > > io-pgtable when doing a pkvm build instead of hacking up all the code. > > > > Ok, I can do that in another change, but I believe it's better to > > change the usage in this file to arm_lpae_*(virt_to_phys...) so it's > > clear which parts are intended for that. > > IDK, why? virt_to_phys() is part of the iommu-pages API, I'd just > leave it.. If you want to narrow it then #define it for pkvm when > compiling this file.. It is not going to be part of the iommu-pages API, I meant in io-pgtable-arm, we will use something arm_lpae_virt_to_phys()... which is then implemented differently for pkvm. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 06/25] iommu/io-pgtable-arm: Rework to use the iommu-pages API 2026-05-11 11:16 ` Mostafa Saleh @ 2026-05-11 14:18 ` Jason Gunthorpe 0 siblings, 0 replies; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-11 14:18 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Mon, May 11, 2026 at 11:16:47AM +0000, Mostafa Saleh wrote: > > IDK, why? virt_to_phys() is part of the iommu-pages API, I'd just > > leave it.. If you want to narrow it then #define it for pkvm when > > compiling this file.. > > It is not going to be part of the iommu-pages API, I meant in > io-pgtable-arm, we will use something arm_lpae_virt_to_phys()... > which is then implemented differently for pkvm. Again why? I think the main goal should be to not mess up the normal code. #define virt_to_phys pkvm_virt_to_phys Does that, we should be leaning into this pattern I think, not adding unnecessary churn... If anything is needed then it should be an iommu_pages function to wrapper virt_to_phys() for use by iommu_pages uses but I'd rather not.. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20260501111928.259252-9-smostafa@google.com>]
[parent not found: <20260501130006.GF6912@ziepe.ca>]
* Re: [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table [not found] ` <20260501130006.GF6912@ziepe.ca> @ 2026-05-04 12:28 ` Mostafa Saleh 2026-05-09 23:27 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-04 12:28 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Fri, May 01, 2026 at 10:00:06AM -0300, Jason Gunthorpe wrote: > On Fri, May 01, 2026 at 11:19:10AM +0000, Mostafa Saleh wrote: > > Create a page-table for the IOMMU that shadows the host CPU stage-2 > > to establish DMA isolation. > > Is there a reason you can't just use the CPU S2 for the iommu? > > ie the CCA RMM is doing that, it is how ARM imagined this stuff would > work. > > Once you start supporting DMA like this you have no choice but to keep > a fully populated at all times S2 around, why not use that for the CPU > too to avoid faults? > > I guess there is a reason, but maybe explain in the commit message? > > It sure would be simpler, you wouldn't have to mess with iopgtable at > all... Sharing the page table is tricky, this is something I have been thinking about and my plan was to work on it after this series as it has some constraints and would require core KVM changes. So far this is the list of requirements/changes needed share the stage-2 page table (besides the obvious: same page table format, granularity, endianness...) 1) HW BBM is not supported in the hypervisor page table, that’s because it can generate TLB conflict aborts, which the hypervisor can not handle because of the limited syndrome information. We can rely on FEAT_BBML3 which was newly introduced to work around that, it’s quite niche and not supported in KVM yet or have an allow list similar to the kernel (as in cpu_supports_bbml2_noabort()) which also limits the number of CPUs that can run this. 2) Handling page faults, devices must be able to stall and let the hypervisor handle the page fault (which has to proxy through the kernel as the hypervisor doesn’t handle interrupts), this includes also IO page faults which are hard to get right from the HW which and may lead to system stability issues or lockups. Alternatively, we can pin the stage-2 pages, that would require some hypercalls, hacks to the driver/IOMMU API and possibly new semantics in the DMA-API for IDENTITY devices as they will still need to pin the pages as they are actually in stage-2 translation and not bypass. 3) SMMUv3 must be coherent. 4) Support BTM/DVM for TLB invalidation, otherwise some hooks are still required (although not io-pgtable-arm) This is not the complete list, I am sure I will run into more issues when prototyping this. IMO, 1, 2 are the most tricky parts. It's more work and runs on very limited systems, However, it can be implemented as an optimization) which is my plan. I am not sure how CCA deals with that, I’d expect they have a lot of constraints on CPUs/SMMUs and DMA capable devices on those systems. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table 2026-05-04 12:28 ` [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table Mostafa Saleh @ 2026-05-09 23:27 ` Jason Gunthorpe 2026-05-11 11:24 ` Mostafa Saleh 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-09 23:27 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh wrote: > So far this is the list of requirements/changes needed share the > stage-2 page table (besides the obvious: same page table format, > granularity, endianness...) > > 1) HW BBM is not supported in the hypervisor page table, that’s > because it can generate TLB conflict aborts, which the hypervisor > can not handle because of the limited syndrome information. > We can rely on FEAT_BBML3 which was newly introduced to work > around that, it’s quite niche and not supported in KVM yet or > have an allow list similar to the kernel > (as in cpu_supports_bbml2_noabort()) which also limits the number > of CPUs that can run this. Do you think pkvm will need BBM? Hitless replace of a PTE is already a pretty advanced feature and the SMMU has its own support matrix there too. Is it for shared/private conversion? > 2) Handling page faults, devices must be able to stall and let the > hypervisor handle the page fault (which has to proxy through the > kernel as the hypervisor doesn’t handle interrupts), this includes > also IO page faults which are hard to get right from the HW which > and may lead to system stability issues or lockups. No.. once you turn on IO like this you don't have page faults anymore. Everything must be permantently mapped into the SMMU view, it can never be made non-present and you must run without page faults. That's what you have in the io-pgtable constructed table, right? > Alternatively, we can pin the stage-2 pages, that would require some > hypercalls, hacks to the driver/IOMMU API and possibly new semantics > in the DMA-API for IDENTITY devices as they will still need to pin > the pages as they are actually in stage-2 translation and not bypass. ?? Then how does this series work? > 3) SMMUv3 must be coherent. Yes for sure. > 4) Support BTM/DVM for TLB invalidation, otherwise some hooks are > still required (although not io-pgtable-arm) SW needs to forward invalidations, BTM is rare.. > IMO, 1, 2 are the most tricky parts. It's more work and runs on very > limited systems, However, it can be implemented as an optimization) > which is my plan. I think unless you can do it without these HW features (excluding 3) don't bother. > I am not sure how CCA deals with that, I’d expect they have a lot of > constraints on CPUs/SMMUs and DMA capable devices on those systems. 3 is not supported. The entire S2 is permanently mapped and doesn't really change alot at runtime. No page faults, not sure if the RMM private/shard conversion would require BMM.. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table 2026-05-09 23:27 ` Jason Gunthorpe @ 2026-05-11 11:24 ` Mostafa Saleh 2026-05-11 14:22 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Mostafa Saleh @ 2026-05-11 11:24 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Sat, May 09, 2026 at 08:27:14PM -0300, Jason Gunthorpe wrote: > On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh wrote: > > So far this is the list of requirements/changes needed share the > > stage-2 page table (besides the obvious: same page table format, > > granularity, endianness...) > > > > 1) HW BBM is not supported in the hypervisor page table, that’s > > because it can generate TLB conflict aborts, which the hypervisor > > can not handle because of the limited syndrome information. > > We can rely on FEAT_BBML3 which was newly introduced to work > > around that, it’s quite niche and not supported in KVM yet or > > have an allow list similar to the kernel > > (as in cpu_supports_bbml2_noabort()) which also limits the number > > of CPUs that can run this. > > Do you think pkvm will need BBM? Hitless replace of a PTE is already a > pretty advanced feature and the SMMU has its own support matrix there > too. Is it for shared/private conversion? Yes, we can break block on memory donation which is transfer of ownership to the hypervisor or a guest. > > > 2) Handling page faults, devices must be able to stall and let the > > hypervisor handle the page fault (which has to proxy through the > > kernel as the hypervisor doesn’t handle interrupts), this includes > > also IO page faults which are hard to get right from the HW which > > and may lead to system stability issues or lockups. > > No.. once you turn on IO like this you don't have page faults > anymore. Everything must be permantently mapped into the SMMU view, it > can never be made non-present and you must run without page > faults. That's what you have in the io-pgtable constructed table, > right? Exactly, but the CPU page table doesn’t guarantee that, so we either have to handle page faults in the IOMMU, or completely change how KVM deals with stage-2 if we want to share the page table with the CPU. > > > Alternatively, we can pin the stage-2 pages, that would require some > > hypercalls, hacks to the driver/IOMMU API and possibly new semantics > > in the DMA-API for IDENTITY devices as they will still need to pin > > the pages as they are actually in stage-2 translation and not bypass. > > ?? Then how does this series work? This series works fine as it shadows the page table and doesn't share it with the CPU, so it fully populates the address space. > > > 3) SMMUv3 must be coherent. > > Yes for sure. > > > 4) Support BTM/DVM for TLB invalidation, otherwise some hooks are > > still required (although not io-pgtable-arm) > > SW needs to forward invalidations, BTM is rare.. > > > IMO, 1, 2 are the most tricky parts. It's more work and runs on very > > limited systems, However, it can be implemented as an optimization) > > which is my plan. > > I think unless you can do it without these HW features (excluding 3) > don't bother. I am looking into this now, but as I mentioned that will be a separate RFC following this one as an optimization for advanced HW. Thanks, Mostafa > > > I am not sure how CCA deals with that, I’d expect they have a lot of > > constraints on CPUs/SMMUs and DMA capable devices on those systems. > > 3 is not supported. The entire S2 is permanently mapped and doesn't > really change alot at runtime. No page faults, not sure if the RMM > private/shard conversion would require BMM.. > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table 2026-05-11 11:24 ` Mostafa Saleh @ 2026-05-11 14:22 ` Jason Gunthorpe 0 siblings, 0 replies; 25+ messages in thread From: Jason Gunthorpe @ 2026-05-11 14:22 UTC (permalink / raw) To: Mostafa Saleh Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Mon, May 11, 2026 at 11:24:14AM +0000, Mostafa Saleh wrote: > On Sat, May 09, 2026 at 08:27:14PM -0300, Jason Gunthorpe wrote: > > On Mon, May 04, 2026 at 12:28:55PM +0000, Mostafa Saleh wrote: > > > So far this is the list of requirements/changes needed share the > > > stage-2 page table (besides the obvious: same page table format, > > > granularity, endianness...) > > > > > > 1) HW BBM is not supported in the hypervisor page table, that’s > > > because it can generate TLB conflict aborts, which the hypervisor > > > can not handle because of the limited syndrome information. > > > We can rely on FEAT_BBML3 which was newly introduced to work > > > around that, it’s quite niche and not supported in KVM yet or > > > have an allow list similar to the kernel > > > (as in cpu_supports_bbml2_noabort()) which also limits the number > > > of CPUs that can run this. > > > > Do you think pkvm will need BBM? Hitless replace of a PTE is already a > > pretty advanced feature and the SMMU has its own support matrix there > > too. Is it for shared/private conversion? > > Yes, we can break block on memory donation which is transfer of > ownership to the hypervisor or a guest. So you need BBM support on the SMMU too? That is probably a big problem because the SMMU is often mismatched to the CPU :\ Also io-pgtable arm cannot trigger BBM behaviors, so how do you implement it? > > No.. once you turn on IO like this you don't have page faults > > anymore. Everything must be permantently mapped into the SMMU view, it > > can never be made non-present and you must run without page > > faults. That's what you have in the io-pgtable constructed table, > > right? > > Exactly, but the CPU page table doesn’t guarantee that, so we either > have to handle page faults in the IOMMU, or completely change how KVM > deals with stage-2 if we want to share the page table with the CPU. So that's the real explanation, KVM cannot manage the S2 in the right way so you can't share it. RMM/etc are managing the S2 without pointless page faults so they can share it. > > > Alternatively, we can pin the stage-2 pages, that would require some > > > hypercalls, hacks to the driver/IOMMU API and possibly new semantics > > > in the DMA-API for IDENTITY devices as they will still need to pin > > > the pages as they are actually in stage-2 translation and not bypass. > > > > ?? Then how does this series work? > > This series works fine as it shadows the page table and doesn't share it > with the CPU, so it fully populates the address space. Which is why it is so weird that KVM is using a partially populated S2 when there is, and must, be a fully populated one for the SMMU. But I understand there are reasons fo rthis. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20260501111928.259252-14-smostafa@google.com>]
[parent not found: <20260501125148.GE6912@ziepe.ca>]
* Re: [PATCH v6 13/25] iommu/arm-smmu-v3-kvm: Probe SMMU HW [not found] ` <20260501125148.GE6912@ziepe.ca> @ 2026-05-04 12:30 ` Mostafa Saleh 0 siblings, 0 replies; 25+ messages in thread From: Mostafa Saleh @ 2026-05-04 12:30 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-arm-kernel, linux-kernel, kvmarm, iommu, catalin.marinas, will, maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui, joro, jean-philippe, mark.rutland, qperret, tabba, vdonnefort, sebastianene, keirf On Fri, May 01, 2026 at 09:51:48AM -0300, Jason Gunthorpe wrote: > On Fri, May 01, 2026 at 11:19:15AM +0000, Mostafa Saleh wrote: > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > index 61e6ab364086..157acde0436d 100644 > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > @@ -4738,12 +4738,6 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu) > > return 0; > > } > > > > -#define IIDR_IMPLEMENTER_ARM 0x43b > > -#define IIDR_PRODUCTID_ARM_MMU_600 0x483 > > -#define IIDR_PRODUCTID_ARM_MMU_700 0x487 > > -#define IIDR_PRODUCTID_ARM_MMU_L1 0x48a > > -#define IIDR_PRODUCTID_ARM_MMU_S3 0x498 > > - > > static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu) > > { > > u32 reg; > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > > index 64618299d03a..f904f4d19609 100644 > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > > @@ -84,6 +84,12 @@ struct arm_vsmmu; > > #define IIDR_REVISION GENMASK(15, 12) > > #define IIDR_IMPLEMENTER GENMASK(11, 0) > > > > +#define IIDR_IMPLEMENTER_ARM 0x43b > > +#define IIDR_PRODUCTID_ARM_MMU_600 0x483 > > +#define IIDR_PRODUCTID_ARM_MMU_700 0x487 > > +#define IIDR_PRODUCTID_ARM_MMU_L1 0x48a > > +#define IIDR_PRODUCTID_ARM_MMU_S3 0x498 > > + > > #define ARM_SMMU_AIDR 0x1C > > Lets put these hunks in some earlier patch to migrate out the > functions/etc > > I think all these pkvm/arm-smmu-v3.c should just be building up the > driver. Will do. > > > +static bool smmu_nesting_supported(struct hyp_arm_smmu_v3_device *smmu) > > +{ > > + unsigned int implementer, productid, variant, revision; > > + u32 reg; > > + > > + if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) || > > + !(smmu->features & ARM_SMMU_FEAT_TRANS_S2)) > > + return false; > > + > > + reg = readl_relaxed(smmu->base + ARM_SMMU_IIDR); > > + implementer = FIELD_GET(IIDR_IMPLEMENTER, reg); > > + productid = FIELD_GET(IIDR_PRODUCTID, reg); > > + variant = FIELD_GET(IIDR_VARIANT, reg); > > + revision = FIELD_GET(IIDR_REVISION, reg); > > + > > + if (implementer != IIDR_IMPLEMENTER_ARM) > > + return true; > > + > > + if (productid == IIDR_PRODUCTID_ARM_MMU_600) > > + return variant >= 2; > > + else if (productid == IIDR_PRODUCTID_ARM_MMU_700) > > + return !(variant < 1 || revision < 1); > > + > > + return true; > > +} > > Why not share all this errata stuff with the idr parsing code too? > > We already have ARM_SMMU_FEAT_NESTING that has the above calculation. > > The two drivers use the same ARM_SMMU_FEAT system, I would expect one > chunk of shared code to compute the FEATs, who cares if pkvm doesn't > use all of them? > > Use the same errata logic and so on to get to the feat bitmap. Makes sense. Thanks, Mostafa > > Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2026-05-11 14:30 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260501111928.259252-1-smostafa@google.com>
[not found] ` <20260501111928.259252-5-smostafa@google.com>
[not found] ` <20260501124143.GB6912@ziepe.ca>
2026-05-04 12:15 ` [PATCH v6 04/25] iommu/arm-smmu-v3: Move TLB range invalidation into common code Mostafa Saleh
2026-05-05 16:17 ` Jason Gunthorpe
2026-05-05 16:43 ` Mostafa Saleh
2026-05-06 9:53 ` Jason Gunthorpe
2026-05-07 9:40 ` Mostafa Saleh
2026-05-09 23:29 ` Jason Gunthorpe
2026-05-11 11:45 ` Mostafa Saleh
2026-05-11 14:24 ` Jason Gunthorpe
[not found] ` <20260501111928.259252-6-smostafa@google.com>
[not found] ` <20260501124716.GD6912@ziepe.ca>
2026-05-04 12:16 ` [PATCH v6 05/25] iommu/arm-smmu-v3: Move IDR parsing to common functions Mostafa Saleh
2026-05-05 16:27 ` Jason Gunthorpe
2026-05-05 16:48 ` Mostafa Saleh
2026-05-06 9:56 ` Jason Gunthorpe
2026-05-07 10:13 ` Mostafa Saleh
2026-05-09 23:34 ` Jason Gunthorpe
2026-05-11 11:53 ` Mostafa Saleh
2026-05-11 14:30 ` Jason Gunthorpe
[not found] ` <20260501111928.259252-7-smostafa@google.com>
[not found] ` <20260501122424.GA6912@ziepe.ca>
2026-05-04 12:19 ` [PATCH v6 06/25] iommu/io-pgtable-arm: Rework to use the iommu-pages API Mostafa Saleh
2026-05-09 23:21 ` Jason Gunthorpe
2026-05-11 11:16 ` Mostafa Saleh
2026-05-11 14:18 ` Jason Gunthorpe
[not found] ` <20260501111928.259252-9-smostafa@google.com>
[not found] ` <20260501130006.GF6912@ziepe.ca>
2026-05-04 12:28 ` [PATCH v6 08/25] KVM: arm64: iommu: Shadow host stage-2 page table Mostafa Saleh
2026-05-09 23:27 ` Jason Gunthorpe
2026-05-11 11:24 ` Mostafa Saleh
2026-05-11 14:22 ` Jason Gunthorpe
[not found] ` <20260501111928.259252-14-smostafa@google.com>
[not found] ` <20260501125148.GE6912@ziepe.ca>
2026-05-04 12:30 ` [PATCH v6 13/25] iommu/arm-smmu-v3-kvm: Probe SMMU HW Mostafa Saleh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox