* tlbi va, vaa vs. val, vaal @ 2015-02-27 0:12 Mario Smarduch 2015-02-27 10:24 ` Will Deacon 0 siblings, 1 reply; 8+ messages in thread From: Mario Smarduch @ 2015-02-27 0:12 UTC (permalink / raw) To: linux-arm-kernel I noticed that the kernel's tlbflush.h uses the tlbi va*, vaa* variants instead of the val, vaal ones. Reading the manual (D.5.7.2), it appears that the va*, vaa* versions also invalidate intermediate caching of translation structures. With stage2 enabled, that may result in 20+ memory lookups for a 4-level page table walk. That's assuming that the intermediate caching structures cache mappings from stage1 table entry to host page. - Mario ^ permalink raw reply [flat|nested] 8+ messages in thread
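The "20+ memory lookups" figure can be checked with a short calculation. This is only a minimal sketch, assuming that nothing is cached and that every stage-1 descriptor fetch, as well as the final stage-1 output address, needs a full stage-2 walk. The function name `nested_walk_refs` is illustrative and does not come from the thread or the kernel.

```python
def nested_walk_refs(stage1_levels: int, stage2_levels: int) -> int:
    """Memory references for one fully uncached two-stage translation.

    Assumption: no TLB or walk-cache hits at any level.
    """
    # Each stage-1 descriptor sits at an intermediate physical address, so it
    # costs one stage-2 walk (stage2_levels fetches) plus the fetch itself.
    per_stage1_level = stage2_levels + 1
    # The address produced by stage 1 needs one more stage-2 walk.
    return stage1_levels * per_stage1_level + stage2_levels


if __name__ == "__main__":
    for s1, s2 in [(4, 0), (3, 3), (4, 4)]:
        print(f"stage1={s1} stage2={s2}: {nested_walk_refs(s1, s2)} references")
```

With four levels at each stage this gives 4 * 5 + 4 = 24 references, which is consistent with the "20+" estimate above; with stage 2 disabled it collapses to the plain 4-level walk.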
* tlbi va, vaa vs. val, vaal 2015-02-27 0:12 tlbi va, vaa vs. val, vaal Mario Smarduch @ 2015-02-27 10:24 ` Will Deacon 2015-02-27 10:29 ` Marc Zyngier 2015-02-27 21:15 ` Mario Smarduch 0 siblings, 2 replies; 8+ messages in thread From: Will Deacon @ 2015-02-27 10:24 UTC (permalink / raw) To: linux-arm-kernel On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote: > I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of > val, vaal ones. Reading the manual D.5.7.2 it appears that > va*, vaa* versions invalidate intermediate caching of > translation structures. > > With stage2 enabled that may result in 20+ memory lookups > for a 4 level page table walk. That's assuming that intermediate > caching structures cache mappings from stage1 table entry to > host page. Yeah, Catalin and I discussed improving the kernel support for this, but it requires some changes to the generic mmu_gather code so that we can distinguish the leaf cases. I'd also like to see that done in a way that takes into account different granule sizes (we currently iterate over huge pages in 4k chunks). Last time I touched that, I entered a world of pain and don't plan to return there immediately :) Catalin -- feeling brave? FWIW: the new IOMMU page-table stuff I just got merged *does* make use of leaf-invalidation for the SMMU. Will ^ permalink raw reply [flat|nested] 8+ messages in thread
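The remark about iterating over huge pages in 4k chunks can be illustrated by counting per-address invalidations. This is a rough sketch, not kernel code: `tlbi_ops` is a hypothetical helper, and the 2 MiB block size assumes a 4 KiB granule.

```python
def tlbi_ops(start: int, end: int, stride: int) -> int:
    """Number of per-address invalidations needed to cover [start, end)."""
    if stride <= 0 or end < start:
        raise ValueError("need stride > 0 and end >= start")
    # Ceiling division without floating point.
    return -(-(end - start) // stride)


KIB = 1024
MIB = 1024 * KIB

if __name__ == "__main__":
    huge = 2 * MIB  # assumed block size for a 4 KiB granule
    print("4 KiB stride:", tlbi_ops(0, huge, 4 * KIB))
    print("2 MiB stride:", tlbi_ops(0, huge, huge))
```

Under these assumptions a granule-aware stride would issue one operation where a fixed 4 KiB stride issues 512.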
* tlbi va, vaa vs. val, vaal 2015-02-27 10:24 ` Will Deacon @ 2015-02-27 10:29 ` Marc Zyngier 2015-02-27 10:33 ` Will Deacon 2015-02-27 21:15 ` Mario Smarduch 1 sibling, 1 reply; 8+ messages in thread From: Marc Zyngier @ 2015-02-27 10:29 UTC (permalink / raw) To: linux-arm-kernel On 27/02/15 10:24, Will Deacon wrote: > On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote: >> I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of >> val, vaal ones. Reading the manual D.5.7.2 it appears that >> va*, vaa* versions invalidate intermediate caching of >> translation structures. >> >> With stage2 enabled that may result in 20+ memory lookups >> for a 4 level page table walk. That's assuming that intermediate >> caching structures cache mappings from stage1 table entry to >> host page. > > Yeah, Catalin and I discussed improving the kernel support for this, > but it requires some changes to the generic mmu_gather code so that we > can distinguish the leaf cases. I'd also like to see that done in a way > that takes into account different granule sizes (we currently iterate > over huge pages in 4k chunks). Last time I touched that, I entered a > world of pain and don't plan to return there immediately :) > > Catalin -- feeling brave? > > FWIW: the new IOMMU page-table stuff I just got merged *does* make use > of leaf-invalidation for the SMMU. Now, talking about feeling brave: who will be silly enough to port KVM to the IOMMU page table code? It should just work(tm), right? M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 8+ messages in thread
* tlbi va, vaa vs. val, vaal 2015-02-27 10:29 ` Marc Zyngier @ 2015-02-27 10:33 ` Will Deacon 2015-02-27 10:44 ` Marc Zyngier 0 siblings, 1 reply; 8+ messages in thread From: Will Deacon @ 2015-02-27 10:33 UTC (permalink / raw) To: linux-arm-kernel On Fri, Feb 27, 2015 at 10:29:06AM +0000, Marc Zyngier wrote: > On 27/02/15 10:24, Will Deacon wrote: > > On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote: > >> I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of > >> val, vaal ones. Reading the manual D.5.7.2 it appears that > >> va*, vaa* versions invalidate intermediate caching of > >> translation structures. > >> > >> With stage2 enabled that may result in 20+ memory lookups > >> for a 4 level page table walk. That's assuming that intermediate > >> caching structures cache mappings from stage1 table entry to > >> host page. > > > > Yeah, Catalin and I discussed improving the kernel support for this, > > but it requires some changes to the generic mmu_gather code so that we > > can distinguish the leaf cases. I'd also like to see that done in a way > > that takes into account different granule sizes (we currently iterate > > over huge pages in 4k chunks). Last time I touched that, I entered a > > world of pain and don't plan to return there immediately :) > > > > Catalin -- feeling brave? > > > > FWIW: the new IOMMU page-table stuff I just got merged *does* make use > > of leaf-invalidation for the SMMU. > > Now, talking about feeling brave: who will be silly enough to port KVM > to the IOMMU page table code? It should just work(tm), right? I suspect you'll need to do some surgery to the interfaces, which currently map directly onto the IOMMU API and therefore make nice assumptions about what we get asked to map/unmap. You also probably want a wider range of permissions than we use on the SMMU. 
Finally, the runtime nature of the code (we make no assumptions about address sizes, page sizes etc) probably incurs a performance hit that you may or may not care about. Will ^ permalink raw reply [flat|nested] 8+ messages in thread
* tlbi va, vaa vs. val, vaal 2015-02-27 10:33 ` Will Deacon @ 2015-02-27 10:44 ` Marc Zyngier 0 siblings, 0 replies; 8+ messages in thread From: Marc Zyngier @ 2015-02-27 10:44 UTC (permalink / raw) To: linux-arm-kernel On 27/02/15 10:33, Will Deacon wrote: > On Fri, Feb 27, 2015 at 10:29:06AM +0000, Marc Zyngier wrote: >> On 27/02/15 10:24, Will Deacon wrote: >>> On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote: >>>> I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of >>>> val, vaal ones. Reading the manual D.5.7.2 it appears that >>>> va*, vaa* versions invalidate intermediate caching of >>>> translation structures. >>>> >>>> With stage2 enabled that may result in 20+ memory lookups >>>> for a 4 level page table walk. That's assuming that intermediate >>>> caching structures cache mappings from stage1 table entry to >>>> host page. >>> >>> Yeah, Catalin and I discussed improving the kernel support for this, >>> but it requires some changes to the generic mmu_gather code so that we >>> can distinguish the leaf cases. I'd also like to see that done in a way >>> that takes into account different granule sizes (we currently iterate >>> over huge pages in 4k chunks). Last time I touched that, I entered a >>> world of pain and don't plan to return there immediately :) >>> >>> Catalin -- feeling brave? >>> >>> FWIW: the new IOMMU page-table stuff I just got merged *does* make use >>> of leaf-invalidation for the SMMU. >> >> Now, talking about feeling brave: who will be silly enough to port KVM >> to the IOMMU page table code? It should just work(tm), right? > > I suspect you'll need to do some surgery to the interfaces, which currently > map directly onto the IOMMU API and therefore make nice assumptions about > what we get asked to map/unmap. You also probably want a wider range of > permissions than we use on the SMMU. 
Finally, the runtime nature of the > code (we make no assumptions about address sizes, page sizes etc) probably > incurs a performance hit that you may or may not care about. That's exactly what I want to evaluate. It would also help us to decouple our page-table code from the kernel macros, which bite us time and time again... Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 8+ messages in thread
* tlbi va, vaa vs. val, vaal 2015-02-27 10:24 ` Will Deacon 2015-02-27 10:29 ` Marc Zyngier @ 2015-02-27 21:15 ` Mario Smarduch 2015-03-02 16:23 ` Catalin Marinas 1 sibling, 1 reply; 8+ messages in thread From: Mario Smarduch @ 2015-02-27 21:15 UTC (permalink / raw) To: linux-arm-kernel On 02/27/2015 02:24 AM, Will Deacon wrote: > On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote: >> I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of >> val, vaal ones. Reading the manual D.5.7.2 it appears that >> va*, vaa* versions invalidate intermediate caching of >> translation structures. >> >> With stage2 enabled that may result in 20+ memory lookups >> for a 4 level page table walk. That's assuming that intermediate >> caching structures cache mappings from stage1 table entry to >> host page. > > Yeah, Catalin and I discussed improving the kernel support for this, > but it requires some changes to the generic mmu_gather code so that we > can distinguish the leaf cases. I'd also like to see that done in a way > that takes into account different granule sizes (we currently iterate > over huge pages in 4k chunks). Last time I touched that, I entered a > world of pain and don't plan to return there immediately :) > > Catalin -- feeling brave? > > FWIW: the new IOMMU page-table stuff I just got merged *does* make use > of leaf-invalidation for the SMMU. > > Will > Hi Will, thanks for the background. I'm guessing that how much of the PTWalk is cached is implementation dependent. One old paper quotes up to 40% improvement on some industry benchmarks for implementations that cache all stage1/2 PTWalk entries. I guess it's something to benchmark. - Mario ^ permalink raw reply [flat|nested] 8+ messages in thread
* tlbi va, vaa vs. val, vaal 2015-02-27 21:15 ` Mario Smarduch @ 2015-03-02 16:23 ` Catalin Marinas 2015-03-02 19:26 ` Mario Smarduch 0 siblings, 1 reply; 8+ messages in thread From: Catalin Marinas @ 2015-03-02 16:23 UTC (permalink / raw) To: linux-arm-kernel On Fri, Feb 27, 2015 at 01:15:57PM -0800, Mario Smarduch wrote: > On 02/27/2015 02:24 AM, Will Deacon wrote: > > On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote: > >> I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of > >> val, vaal ones. Reading the manual D.5.7.2 it appears that > >> va*, vaa* versions invalidate intermediate caching of > >> translation structures. > >> > >> With stage2 enabled that may result in 20+ memory lookups > >> for a 4 level page table walk. That's assuming that intermediate > >> caching structures cache mappings from stage1 table entry to > >> host page. > > > > Yeah, Catalin and I discussed improving the kernel support for this, > > but it requires some changes to the generic mmu_gather code so that we > > can distinguish the leaf cases. I'd also like to see that done in a way > > that takes into account different granule sizes (we currently iterate > > over huge pages in 4k chunks). Last time I touched that, I entered a > > world of pain and don't plan to return there immediately :) > > > > Catalin -- feeling brave? > > > > FWIW: the new IOMMU page-table stuff I just got merged *does* make use > > of leaf-invalidation for the SMMU. > > thanks for the background. I'm guessing how much of PTWalk > is cached is implementation dependent. One old paper quotes upto 40% > improvement for some industry benchmarks that cache all stage1/2 PTWalk > entries. Is it caching in the TLB or in the level 1 CPU cache? I would indeed expect some improvement without many drawbacks. The only thing we need in Linux is to distinguish between leaf TLBI and TLBI for page table tearing down. 
It's not complicated, it just needs some testing (strangely enough, I tried to replace all user TLBI with the L variants on a Juno board and no signs of any crashes). -- Catalin ^ permalink raw reply [flat|nested] 8+ messages in thread
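The distinction described above, leaf TLBI versus TLBI for page table teardown, can be written down as a small decision rule. This is a sketch only: `pick_tlbi` is a hypothetical helper, and the `e1is` suffix (EL1, Inner Shareable) is an assumption about which flavour is wanted, since the thread only names the va, vaa, val and vaal forms.

```python
def pick_tlbi(tables_freed: bool, all_asids: bool) -> str:
    """Choose a by-VA TLBI operation name.

    tables_freed: True when intermediate page tables were torn down, so
                  cached intermediate entries must be invalidated too.
    all_asids:    True for mappings not tagged with a single ASID
                  (for example kernel mappings).
    """
    base = "vaa" if all_asids else "va"
    # The 'l' (last level) forms only target leaf entries.
    leaf = "" if tables_freed else "l"
    return f"{base}{leaf}e1is"


if __name__ == "__main__":
    print(pick_tlbi(tables_freed=False, all_asids=False))  # changing one user PTE
    print(pick_tlbi(tables_freed=True, all_asids=False))   # freeing user page tables
```

The design point is that the caller has to know whether tables were freed, which is the information the generic mmu_gather code would need to pass down.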
* tlbi va, vaa vs. val, vaal 2015-03-02 16:23 ` Catalin Marinas @ 2015-03-02 19:26 ` Mario Smarduch 0 siblings, 0 replies; 8+ messages in thread From: Mario Smarduch @ 2015-03-02 19:26 UTC (permalink / raw) To: linux-arm-kernel On 03/02/2015 08:23 AM, Catalin Marinas wrote: > On Fri, Feb 27, 2015 at 01:15:57PM -0800, Mario Smarduch wrote: >> On 02/27/2015 02:24 AM, Will Deacon wrote: >>> On Fri, Feb 27, 2015 at 12:12:32AM +0000, Mario Smarduch wrote: >>>> I noticed kernel tlbflush.h use tlbi va*, vaa* variants instead of >>>> val, vaal ones. Reading the manual D.5.7.2 it appears that >>>> va*, vaa* versions invalidate intermediate caching of >>>> translation structures. >>>> >>>> With stage2 enabled that may result in 20+ memory lookups >>>> for a 4 level page table walk. That's assuming that intermediate >>>> caching structures cache mappings from stage1 table entry to >>>> host page. >>> >>> Yeah, Catalin and I discussed improving the kernel support for this, >>> but it requires some changes to the generic mmu_gather code so that we >>> can distinguish the leaf cases. I'd also like to see that done in a way >>> that takes into account different granule sizes (we currently iterate >>> over huge pages in 4k chunks). Last time I touched that, I entered a >>> world of pain and don't plan to return there immediately :) >>> >>> Catalin -- feeling brave? >>> >>> FWIW: the new IOMMU page-table stuff I just got merged *does* make use >>> of leaf-invalidation for the SMMU. >> >> thanks for the background. I'm guessing how much of PTWalk >> is cached is implementation dependent. One old paper quotes upto 40% >> improvement for some industry benchmarks that cache all stage1/2 PTWalk >> entries. > > Is it caching in the TLB or in the level 1 CPU cache? AFAICT this is caching in what other vendors call page walk cache. It's likely for host - improvements may not be that dramatic. For Guest 1st stage table/pte lookups are 2nd stage n-level walks. 
I would think performance will vary with the CPU's implementation of this intermediate cache, especially if nested page entries are cached. I guess it's likely that one CPU will show a huge improvement and others may not. > > I would indeed expect some improvement without many drawbacks. The only > thing we need in Linux is to distinguish between leaf TLBI and TLBI for > page table tearing down. It's not complicated, it just needs some > testing (strangely enough, I tried to replace all user TLBI with the L > variants on a Juno board and no signs of any crashes). I tried that too and it worked, but with very minimal testing. But I think I understand what the concern is: using the 'L' variant may leave intermediate table entries cached and corrupt another process's PTW. - Mario > ^ permalink raw reply [flat|nested] 8+ messages in thread
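The concern about stale intermediate entries can be shown with a toy model. This is a deliberately simplified sketch, assuming a single leaf TLB and a single walk cache keyed by 2 MiB region; it ignores ASIDs, stage 2 and real hardware behaviour, and every name in it (`ToyMMU`, `stale_table_after_teardown`) is made up for illustration.

```python
class ToyMMU:
    """Toy model: a leaf TLB plus a walk cache of last-level table pointers."""

    PAGE_SHIFT = 12   # 4 KiB pages
    TABLE_SHIFT = 21  # one last-level table covers 2 MiB

    def __init__(self) -> None:
        self.tlb = {}         # page number -> physical page
        self.walk_cache = {}  # 2 MiB region number -> last-level table id

    def fill(self, va: int, table_id: str, phys_page: int) -> None:
        """Record what a completed walk for va would cache."""
        self.tlb[va >> self.PAGE_SHIFT] = phys_page
        self.walk_cache[va >> self.TABLE_SHIFT] = table_id

    def invalidate(self, va: int, leaf_only: bool) -> None:
        """leaf_only=True models the 'L' variants; False models va/vaa."""
        self.tlb.pop(va >> self.PAGE_SHIFT, None)
        if not leaf_only:
            self.walk_cache.pop(va >> self.TABLE_SHIFT, None)


def stale_table_after_teardown(leaf_only: bool) -> bool:
    """Return True if a freed table is still reachable via the walk cache."""
    mmu = ToyMMU()
    va = 0x4000_0000
    mmu.fill(va, table_id="old-table", phys_page=0x1234)
    # The last-level table is torn down and freed, then the VA is invalidated.
    mmu.invalidate(va, leaf_only=leaf_only)
    return mmu.walk_cache.get(va >> ToyMMU.TABLE_SHIFT) == "old-table"


if __name__ == "__main__":
    print("leaf-only leaves stale table:", stale_table_after_teardown(True))
    print("full invalidate leaves stale table:", stale_table_after_teardown(False))
```

In this model the leaf-only invalidation keeps the pointer to the freed table, so a later walk could read whatever the reused memory contains; that would explain why minimal testing can pass while the teardown path still needs the non-leaf form.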