* ARM64: Question: How to map non-shareable memory
From: David Clear @ 2023-05-24 21:07 UTC
To: linux-arm-kernel

I'd like some advice on how a device driver could map normal, cacheable,
non-shareable memory (currently not supported in the kernel).

I have a device that contains areas of RAM and other internal memories
that are outside of the coherency system, and it's a hardware requirement
in this device that cacheable memory transactions to these areas be
marked as non-shareable.

In practical terms this means that Device or Normal_NC mappings work
today, but Normal (cacheable) mappings will see transaction aborts
(SErrors).

An approach that appears to work is to define a pgprot_nonshared()
macro in arch/arm64/include/asm/pgtable.h which sets the PTE SH bits to
zero, and then define an arm64-specific pgprot_modify() that carries
over the pgprot_nonshared() property, so the PTE changes aren't lost
by vm_pgprot_modify().

That's a bit low-level, so I wonder if there's a better approach. Ideally
I'd like a successful patch to be upstreamed, so I hope there's no
intrinsic resistance to supporting non-shareable mappings.

I'd appreciate your thoughts.

Regards,
David.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: ARM64: Question: How to map non-shareable memory
From: Ard Biesheuvel @ 2023-05-24 21:59 UTC
To: David Clear
Cc: linux-arm-kernel, Mark Rutland, Catalin Marinas, Marc Zyngier, Will Deacon

(cc some folks that work on arm64 arch stuff)

Hello David,

On Wed, 24 May 2023 at 23:08, David Clear <dclear@amd.com> wrote:
>
> I'd like some advice on how a device driver could map normal, cacheable,
> non-shareable memory (currently not supported in the kernel).
>
> I have a device that contains areas of RAM and other internal memories
> that are outside of the coherency system, and it's a hardware requirement
> in this device that cacheable memory transactions to these areas be
> marked as non-shareable.
>
> In practical terms this means that Device or Normal_NC mappings work
> today, but Normal (cacheable) mappings will see transaction aborts
> (SErrors).
>
> An approach that appears to work is to define a pgprot_nonshared()
> macro in arch/arm64/include/asm/pgtable.h which sets the PTE SH bits to
> zero, and then define an arm64-specific pgprot_modify() that carries
> over the pgprot_nonshared() property, so the PTE changes aren't lost
> by vm_pgprot_modify().
>
> That's a bit low-level, so I wonder if there's a better approach. Ideally
> I'd like a successful patch to be upstreamed, so I hope there's no
> intrinsic resistance to supporting non-shareable mappings.
>

The code changes are rather straightforward, but unfortunately, that
is not the real problem here.

Non-shareable cacheable mappings are problematic because they are not
covered by the hardware coherency protocol that keeps caches
synchronized between CPUs and cluster-level and system-level caches.
(IOW, accesses to non-shareable mappings will have snooping disabled).

This means that, unless your system only has a single CPU and does not
support cache coherent DMA at all, the cached view of those RAM
regions will go out of sync between CPUs and wrt other coherent
masters, which is probably not what you're after.
* Re: ARM64: Question: How to map non-shareable memory
From: David Clear @ 2023-05-25 0:33 UTC
To: ardb
Cc: linux-arm-kernel, mark.rutland, catalin.marinas, maz, will

On Wed, 24 May 2023 at 23:59, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Non-shareable cacheable mappings are problematic because they are not
> covered by the hardware coherency protocol that keeps caches
> synchronized between CPUs and cluster-level and system-level caches.
> (IOW, accesses to non-shareable mappings will have snooping disabled).
>
> This means that, unless your system only has a single CPU and does not
> support cache coherent DMA at all, the cached view of those RAM
> regions will go out of sync between CPUs and wrt other coherent
> masters, which is probably not what you're after.

Hi Ard. Thanks for the quick reply.

I understand your concerns. The general Linux memory within the
(multi-cluster) system is fully coherent, and there are no surprises
w.r.t. normal SMP system operation and device DMA.

The non-coherent memories are outside of the general Linux pool, owned
by autonomous hardware units, and are used for product-specific purposes.
These memories are either internal to the units (far away from coherence
machinery) or purposefully avoid the system coherency controllers so as
to not incur the latency tax in back-to-back dependent transactions. In
this product it would be a significant performance burden to maintain
coherence with ARM caches that have essentially nothing to do with these
units' operations.

For the userspace software that needs to access this memory, the current
non-cached mapping is obtained via a device driver and the goal is
to minimize the number of discrete memory transactions by supporting
cached burst-reads and burst-writes, bracketed with appropriate cache
maintenance ops.
There are already private caches within the hardware
pipelines that software needs to explicitly flush or invalidate,
so this is just one more thing.

All that as prelude, it doesn't sound like you're objecting to the
notion of supporting a non-shared mapping, more just asking why, and
I hope I've given some insight into that.

Regards,
David.
* Re: ARM64: Question: How to map non-shareable memory
From: Catalin Marinas @ 2023-05-25 8:30 UTC
To: David Clear
Cc: ardb, linux-arm-kernel, mark.rutland, maz, will

Hi David,

On Wed, May 24, 2023 at 05:33:59PM -0700, David Clear wrote:
> On Wed, 24 May 2023 at 23:59, Ard Biesheuvel <ardb@kernel.org> wrote:
> > Non-shareable cacheable mappings are problematic because they are not
> > covered by the hardware coherency protocol that keeps caches
> > synchronized between CPUs and cluster-level and system-level caches.
> > (IOW, accesses to non-shareable mappings will have snooping disabled).
> >
> > This means that, unless your system only has a single CPU and does not
> > support cache coherent DMA at all, the cached view of those RAM
> > regions will go out of sync between CPUs and wrt other coherent
> > masters, which is probably not what you're after.
>
> Hi Ard. Thanks for the quick reply.
>
> I understand your concerns. The general Linux memory within the
> (multi-cluster) system is fully coherent, and there are no surprises
> w.r.t. normal SMP system operation and device DMA.
>
> The non-coherent memories are outside of the general Linux pool, owned
> by autonomous hardware units, and are used for product-specific purposes.
> These memories are either internal to the units (far away from coherence
> machinery) or purposefully avoid the system coherency controllers so as
> to not incur the latency tax in back-to-back dependent transactions. In
> this product it would be a significant performance burden to maintain
> coherence with ARM caches that have essentially nothing to do with these
> units' operations.

Are these memories bus masters themselves? I doubt it. My guess is that
such memory is also accessed by a device that cannot maintain coherency
with the CPU caches.
So IIUC you want a cached mapping from the CPU side for performance
reasons but treat it as non-coherent from a DMA perspective.

For some hardware reason, shareable cacheable transactions to such
memory trigger SErrors. Do you know why this is the case? I ask because
non-cacheable transactions are considered shareable anyway. Or is it
that outer shareable is fine but inner shareable is not? The Arm CPUs
don't really distinguish between these AFAIK.

> For the userspace software that needs to access this memory, the current
> non-cached mapping is obtained via a device driver and the goal is
> to minimize the number of discrete memory transactions by supporting
> cached burst-reads and burst-writes, bracketed with appropriate cache
> maintenance ops. There are already private caches within the hardware
> pipelines that software needs to explicitly flush or invalidate,
> so this is just one more thing.

I agree with Ard, such a mapping won't work. When you mark it as
non-shareable, it tells the CPU that the cache lines for that mapping
are not shared with other CPUs; they don't participate in the cache
coherency protocols. Any cache maintenance to PoC is also limited to
that CPU. See "Effects of instructions that operate by VA to the PoC"
in the latest Arm ARM (page D7-5784).

So let's say that your user process starts reading from such a mapping
(potentially speculatively) after first issuing a DC IVAC (which has to
be done in the kernel). The process is then migrated by the kernel to
another CPU which has stale cache lines for that range, because the
DC IVAC only affected the first CPU. Similarly with the writes, you
can't guarantee that the write and the DC CVAC happen on the same CPU.

I also have no idea how some "transparent" system caches behave here,
whether they do anything on the DC instructions and how shareability
changes their behaviour.

Your best bet is Normal Non-cacheable here.
On newer architecture versions Arm introduced ST64B/LD64B for similar
performance reasons (FEAT_LS64 in Armv8.7), but I don't think there's
hardware yet.

-- 
Catalin
* Re: ARM64: Question: How to map non-shareable memory
From: David Clear @ 2023-05-25 23:47 UTC
To: catalin.marinas
Cc: linux-arm-kernel, ardb, mark.rutland, maz, will

On Thu, 25 May 2023 at 09:30:27, Catalin Marinas wrote:
> Hi David,

Hi Catalin, thanks for the detailed comments. I'm finally coming around...
The multi-core issues, and having to drop into kernel mode for the DC IVAC,
eliminate any possible advantage this avenue of investigation could have
yielded.

> Your best bet is Normal Non-cacheable here.

Yes, we'll stay with Normal_NC. BTW the Cortex-A72 issues these as
ReadNoSnoop / WriteNoSnoop, so that's why they made it through the NOC.

> On newer architecture
> versions Arm introduced ST64B/LD64B for similar performance reasons
> (FEAT_LS64 in Armv8.7) but I don't think there's hardware yet.

That's very interesting. Something to look forward to.

Thanks again for your time, both you and Ard. I appreciate it.

Regards,
David.