* ARM64: Question: How to map non-shareable memory
@ 2023-05-24 21:07 David Clear
2023-05-24 21:59 ` Ard Biesheuvel
0 siblings, 1 reply; 5+ messages in thread
From: David Clear @ 2023-05-24 21:07 UTC (permalink / raw)
To: linux-arm-kernel
I'd like some advice on how a device driver could map normal, cacheable,
non-shareable memory (currently not supported in the kernel).
I have a device that contains areas of RAM and other internal memories
that are outside of the coherency system, and it's a hardware requirement
in this device that cacheable memory transactions to these areas be
marked as non-shareable.
In practical terms this means that Device or Normal_NC mappings work
today, but Normal (cacheable) mappings will see transaction aborts
(SErrors).
An approach that appears to work is to define a pgprot_nonshared()
macro in arch/arm64/include/asm/pgtable.h which sets the PTE SH bits to
zero, and then define an arm64-specific pgprot_modify() that carries
over the pgprot_nonshared() property, so the PTE changes aren't lost
by vm_pgprot_modify().
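
For illustration only, the idea described above could be sketched in plain user-space C as follows. This is a toy model, not kernel code: the bit positions follow the arm64 stage 1 descriptor layout (SH[1:0] at bits [9:8]), but PTE_SH_MASK, PTE_SH_NONE, PTE_SH_INNER, pgprot_modify_generic() and pgprot_modify_keep_nonshared() are names assumed here, and the "generic" modify is a deliberately simplified stand-in that just lets the new protection win.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Shareability field of an arm64 stage 1 descriptor: SH[1:0], bits [9:8]. */
#define PTE_SH_MASK   ((pteval_t)3 << 8)
#define PTE_SH_NONE   ((pteval_t)0 << 8)   /* 0b00: non-shareable */
#define PTE_SH_INNER  ((pteval_t)3 << 8)   /* 0b11: inner shareable */

/* The macro proposed in the thread, modelled as a function: clear SH. */
static inline pteval_t pgprot_nonshared(pteval_t prot)
{
    return prot & ~PTE_SH_MASK;
}

/* Simplified stand-in (assumption): the new protection simply replaces
 * the old one, so a cleared SH field in oldprot is lost. */
static inline pteval_t pgprot_modify_generic(pteval_t oldprot, pteval_t newprot)
{
    (void)oldprot;
    return newprot;
}

/* Hypothetical arm64-specific variant: carry the non-shareable property
 * of the old protection over to the new one. */
static inline pteval_t pgprot_modify_keep_nonshared(pteval_t oldprot,
                                                    pteval_t newprot)
{
    if ((oldprot & PTE_SH_MASK) == PTE_SH_NONE)
        newprot = pgprot_nonshared(newprot);
    return newprot;
}

/* Usage: a driver marks its mapping non-shareable, then the protection
 * is recomputed from a default that is inner shareable. */
void pgprot_nonshared_example(void)
{
    pteval_t dflt = PTE_SH_INNER | 0x3;          /* default, shareable */
    pteval_t mine = pgprot_nonshared(dflt);      /* driver's choice */

    assert((pgprot_modify_generic(mine, dflt) & PTE_SH_MASK) == PTE_SH_INNER);
    assert((pgprot_modify_keep_nonshared(mine, dflt) & PTE_SH_MASK) == PTE_SH_NONE);
}
```

The first assertion shows the problem (the SH change is lost when the protection is recomputed), the second shows the carry-over behaviour the override is meant to provide.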
That's a bit low-level, so I wonder if there's a better approach. Ideally
I'd like a successful patch to be upstreamed, so I hope there's no
intrinsic resistance to supporting non-shareable mappings.
I'd appreciate your thoughts.
Regards,
David.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: ARM64: Question: How to map non-shareable memory
2023-05-24 21:07 ARM64: Question: How to map non-shareable memory David Clear
@ 2023-05-24 21:59 ` Ard Biesheuvel
2023-05-25 0:33 ` David Clear
0 siblings, 1 reply; 5+ messages in thread
From: Ard Biesheuvel @ 2023-05-24 21:59 UTC (permalink / raw)
To: David Clear
Cc: linux-arm-kernel, Mark Rutland, Catalin Marinas, Marc Zyngier,
Will Deacon
(cc some folks that work on arm64 arch stuff)
Hello David,
On Wed, 24 May 2023 at 23:08, David Clear <dclear@amd.com> wrote:
>
> I'd like some advice on how a device driver could map normal, cacheable,
> non-shareable memory (currently not supported in the kernel).
>
> I have a device that contains areas of RAM and other internal memories
> that are outside of the coherency system, and it's a hardware requirement
> in this device that cacheable memory transactions to these areas be
> marked as non-shareable.
>
> In practical terms this means that Device or Normal_NC mappings work
> today, but Normal (cacheable) mappings will see transaction aborts
> (SErrors).
>
> An approach that appears to work is to define a pgprot_nonshared()
> macro in arch/arm64/include/asm/pgtable.h which sets the PTE SH bits to
> zero, and then define an arm64-specific pgprot_modify() that carries
> over the pgprot_nonshared() property, so the PTE changes aren't lost
> by vm_pgprot_modify().
>
> That's a bit low-level, so I wonder if there's a better approach. Ideally
> I'd like a successful patch to be upstreamed, so I hope there's no
> intrinsic resistance to supporting non-shareable mappings.
>
The code changes are rather straightforward, but unfortunately, that
is not the real problem here.
Non-shareable cacheable mappings are problematic because they are not
covered by the hardware coherency protocol that keeps caches
synchronized between CPUs and cluster-level and system-level caches.
(IOW, accesses to non-shareable mappings will have snooping disabled).
This means that, unless your system only has a single CPU and does not
support cache coherent DMA at all, the cached view of those RAM
regions will go out of sync between CPUs and wrt other coherent
masters, which is probably not what you're after.
* Re: ARM64: Question: How to map non-shareable memory
2023-05-24 21:59 ` Ard Biesheuvel
@ 2023-05-25 0:33 ` David Clear
2023-05-25 8:30 ` Catalin Marinas
0 siblings, 1 reply; 5+ messages in thread
From: David Clear @ 2023-05-25 0:33 UTC (permalink / raw)
To: ardb; +Cc: linux-arm-kernel, mark.rutland, catalin.marinas, maz, will
On Wed, 24 May 2023 at 23:59, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Non-shareable cacheable mappings are problematic because they are not
> covered by the hardware coherency protocol that keeps caches
> synchronized between CPUs and cluster-level and system-level caches.
> (IOW, accesses to non-shareable mappings will have snooping disabled).
>
> This means that, unless your system only has a single CPU and does not
> support cache coherent DMA at all, the cached view of those RAM
> regions will go out of sync between CPUs and wrt other coherent
> masters, which is probably not what you're after.
Hi Ard. Thanks for the quick reply.
I understand your concerns. The general Linux memory within the
(multi-cluster) system is fully coherent, and there are no surprises
w.r.t normal SMP system operation and device DMA.
The non-coherent memories are outside of the general Linux pool, owned
by autonomous hardware units, and are used for product-specific purposes.
These memories are either internal to the units (far away from coherence
machinery) or purposefully avoid the system coherency controllers so as
to not incur the latency tax in back-to-back dependent transactions. In
this product it would be a significant performance burden to maintain
coherence with ARM caches that have essentially nothing to do with these
units' operations.
For the userspace software that needs to access this memory, the current
non-cached mapping is obtained via a device driver and the goal is
to minimize the number of discrete memory transactions by supporting
cached burst-reads and burst-writes, bracketed with appropriate cache
maintenance ops. There are already private caches within the hardware
pipelines that software needs to explicitly flush or invalidate,
so this is just one more thing.
All that as prelude, it doesn't sound like you're objecting to the
notion of supporting a non-shared mapping, more just asking why, and I
hope I've given some insight into that.
Regards,
David.
* Re: ARM64: Question: How to map non-shareable memory
2023-05-25 0:33 ` David Clear
@ 2023-05-25 8:30 ` Catalin Marinas
2023-05-25 23:47 ` David Clear
0 siblings, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2023-05-25 8:30 UTC (permalink / raw)
To: David Clear; +Cc: ardb, linux-arm-kernel, mark.rutland, maz, will
Hi David,
On Wed, May 24, 2023 at 05:33:59PM -0700, David Clear wrote:
> On Wed, 24 May 2023 at 23:59, Ard Biesheuvel <ardb@kernel.org> wrote:
> > Non-shareable cacheable mappings are problematic because they are not
> > covered by the hardware coherency protocol that keeps caches
> > synchronized between CPUs and cluster-level and system-level caches.
> > (IOW, accesses to non-shareable mappings will have snooping disabled).
> >
> > This means that, unless your system only has a single CPU and does not
> > support cache coherent DMA at all, the cached view of those RAM
> > regions will go out of sync between CPUs and wrt other coherent
> > masters, which is probably not what you're after.
>
> Hi Ard. Thanks for the quick reply.
>
> I understand your concerns. The general Linux memory within the
> (multi-cluster) system is fully coherent, and there are no surprises
> w.r.t normal SMP system operation and device DMA.
>
> The non-coherent memories are outside of the general Linux pool, owned
> by autonomous hardware units, and are used for product-specific purposes.
> These memories are either internal to the units (far away from coherence
> machinery) or purposefully avoid the system coherency controllers so as
> to not incur the latency tax in back-to-back dependent transactions. In
> this product it would be a significant performance burden to maintain
> coherence with ARM caches that have essentially nothing to do with these
> units' operations.
Are these memories bus masters themselves? I doubt it. My guess is that
such memory is also accessed by a device that cannot maintain coherency
with the CPU caches. So IIUC you want a cached mapping from the CPU side
for performance reasons but treat it as non-coherent from a DMA perspective.
For some hardware reason, shareable cacheable transactions to such
memory trigger SErrors. Do you know why this is the case? Because any
other non-cacheable transactions are considered shareable anyway. Or is
it that outer shareable is fine but inner shareable is not? The Arm CPUs
don't really distinguish between these AFAIK.
> For the userspace software that needs to access this memory, the current
> non-cached mapping is obtained via a device driver and the goal is
> to minimize the number of discrete memory transactions by supporting
> cached burst-reads and burst-writes, bracketed with appropriate cache
> maintenance ops. There are already private caches within the hardware
> pipelines that software needs to explicitly flush or invalidate,
> so this is just one more thing.
I agree with Ard, such mapping won't work. When you mark it as
non-shareable, it tells the CPU that the cache lines for that mapping
are not shared with other CPUs, they don't participate in the cache
coherency protocols. Any cache maintenance to PoC is also limited to
that CPU. See "Effects of instructions that operate by VA to the PoC" in
the latest Arm ARM (page D7-5784).
So let's say that your user process starts reading from such mapping
(potentially speculatively) but doing some DC IVAC before (it needs to
be in the kernel). The process is then migrated by the kernel to another
CPU which has stale cache lines for that range because the DC IVAC only
affected the first CPU. Similarly with the writes, you can't guarantee
that the write and the DC CVAC happen on the same CPU. I also have no
idea how some "transparent" system caches behave here, whether they do
anything on the DC instructions and how shareability changes their
behaviour.
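
To make the migration scenario above concrete, here is a toy model in plain C. It is only a sketch under assumptions: one cache line per CPU, a device write that bypasses the CPU caches, and an invalidate that reaches other CPUs only when the mapping is shareable, as described in this message. The names toy_system, cpu_read(), device_write(), dc_ivac() and read_after_migration() are invented for the example and do not correspond to any kernel or hardware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 2

struct cpu_cache {
    bool valid;
    uint32_t data;
};

struct toy_system {
    uint32_t mem;                     /* one word of device-owned memory */
    struct cpu_cache cache[NCPUS];    /* one private cache line per CPU */
};

/* Cached read: fill the line from memory on a miss, else return the copy. */
static uint32_t cpu_read(struct toy_system *s, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    if (!s->cache[cpu].valid) {
        s->cache[cpu].data = s->mem;
        s->cache[cpu].valid = true;
    }
    return s->cache[cpu].data;
}

/* The device updates its memory without involving the CPU caches. */
static void device_write(struct toy_system *s, uint32_t value)
{
    s->mem = value;
}

/* Model of invalidate by VA to PoC: with a non-shareable mapping only
 * the issuing CPU's line is invalidated. */
static void dc_ivac(struct toy_system *s, int cpu, bool shareable)
{
    for (int i = 0; i < NCPUS; i++)
        if (shareable || i == cpu)
            s->cache[i].valid = false;
}

/* CPU1 holds an old copy, the device writes, the invalidate runs on
 * CPU0, then the task is migrated to CPU1 and reads the data. */
static uint32_t read_after_migration(bool shareable)
{
    struct toy_system s = { .mem = 1 };

    (void)cpu_read(&s, 1);        /* earlier (or speculative) fill on CPU1 */
    device_write(&s, 2);          /* device produces new data */
    dc_ivac(&s, 0, shareable);    /* maintenance issued on CPU0 */
    return cpu_read(&s, 1);       /* task now runs on CPU1 */
}
```

In this model read_after_migration(false) returns the stale value 1, while read_after_migration(true) returns the fresh value 2, which is the difference in behaviour being pointed out.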
Your best bet is Normal Non-cacheable here. On newer architecture
versions Arm introduced ST64B/LD64B for similar performance reasons
(FEAT_LS64 in Armv8.7) but I don't think there's hardware yet.
--
Catalin
* Re: ARM64: Question: How to map non-shareable memory
2023-05-25 8:30 ` Catalin Marinas
@ 2023-05-25 23:47 ` David Clear
0 siblings, 0 replies; 5+ messages in thread
From: David Clear @ 2023-05-25 23:47 UTC (permalink / raw)
To: catalin.marinas; +Cc: linux-arm-kernel, ardb, mark.rutland, maz, will
On Thu, 25 May 2023 at 09:30:27, Catalin Marinas wrote:
> Hi David,
Hi Catalin, thanks for the detailed comments.
I'm finally coming around... The multi-core issues, and having to drop
into kernel mode for the DC IVAC, eliminate any possible advantage
this avenue of investigation could have yielded.
> Your best bet is Normal Non-cacheable here.
Yes, we'll stay with Normal_NC. BTW the Cortex A72 issues these as
ReadNoSnoop / WriteNoSnoop, so that's why they made it through the NOC.
> On newer architecture
> versions Arm introduced ST64B/LD64B for similar performance reasons
> (FEAT_LS64 in Armv8.7) but I don't think there's hardware yet.
That's very interesting. Something to look forward to.
Thanks again for your time. Both you and Ard. I appreciate it.
Regards,
David.