* Re: Fixing SH cache assumptions
2016-03-22 21:19 Fixing SH cache assumptions Rich Felker
@ 2016-03-23 5:08 ` Rob Landley
2016-03-23 7:45 ` Geert Uytterhoeven
2016-03-23 16:41 ` Rich Felker
2 siblings, 0 replies; 4+ messages in thread
From: Rob Landley @ 2016-03-23 5:08 UTC
To: linux-sh
On 03/22/2016 04:19 PM, Rich Felker wrote:
> Currently arch/sh has a hard-coded assumption that the cache is
> virtually indexed (and virtually tagged, from what I can tell), and
> thus needs to account for pages that may alias. While this is correct
> for SH3/4, it's wrong for anything NOMMU (since there are no virtual
> addresses, only physical) and the only reason SH2 works at all is
> because its small cache size (256 lines * 16 bytes per line) matches
> the page size, yielding an alias_mask of 0. If the cache were any
> larger (like it is on J2) then the alias avoidance logic would kick in
> and lead to calling kmap_coherent (which is BUG() on NOMMU) and
> possibly other incorrect or suboptimal behavior.
>
> I've avoided the issue so far on J2 simply by lying that the cache is
> small, but this needs a proper fix. It would be easy to just #ifndef
> out the logic that sets up alias_mask and shm_align_mask on NOMMU, but
> I think it would be better to somehow represent the cache indexing in
> the cache_info struct or elsewhere. In case the future J4 has a
> physically indexed cache (which is my hope), such an approach should
> naturally work for it with no further modifications.
Wikipedia[citation needed] is under the impression that physically
indexed caches are not a happy thing on systems with MMU:
https://en.wikipedia.org/wiki/CPU_cache#Address_translation
(I so want search anchors. Instead of #Address_translation, if I could
add $PIPT it could jump you right where you needed to go. But no,
Mozilla never did that and Chrome apparently hasn't thought of it.)
This sounds like something nommu systems do. Are any other nommu systems
currently device tree enabled? (Blackfin? ColdFire? Is any of the
Cortex-M stuff actually merged in-tree yet?)
> Any preferences for how I do this? Just add a type field to cache_info
> and make the default VIVT for existing models?
It sounds like a device tree issue. How would you represent it there?
> Rich
Rob
* Re: Fixing SH cache assumptions
From: Geert Uytterhoeven @ 2016-03-23 7:45 UTC
To: linux-sh
On Wed, Mar 23, 2016 at 6:08 AM, Rob Landley <rob@landley.net> wrote:
> This sounds like something nommu systems do. Are any other nommu systems
> currently device tree enabled? (Blackfin? ColdFire? Is any of the
> Cortex-M stuff actually merged in-tree yet?)
$ git grep cortex-m
[...]
arch/arm/boot/dts/lpc18xx.dtsi: compatible = "arm,cortex-m3";
arch/arm/boot/dts/lpc4350.dtsi: compatible = "arm,cortex-m4";
arch/arm/boot/dts/lpc4357.dtsi: compatible = "arm,cortex-m4";
$
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: Fixing SH cache assumptions
From: Rich Felker @ 2016-03-23 16:41 UTC
To: linux-sh
On Wed, Mar 23, 2016 at 12:08:51AM -0500, Rob Landley wrote:
> On 03/22/2016 04:19 PM, Rich Felker wrote:
> > Currently arch/sh has a hard-coded assumption that the cache is
> > virtually indexed (and virtually tagged, from what I can tell), and
> > thus needs to account for pages that may alias. While this is correct
> > for SH3/4, it's wrong for anything NOMMU (since there are no virtual
> > addresses, only physical) and the only reason SH2 works at all is
> > because its small cache size (256 lines * 16 bytes per line) matches
> > the page size, yielding an alias_mask of 0. If the cache were any
> > larger (like it is on J2) then the alias avoidance logic would kick in
> > and lead to calling kmap_coherent (which is BUG() on NOMMU) and
> > possibly other incorrect or suboptimal behavior.
> >
> > I've avoided the issue so far on J2 simply by lying that the cache is
> > small, but this needs a proper fix. It would be easy to just #ifndef
> > out the logic that sets up alias_mask and shm_align_mask on NOMMU, but
> > I think it would be better to somehow represent the cache indexing in
> > the cache_info struct or elsewhere. In case the future J4 has a
> > physically indexed cache (which is my hope), such an approach should
> > naturally work for it with no further modifications.
>
> Wikipedia[citation needed] is under the impression that physically
> indexed caches are not a happy thing on systems with MMU:
>
> https://en.wikipedia.org/wiki/CPU_cache#Address_translation
>
> (I so want search anchors. Instead of #Address_translation, if I could
> add $PIPT it could jump you right where you needed to go. But no,
> Mozilla never did that and Chrome apparently hasn't thought of it.)
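The information there is kind of, well, "dated" IMO. Most modern
systems with MMU (esp. x86 and ARM) use PIPT caches, or VIPT L1 caches
small enough that the index bits fall within the page offset and so
behave like PIPT, because the disadvantages of virtual indexing are
pretty big: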
- Without ASIDs, you have to flush the cache (not just the TLB) at
every context switch. ASIDs partly solve this problem but have
limitations/problems of their own.
- With virtual tagging and indexing, memory that's mapped into
multiple processes, like shared libraries, will get cached multiple
times unless it's mapped at the same virtual address in every process,
which is hard to guarantee and precludes ASLR.
- With physical tagging and virtual indexing, the problem isn't quite
as bad, but to ensure the same physical memory doesn't get cached
multiple times (wasteful for read-only memory, and it breaks coherency
for read-write memory) you have to align the virtual mappings to a
boundary larger (possibly much larger) than the page size; this is
the page aliasing avoidance I was talking about.
I'm not an expert on the hardware side of things, but my understanding
is that implementing a fast PIPT cache is a solved problem and shouldn't
really require anything more than a low-latency MMU.
Also, if your MMU allows arbitrary mappings, rather than having certain
virtual ranges (like the SH P0/P1/P2/P3/P4 zones) with special
meanings, then I don't see how you can save latency by using virtual
indexing instead of physical, since whether an address is cacheable
isn't determined until after the MMU resolves what it maps to.
> This sounds like something nommu systems do. Are any other nommu systems
> currently device tree enabled? (Blackfin? ColdFire? Is any of the
> Cortex-M stuff actually merged in-tree yet?)
On a NOMMU system there's no such thing as a virtual/physical
distinction. Virtual addresses "are" (or map linearly to) physical
addresses, and the same memory is never mapped at more than one
virtual address.
> > Any preferences for how I do this? Just add a type field to cache_info
> > and make the default VIVT for existing models?
>
> It sounds like a device tree issue. How would you represent it there?
Even if the value for the property comes from the DT, it needs to be
stored in the kernel's data structures at runtime; that's what I was
asking about. But this is another good question. For legacy SH
CPUs/SoCs, should the cache properties be implied by the cpu node's
"compatible" string, or should we require the DT to provide them all? I
think "renesas,sh4-* cpu has VIPT cache" is a fairly reasonable
assumption to put in the kernel's handling of the cpu type, while sizes
(which might vary by subtype/SoC?) _could_ go in the DT. But I think
they're actually detectable (and currently detected) by probing the
SH4 cache controller registers, and probing should probably be
preferred over DT-provided values. On the other hand, for J2, where the
cache could easily be changed or swapped out independently of the cpu,
it makes sense for all the properties to come from the DT.
BTW, this document looks helpful for explaining the SH4 cache architecture:
http://www.stlinux.com/node/512
Rich