Fixing SH cache assumptions

* Fixing SH cache assumptions
From: Rich Felker @ 2016-03-22 21:19 UTC
  To: linux-sh

Currently arch/sh has a hard-coded assumption that the cache is
virtually indexed (and virtually tagged, from what I can tell), and
thus needs to account for pages that may alias. While this is correct
for SH3/4, it's wrong for anything NOMMU (since there are no virtual
addresses, only physical) and the only reason SH2 works at all is
because its small cache size (256 lines * 16 bytes per line) matches
the page size, yielding an alias_mask of 0. If the cache were any
larger (like it is on J2) then the alias avoidance logic would kick in
and lead to calling kmap_coherent (which is BUG() on NOMMU) and
possibly other incorrect or suboptimal behavior.
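
To make the arithmetic above concrete, here is a rough standalone sketch of how an alias mask falls out of the cache geometry. It is not the arch/sh code: the struct and function names are made up for illustration, a 4096-byte page is assumed, and the "256 lines * 16 bytes" figure is treated as the geometry of a single way.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u                       /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Hypothetical geometry of one cache way; not the kernel's cache_info. */
struct way_geometry {
	uint32_t sets;       /* lines per way */
	uint32_t line_shift; /* log2(bytes per line) */
};

/* Index bits that lie above the page offset; nonzero means pages can alias. */
static uint32_t sketch_alias_mask(const struct way_geometry *g)
{
	/* The mask arithmetic only makes sense for a power-of-two set count. */
	assert(g->sets != 0 && (g->sets & (g->sets - 1)) == 0);
	return ((g->sets - 1) << g->line_shift) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Number of distinct page colours, or 0 when no aliasing is possible. */
static uint32_t sketch_n_aliases(const struct way_geometry *g)
{
	uint32_t mask = sketch_alias_mask(g);
	return mask ? (mask >> SKETCH_PAGE_SHIFT) + 1 : 0;
}
```

With 256 sets of 16-byte lines the mask comes out as 0. Doubling the sets to 512 gives a mask of 0x1000, i.e. two page colours, which is the point at which alias avoidance would start to matter.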

I've avoided the issue so far on J2 simply by lying that the cache is
small, but this needs a proper fix. It would be easy to just #ifndef
out the logic that sets up alias_mask and shm_align_mask on NOMMU, but
I think it would be better to somehow represent the cache indexing in
the cache_info struct or elsewhere. In case the future J4 has a
physically indexed cache (which is my hope), such an approach should
naturally work for it with no further modifications.

Any preferences for how I do this? Just add a type field to cache_info
and make the default VIVT for existing models?
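
Something along these lines is what a type field could look like. This is only a sketch under assumptions: the enum, the cut-down struct and the function are hypothetical stand-ins rather than the real cache_info layout, and the page size is assumed to be 4096 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical indexing/tagging kinds; the names are illustrative only. */
enum sketch_cache_type {
	SKETCH_CACHE_VIVT = 0, /* zero, so zero-initialised entries default to it */
	SKETCH_CACHE_VIPT,
	SKETCH_CACHE_PIPT,
};

/* Cut-down stand-in for cache_info with the proposed extra field. */
struct sketch_cache_info {
	uint32_t sets;
	uint32_t entry_shift;
	enum sketch_cache_type type;
	uint32_t alias_mask;
};

/* Only virtually indexed caches can have page aliases. */
static void sketch_compute_alias(struct sketch_cache_info *c)
{
	assert(c->sets != 0);
	if (c->type == SKETCH_CACHE_PIPT)
		c->alias_mask = 0;
	else
		c->alias_mask = ((c->sets - 1) << c->entry_shift) &
				~(SKETCH_PAGE_SIZE - 1);
}
```

Making VIVT the zero value means existing models that never set the field keep today's behaviour, while a physically indexed cache opts out of alias handling by setting one field.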

Rich

* Re: Fixing SH cache assumptions
From: Rob Landley @ 2016-03-23  5:08 UTC
  To: linux-sh

On 03/22/2016 04:19 PM, Rich Felker wrote:
> Currently arch/sh has a hard-coded assumption that the cache is
> virtually indexed (and virtually tagged, from what I can tell), and
> thus needs to account for pages that may alias. While this is correct
> for SH3/4, it's wrong for anything NOMMU (since there are no virtual
> addresses, only physical) and the only reason SH2 works at all is
> because its small cache size (256 lines * 16 bytes per line) matches
> the page size, yielding an alias_mask of 0. If the cache were any
> larger (like it is on J2) then the alias avoidance logic would kick in
> and lead to calling kmap_coherent (which is BUG() on NOMMU) and
> possibly other incorrect or suboptimal behavior.
> 
> I've avoided the issue so far on J2 simply by lying that the cache is
> small, but this needs a proper fix. It would be easy to just #ifndef
> out the logic that sets up alias_mask and shm_align_mask on NOMMU, but
> I think it would be better to somehow represent the cache indexing in
> the cache_info struct or elsewhere. In case the future J4 has a
> physically indexed cache (which is my hope), such an approach should
> naturally work for it with no further modifications.

Wikipedia[citation needed] is under the impression that physically
indexed caches are not a happy thing on systems with MMU:

https://en.wikipedia.org/wiki/CPU_cache#Address_translation

(I so want search anchors. Instead of #Address_translation, if I could
add $PIPT it could jump you right where you needed to go. But no,
mozilla never did that and chrome apparently hasn't thought of it.)

This sounds like something nommu systems do. Are any other nommu systems
currently device tree enabled? (Blackfin? Coldfire? Is any of the
Cortex-M stuff actually merged in-tree yet?)

> Any preferences for how I do this? Just add a type field to cache_info
> and make the default VIVT for existing models?

It sounds like a device tree issue. How would you represent it there?

> Rich

Rob

* Re: Fixing SH cache assumptions
From: Geert Uytterhoeven @ 2016-03-23  7:45 UTC
  To: linux-sh

On Wed, Mar 23, 2016 at 6:08 AM, Rob Landley <rob@landley.net> wrote:
> This sounds like something nommu systems do. Are any other nommu systems
> currently device tree enabled? (Blackfin? Coldfire? Is any of the
> Cortex-M stuff actually merged in-tree yet?)

$ git grep cortex-m
[...]
arch/arm/boot/dts/lpc18xx.dtsi:                 compatible = "arm,cortex-m3";
arch/arm/boot/dts/lpc4350.dtsi:                 compatible = "arm,cortex-m4";
arch/arm/boot/dts/lpc4357.dtsi:                 compatible = "arm,cortex-m4";
$

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: Fixing SH cache assumptions
From: Rich Felker @ 2016-03-23 16:41 UTC
  To: linux-sh

On Wed, Mar 23, 2016 at 12:08:51AM -0500, Rob Landley wrote:
> On 03/22/2016 04:19 PM, Rich Felker wrote:
> > Currently arch/sh has a hard-coded assumption that the cache is
> > virtually indexed (and virtually tagged, from what I can tell), and
> > thus needs to account for pages that may alias. While this is correct
> > for SH3/4, it's wrong for anything NOMMU (since there are no virtual
> > addresses, only physical) and the only reason SH2 works at all is
> > because its small cache size (256 lines * 16 bytes per line) matches
> > the page size, yielding an alias_mask of 0. If the cache were any
> > larger (like it is on J2) then the alias avoidance logic would kick in
> > and lead to calling kmap_coherent (which is BUG() on NOMMU) and
> > possibly other incorrect or suboptimal behavior.
> > 
> > I've avoided the issue so far on J2 simply by lying that the cache is
> > small, but this needs a proper fix. It would be easy to just #ifndef
> > out the logic that sets up alias_mask and shm_align_mask on NOMMU, but
> > I think it would be better to somehow represent the cache indexing in
> > the cache_info struct or elsewhere. In case the future J4 has a
> > physically indexed cache (which is my hope), such an approach should
> > naturally work for it with no further modifications.
> 
> Wikipedia[citation needed] is under the impression that physically
> indexed caches are not a happy thing on systems with MMU:
> 
> https://en.wikipedia.org/wiki/CPU_cache#Address_translation
> 
> (I so want search anchors. Instead of #Address_translation, if I could
> add $PIPT it could jump you right where you needed to go. But no,
> mozilla never did that and chrome apparently hasn't thought of it.)

The information there is kind of, well, "dated" IMO. Most modern
systems (esp. x86 and ARM) with MMU use PIPT caches, because the
disadvantages of virtual indexing are pretty big:

- Without ASIDs, you have to flush the TLB at every context switch.
  ASIDs partly solve this problem but have limitations/problems of
  their own.

- With virtual tagging and indexing, memory that's mapped into
  multiple processes, like shared libraries, will get cached multiple
  times, unless it's mapped at the same location in every process,
  which is hard to guarantee and precludes ASLR.

- With physical tagging and virtual indexing, the problem isn't quite
  as bad, but to ensure the same physical memory doesn't get cached
  multiple times (waste for RO memory, breaks coherency for RW memory)
  you have to use alignment larger (possibly much larger) than the
  page size; this is the page aliasing avoidance I was talking about.
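
The alignment requirement in the last point can be sketched as follows. This is an illustration under assumptions, not the kernel's implementation: the function names are hypothetical, pages are assumed to be 4096 bytes, and align_mask is taken to be the way size minus one.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u /* assumed 4 KiB pages */

/* Colour of a virtual address: the cache index bits above the page offset. */
static uint32_t sketch_colour(uintptr_t vaddr, uintptr_t alias_mask)
{
	return (uint32_t)((vaddr & alias_mask) >> SKETCH_PAGE_SHIFT);
}

/*
 * Round addr up to a multiple of (align_mask + 1), then add the low bits
 * of the object offset, so the chosen address has the same colour as the
 * offset being mapped there.
 */
static uintptr_t sketch_colour_align(uintptr_t addr, uintptr_t offset,
				     uintptr_t align_mask)
{
	assert(((align_mask + 1) & align_mask) == 0); /* must be 2^n - 1 */
	uintptr_t base = (addr + align_mask) & ~align_mask;
	return base + (offset & align_mask);
}
```

Two mappings of the same physical page land on the same cache lines only if their virtual addresses have the same colour, so every mapping of a shared object has to be placed with this coarser alignment rather than plain page alignment.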

I'm not an expert on the hardware side of things, but my understanding
is that implementing fast PIPT cache is a solved problem and shouldn't
really require anything more than a low-latency MMU.

Also, if your MMU is to allow arbitrary mappings, rather than
reserving certain virtual ranges (like the SH P0/P1/P2/P3/P4 zones) for
special meanings, I don't see how you can save latency by using
virtual indexing instead of physical, since whether an address is
cacheable isn't known until after the MMU resolves what it maps to.

> This sounds like something nommu systems do. Are any other nommu systems
> currently device tree enabled? (Blackfin? Coldfire? Is any of the
> Cortex-M stuff actually merged in-tree yet?)

On a NOMMU system there's no such thing as a virtual/physical
distinction. Virtual addresses "are" (or map linearly to) physical
addresses and there's no multiple-mapping of the same memory at
different virtual pages.

> > Any preferences for how I do this? Just add a type field to cache_info
> > and make the default VIVT for existing models?
> 
> It sounds like a device tree issue. How would you represent it there?

Even if the value for the property comes from the DT, it needs to be
stored in the kernel's data structures at runtime; that's what I was
asking about. But this is another good question. For legacy SH
cpus/socs, should the cache properties be implied by the cpu node's
"compatible" string, or should we require the DT provide them all? I
think "renesas,sh4-* cpu has VIPT cache" is a fairly reasonable
assumption to go in the kernel's handling of the cpu type, while sizes
(which might vary by subtype/soc?) _could_ go in the DT, but I think
they're actually detectable (and currently detected) via probing the
sh4 cache controller registers, and probing should probably be
preferred over DT-provided values. On the other hand, for J2 where the
cache could easily be changed or swapped out independently of the cpu,
it makes sense for all the properties to come from the DT.
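
Roughly, the precedence described above could look like the sketch below. Everything in it is an assumption for illustration: the function names are hypothetical, the "renesas,sh4-" prefix just mirrors the example string above, and a probed size of 0 is taken to mean "probing found nothing".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum sketch_cache_type { SKETCH_UNKNOWN, SKETCH_VIPT, SKETCH_PIPT };

/* Cache type implied by the cpu node's compatible string, if any. */
static enum sketch_cache_type sketch_implied_type(const char *compatible)
{
	static const char sh4_prefix[] = "renesas,sh4-";

	assert(compatible != NULL);
	if (strncmp(compatible, sh4_prefix, sizeof(sh4_prefix) - 1) == 0)
		return SKETCH_VIPT;
	return SKETCH_UNKNOWN; /* caller must take the type from the DT */
}

/* Prefer a probed size; fall back to the DT value when probing gave 0. */
static uint32_t sketch_cache_size(uint32_t probed, uint32_t from_dt)
{
	return probed ? probed : from_dt;
}
```

A cpu whose compatible string implies nothing, as would be the case for J2 here, falls through to SKETCH_UNKNOWN and has to get every property from the DT.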

BTW this document looks helpful explaining SH4 cache architecture:

http://www.stlinux.com/node/512

Rich
