From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rich Felker
Date: Wed, 23 Mar 2016 16:41:06 +0000
Subject: Re: Fixing SH cache assumptions
Message-Id: <20160323164106.GN21636@brightrain.aerifal.cx>
List-Id:
References: <20160322211905.GA11781@brightrain.aerifal.cx>
In-Reply-To: <20160322211905.GA11781@brightrain.aerifal.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-sh@vger.kernel.org

On Wed, Mar 23, 2016 at 12:08:51AM -0500, Rob Landley wrote:
> On 03/22/2016 04:19 PM, Rich Felker wrote:
> > Currently arch/sh has a hard-coded assumption that the cache is
> > virtually indexed (and virtually tagged, from what I can tell), and
> > thus needs to account for pages that may alias. While this is correct
> > for SH3/4, it's wrong for anything NOMMU (since there are no virtual
> > addresses, only physical) and the only reason SH2 works at all is
> > because its small cache size (256 lines * 16 bytes per line) matches
> > the page size, yielding an alias_mask of 0. If the cache were any
> > larger (like it is on J2) then the alias avoidance logic would kick in
> > and lead to calling kmap_coherent (which is BUG() on NOMMU) and
> > possibly other incorrect or suboptimal behavior.
> >
> > I've avoided the issue so far on J2 simply by lying that the cache is
> > small, but this needs a proper fix. It would be easy to just #ifndef
> > out the logic that sets up alias_mask and shm_align_mask on NOMMU, but
> > I think it would be better to somehow represent the cache indexing in
> > the cache_info struct or elsewhere. In case the future J4 has a
> > physically indexed cache (which is my hope), such an approach should
> > naturally work for it with no further modifications.
>
> Wikipedia[citation needed] is under the impression that physically
> indexed caches are not a happy thing on systems with MMU:
>
> https://en.wikipedia.org/wiki/CPU_cache#Address_translation
>
> (I so want search anchors.
> Instead of #Address_translation, if I could
> add $PIPT it could jump you right where you needed to go. But no,
> mozilla never did that and chrome apparently hasn't thought of it.)

The information there is kind of, well, "dated" IMO. Most modern
systems (esp. x86 and ARM) with MMU use PIPT caches, because the
disadvantages of virtual indexing are pretty big:

- Without ASIDs, you have to flush the TLB at every context switch.
  ASIDs partly solve this problem but have limitations/problems of
  their own.

- With virtual tagging and indexing, memory that's mapped into
  multiple processes, like shared libraries, will get cached multiple
  times, unless it's mapped at the same location, which is hard to
  guarantee and precludes ASLR.

- With physical tagging and virtual indexing, the problem isn't quite
  as bad, but to ensure the same physical memory doesn't get cached
  multiple times (waste for RO memory, breaks coherency for RW memory)
  you have to use alignment larger (possibly much larger) than the
  page size; this is the page aliasing avoidance I was talking about.

I'm not an expert on the hardware side of things, but my understanding
is that implementing a fast PIPT cache is a solved problem and
shouldn't really require anything more than a low-latency MMU. Also,
if your MMU is to allow arbitrary mappings, rather than having certain
virtual ranges (like the SH P0/P1/P2/P3/P4 zones) with special
meanings, I don't see how you can save latency by using virtual
indexing instead of physical, since whether an address is cacheable
isn't determined until after the MMU resolves what it maps to.

> This sounds like something nommu systems do. Are any other nommu systems
> currently device tree enabled? (Blackfish? Coldfire? Is any of the
> Cortex-M stuff actually merged in-tree yet?)

On a NOMMU system there's no such thing as a virtual/physical
distinction.
Virtual addresses "are" (or map linearly to) physical
addresses and there's no multiple-mapping of the same memory at
different virtual pages.

> > Any preferences for how I do this? Just add a type field to cache_info
> > and make the default VIVT for existing models?
>
> It sounds like a device tree issue. How would you represent it there?

Even if the value for the property comes from the DT, it needs to be
stored in the kernel's data structures at runtime; that's what I was
asking about.

But this is another good question. For legacy SH cpus/socs, should the
cache properties be implied by the cpu node's "compatible" string, or
should we require that the DT provide them all? I think
"renesas,sh4-* cpu has VIPT cache" is a fairly reasonable assumption
to go in the kernel's handling of the cpu type, while sizes (which
might vary by subtype/soc?) _could_ go in the DT, but I think they're
actually detectable (and currently detected) via probing the sh4 cache
controller registers, and probing should probably be preferred over
DT-provided values.

On the other hand, for J2, where the cache could easily be changed or
swapped out independently of the cpu, it makes sense for all the
properties to come from the DT.

BTW this document looks helpful for explaining the SH4 cache
architecture:

http://www.stlinux.com/node/512

Rich
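
For reference, the alias arithmetic discussed in this thread can be
sketched roughly as follows. This is only an illustration, not the
actual arch/sh code: struct cache_desc, enum cache_index_type and both
helper functions are hypothetical names, the page size is assumed to
be 4 KiB, and the way size (sets * line size) is assumed to be a power
of two. A NOMMU system is treated like the physically indexed case,
since virtual and physical addresses coincide there.

```c
#define PAGE_SHIFT 12u                 /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical "type field"; not an existing arch/sh definition. */
enum cache_index_type { CACHE_VIVT, CACHE_VIPT, CACHE_PIPT };

/* Hypothetical descriptor, loosely modeled on the idea of cache_info. */
struct cache_desc {
	unsigned int sets;             /* lines per way */
	unsigned int linesz;           /* bytes per line */
	enum cache_index_type type;
};

/*
 * Index bits that lie above the page offset. Two virtual pages that
 * differ in these bits can place the same physical line in different
 * sets. With physical indexing (or no MMU at all) there is nothing
 * to avoid, so the mask is 0.
 */
static unsigned int alias_mask(const struct cache_desc *c)
{
	if (c->type == CACHE_PIPT)
		return 0;
	return (c->sets * c->linesz - 1) & ~(PAGE_SIZE - 1);
}

/* Number of distinct page "colors"; 0 means no aliasing is possible. */
static unsigned int n_aliases(const struct cache_desc *c)
{
	unsigned int mask = alias_mask(c);

	return mask ? (mask >> PAGE_SHIFT) + 1 : 0;
}
```

With the SH2 figures quoted above (256 lines * 16 bytes = 4096 bytes,
equal to the assumed page size) alias_mask() yields 0. A made-up
virtually indexed cache with 512 lines of 32 bytes (16 KiB per way)
gives a mask of 0x3000 and 4 aliases, while the same geometry marked
CACHE_PIPT gives 0 again.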