From mboxrd@z Thu Jan  1 00:00:00 1970
From: jamie@shareable.org (Jamie Lokier)
Date: Fri, 26 Mar 2010 05:45:08 +0000
Subject: ARM caches variants.
In-Reply-To: <1269423728.29073.11.camel@e102109-lin.cambridge.arm.com>
References: <20100323234949.GG20130@shareable.org>
	<1269423728.29073.11.camel@e102109-lin.cambridge.arm.com>
Message-ID: <20100326054508.GA19308@shareable.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Catalin Marinas wrote:
> On Tue, 2010-03-23 at 23:49 +0000, Jamie Lokier wrote:
> > Catalin Marinas wrote:
> > > > > In other words, is the cache line used by virtual address addr not:
> > > > (addr % cache size) / (cache line size)
> > >
> > > With any cache line, you have an index and a tag for identifying it. The
> > > cache may have multiple ways (e.g. 4-way associative) to speed up the
> > > look-up. For a 32KB 4-way associative cache you have 8KB per way (2^13).
> > >
> > > If the cache line size is 32B (2^5), the index of a cache line is:
> > >
> > > > (addr & (2^13 - 1)) >> 5
> > >
> > > > i.e. bits 12..5 from the VA are used for indexing the cache line.
> > >
> > > The tag is given by the rest of the top bits, in the above case bits
> > > 31..13 of the VA (if VIVT cache) or PA (VIPT cache).
> > >
> > > The cache look-up for a VA goes something like this:
> > >
> > >      1. extracts the index. With a 4-way associative cache there are 4
> > >         possible cache lines for this index
> > >      2. extracts the tag (from either VA or PA, depending on the cache
> > >         type). For VIPT caches, it needs to do a TLB look-up as well to
> > >         find the physical address
> > >      3. check the four cache lines identified by the index at step 1
> > >         against their tag
> > >      4. if the tag matches, you get a hit, otherwise a miss
> > >
> > > For your #2 and #3 issues, if two processes map the same PA using
> > > different VAs, data can end up pretty much anywhere in a VIVT cache. If
> > > you calculate the index and tag (used to identify a cache line) for two
> > > > different VAs, the only common part is bits 11..5 of the index (since
> > > they are inside a page). If you want to have the same index and tag for
> > > the two different VAs, you end up with having to use the same VA in both
> > > processes.
> > >
> > > With VIPT caches, the tag is the same for issues #2 and #3. The only
> > > difference may be in a few top bits of the index. In the above case,
> > > it's bit 12 of the VA which may differ. This gives you two page colours
> > > (with 64KB 4-way associative cache you have 2 bits for the colour
> > > resulting in 4 colours).
> > 
> > That's a very helpful explanation, thank you.
> > 
> > Am I to understand that "VIPT aliasing" means there are some of those
> > bits and therefore >= 2 colours, and "VIPT non-aliasing" means the
> > cache size / ways is <= PAGE_SIZE, and therefore has effectively 1 colour?
> 
> A method to get non-aliasing VIPT is to have the way size <= PAGE_SIZE.
> That's how ARM1136 with 16K caches works. But with bigger caches, adding
> more ways may get expensive in hardware.
> 
> > > I suspect some x86s have VIPT caches, especially AMD (I've seen timing
> > > measurements which clearly show page colour effects), and I can only
> > > imagine that aliasing is prevented because, when a cache line asks to
> > > be filled from a higher-level cache (L2), something very similar to SMP
> > > MESI cache coherence gets involved to keep both lines consistent.
> > 
> > That would make a "VIPT non-aliasing" cache that has multiple colours.
> > Is that ever done on the ARM architecture?
> 
> ARMv7 has non-aliasing VIPT D-cache where the aliasing is handled by the
> hardware (maybe similar to MESI). I don't know the hardware
> implementation but my guess is that a cache look-up checks all the
> indices (4 in a 64K 4-way associative cache) and the tag may be extended
> to bit 12 (and may overlap with the index).

Loading a new cache line, or writing through it, must reach the other
indices somehow to avoid aliases.
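
To make "the other indices" concrete, here is a rough C sketch. It
assumes the geometry from your example (64KB 4-way cache, 32-byte lines,
4KB pages); the names cache_index, page_colour and alias_index are made
up for illustration and say nothing about how any real core does it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry, taken from the example in this thread and not from
 * any specific core: 64KB 4-way cache, 32-byte lines, 4KB pages. */
enum {
    LINE_SHIFT = 5,                             /* 32-byte cache lines */
    PAGE_SHIFT = 12,                            /* 4KB pages */
    WAY_SHIFT  = 14,                            /* 64KB / 4 ways = 16KB per way */
    N_COLOURS  = 1 << (WAY_SHIFT - PAGE_SHIFT)  /* 4 page colours */
};

/* Set index used for a look-up: VA bits 13..5. */
static unsigned cache_index(uint32_t va)
{
    return (va & ((1u << WAY_SHIFT) - 1)) >> LINE_SHIFT;
}

/* Page colour: the index bits above the page offset, VA bits 13..12. */
static unsigned page_colour(uint32_t va)
{
    return (va >> PAGE_SHIFT) & (N_COLOURS - 1);
}

/* Index at which the same physical line would sit if it had been
 * brought in through a mapping of the given colour (hypothetical helper). */
static unsigned alias_index(uint32_t va, unsigned colour)
{
    uint32_t in_page = va & ((1u << PAGE_SHIFT) - 1);

    assert(colour < N_COLOURS);
    return (unsigned)((((uint32_t)colour << PAGE_SHIFT) | in_page) >> LINE_SHIFT);
}
```

For VA 0x3020 this gives index 385 and colour 3. The same physical line
mapped at the other colours would sit at indices 1, 129 and 257, so
those are the indices a fill or write would have to reach.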

Two kinds of behaviour come to mind:

  - Loading / writing through causes a clean+flush of all aliasing indices.
  - Loading permits multiple indices to hold the line in the Shared (S)
    state, as a MESI cache would.

I'm not sure if the difference affects what you need to do for
explicit cache flushes for DMA etc., or if it's just a
timing/performance difference.

> Note that the I-cache on ARMv7 is an aliasing VIPT (when the way size >
> PAGE_SIZE).

Aliases in a read-only cache (I-cache) don't matter, so I presume you
mean it has multiple aliases against the D-cache?

I think that would only affect what you have to do when flushing
I-cache lines after writing data, and then only if flushing has to use
the virtual address, not physical.  Is that right?

Thanks again,
-- Jamie