L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes

linux-arm-kernel.lists.infradead.org archive mirror

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
@ 2012-05-14  7:03 Murali N
  2012-05-14 15:50 ` Lorenzo Pieralisi
  0 siblings, 1 reply; 26+ messages in thread
From: Murali N @ 2012-05-14  7:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi All,
I have a query on cache flush sequence being followed for L1 & L2
while target going into deep low power state on CortexA5 MPCore.
Here are the H/W details & the cache flush sequence i am following in
my power driver:

H/W details:
1. APPS processor: CortexA5 MPCore
2. L2 controller: External PL310 r3p2

Sequences:
a) While the target is going into deep low power mode (where the APPS
processor + L2 lose their power), I currently follow the cache flush
sequence below.

1. L2 cache clean & invalidate
2. L2 disable
3. L1 clean & invalidate
4. L1 disable
5. WFI

b) But the PL310 r3p2 TRM (page no 91) explains that the sequence to be
followed is a bit different from what I am following.

1. L1 clean & invalidate
2. L1 disable
3. L2 cache clean & invalidate
4. L2 disable
5. WFI

Is it mandatory that I follow only the sequence that is
mentioned in the TRM (i.e. b)? Or, although the TRM gives the above
sequence (i.e. b), can I still follow the steps in (a)?
What problems would I see if I don't follow what the TRM says and follow
the sequence which I have mentioned above (i.e. a)?

Also I have worked on another target with CortexA5 (single core with
the same L2 PL310 controller) where I have followed sequence 'a' for
quite a long time and haven't seen any data corruption issues.

Here my question is, is the above sequence 'b' something special that
only CortexA5 MPCore targets need to follow?

From a system stability point of view I don't see any improvement after I
moved to the sequence mentioned in the TRM (i.e. b) for the CortexA5 MPCore
target.

Please provide your valuable inputs if you have seen similar
issues on other targets.

--
Regards,
Murali N


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14  7:03 L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes Murali N
@ 2012-05-14 15:50 ` Lorenzo Pieralisi
  2012-05-14 15:58   ` Russell King - ARM Linux
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2012-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, May 14, 2012 at 08:03:04AM +0100, Murali N wrote:
> Hi All,
> I have a query on cache flush sequence being followed for L1 & L2
> while target going into deep low power state on CortexA5 MPCore.
> Here are the H/W details & the cache flush sequence i am following in
> my power driver:
> 
> H/W details:
> 1.        APPS processor: CortexA5 MPCore
> 2.        L2 controller: External PL310 r3p2
> 
> Sequences:
> a) While target is going into deep low power mode (where APPS
> processor + L2 loose their power) currently I am following the below
> cache flush sequence.
> 
> 1. L2 cache clean & invalidate

This is wrong. If L1 evictions happen here you will kiss those cache lines
goodbye when the cluster is powered off. See below.

> 2. L2 disable
> 3. L1 clean & invalidate

This is wrong again, since while it is being cleaned and invalidated the
cache (L1 here) can still allocate, and this must not happen.

> 4. L1 disable
> 5. WFI
> 
> b) But when I look the PL310 r3p2 TRM (page no 91) explains the
> sequence to be followed is bit difference than what I am following.
> 
> 1. L1 clean & invalidate
> 2. L1 disable
> 3. L2 cache clean & invalidate
> 4. L2 disable
> 5. WFI

You are *extrapolating* the procedure above from the TRM, but that's not
100% correct.

For a single CPU shutdown the procedure is the following:

1) clear C bit in SCTLR. CPU won't allocate cache lines in integrated
   (L1 for A5) caches anymore. Memory access might still hit in the cache,
   but that's not a problem, you just want to writeback the content of caches
   to DDR on power down.
   This is subtle but important. If a dirty cache line is moved from one
   processor to the one going down while cleaning the cache, the cache line is
   lost (dirty lines can be moved between processors).
   Clearing the C bit BEFORE starting the cache clean prevents that.
2) clean and invalidate the cache levels (L1 in A5)
3) exit coherency (clear SMP bit in ACTLR)

If the cluster has to be shut down as well and L2 is not retained through power
down:

4) clean and invalidate L2
5) disable PL310

Please note that 5 might not be strictly required, it depends on your specific
HW configuration and how AXI transactions interact with the power
controller. If you want to be on the safe side, (5) has to be executed.

Please note that PL310 can be disabled before cleaning and invalidating
L2. If you carry out the operations in the order above, code must NOT
write any static data that has to be preserved throughout shutdown
between (4) and (5).
The C bit in SCTLR does not affect PL310 since it is external to the core so
you could end up allocating cache lines after the entire content of L2
has been cleaned. If those lines are just eg stack lines that can be
discarded then fine, but if that data is to be preserved and consistent
through shut down then you have been warned.
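
As a rough illustration of the ordering above (a toy sketch, not kernel
code), the steps can be written down as a small Python function. The step
labels and the `cluster_off` / `l2_retained` parameters are names made up
for this sketch; they only encode the two cases described in the text.

```python
def power_down_steps(cluster_off, l2_retained=False):
    """Return the ordered step labels for one CPU going down.

    The labels are illustrative only; they are not kernel function names.
    """
    steps = [
        "clear SCTLR.C",        # 1) stop new allocations in L1
        "clean+invalidate L1",  # 2) write dirty lines back
        "clear ACTLR.SMP",      # 3) exit coherency
    ]
    if cluster_off and not l2_retained:
        steps.append("clean+invalidate L2")  # 4)
        steps.append("disable PL310")        # 5) may be optional on some HW
    return steps


for step in power_down_steps(cluster_off=True):
    print(step)
```

The point of keeping the list ordered is that step 1 always precedes the
L1 clean, and the L2 steps only appear when the whole cluster loses power.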

I suggest you have a look at OMAP4 CPU idle implementation where the
above is implemented in detail, inclusive of cpu_{suspend}/cpu_{resume}
API that provides the infrastructure on top of which cache management
code must be built.

> Is it mandatory that I would follow only the sequence that is
> mentioned in the TRM (i.e. b)? (OR) though TRM says above sequence
> (i.e. b) can i still follow the steps (i.e. a)?
> What are problems that I see, if I don't follow what TRM says & follow
> the sequence which I have mentioned above (i.e. a)?

Yes, it is mandatory. I hope I explained why thoroughly.
And (b), as it stands in your description is wrong and I explained why it is
so.

> Also I have worked on another target with CortexA5 (Single core with
> same L2 pl310 controller) where i have followed the sequence 'a' for
> quite a long time and don't see any data corruption issues.

This does not mean the procedure is correct.

> Here my question is, is the above sequence 'b' something special for
> only CortexA5MPCore targets to follow?

(b) is wrong, and the "patched" procedure I provided you with works for all
ARM MP systems (and consequently UP as well).

Hope that helps, feel free to come back to us for any questions.

Lorenzo


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14 15:50 ` Lorenzo Pieralisi
@ 2012-05-14 15:58   ` Russell King - ARM Linux
  2012-05-14 16:21     ` Lorenzo Pieralisi
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2012-05-14 15:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 14, 2012 at 04:50:22PM +0100, Lorenzo Pieralisi wrote:
> > 2. L2 disable
> > 3. L1 clean & invalidate
> 
> This is wrong again since while cleaning and invalidating the cache (L1 here)
> can still allocate and this must not happen.

No it isn't.  There is never anything wrong with allocating new cache lines
into a cache which is going to (eventually) be powered down.  Ever.

What would be wrong is if we end up with dirty cache lines in the cache
to be powered down for data which we _care_ about preserving when power
is lost.

That's a _very_ _very_ important difference.

Sure, if we're talking about avoiding cache snooping etc, then we may
wish to disable coherency, but, again, there's absolutely nothing wrong
with allocating cache lines.

Take a moment to think why this is.  Where's the data pulled into the
cache stored - in RAM.  The copy in the cache, while it remains clean,
is just a duplicate of what's already stored elsewhere in the system.
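
The distinction between clean and dirty lines can be sketched with a toy
model in Python. The cache is assumed to be a plain dictionary mapping an
address to a (value, dirty) pair; the function name and the addresses are
hypothetical and only serve the illustration.

```python
def lost_on_power_off(cache):
    """Return addresses whose only up-to-date copy is in the cache.

    cache maps address -> (value, dirty).
    """
    return sorted(addr for addr, (_value, dirty) in cache.items() if dirty)


ram = {0x100: 1, 0x140: 2}
cache = {
    0x100: (1, False),  # clean: same value already in RAM, safe to drop
    0x140: (7, True),   # dirty: newer than RAM, lost if not written back
}
at_risk = lost_on_power_off(cache)
print([hex(addr) for addr in at_risk])
```

Only the dirty entry is reported: dropping the clean one at power off
loses nothing, because RAM already holds the same value.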


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14 15:58   ` Russell King - ARM Linux
@ 2012-05-14 16:21     ` Lorenzo Pieralisi
  2012-05-14 16:39       ` Russell King - ARM Linux
  2013-12-24 17:52       ` Antti Miettinen
  0 siblings, 2 replies; 26+ messages in thread
From: Lorenzo Pieralisi @ 2012-05-14 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 14, 2012 at 04:58:59PM +0100, Russell King - ARM Linux wrote:
> On Mon, May 14, 2012 at 04:50:22PM +0100, Lorenzo Pieralisi wrote:
> > > 2. L2 disable
> > > 3. L1 clean & invalidate
> > 
> > This is wrong again since while cleaning and invalidating the cache (L1 here)
> > can still allocate and this must not happen.
> 
> No it isn't.  There is never anything wrong with allocating new caches lines
> into a cache which is going to (eventually) be powered down.  Ever.

What if the cache allocates a dirty cache line moved from L1 of another
processor ?

> What would be wrong is if we end up with dirty cache lines in the cache
> to be powered down for data which we _care_ about preserving when power
> is lost.
> 
> That's a _very_ _very_ important difference.

That's exactly the point I am making. Dirty cache lines can be migrated across
processors' caches. If we want to shut down a single core we have to be 100%
sure that no dirty cache lines (if we care about that data; we might not, as
you pointed out) are present in L1 when we shut the core down. If the C
bit in SCTLR is not cleared before cleaning and invalidating, this is not
guaranteed from an architectural point of view.

Occurrences might be rare, but it is still not safe to clean the cache with the
C bit set.

> Sure, if we're talking about avoiding cache snooping etc, then we may
> wish to disable coherency, but, again, there's absolutely nothing wrong
> with allocating cache lines.
> 
> Take a moment to think why this is.  Where's the data pulled into the
> cache stored - in RAM.  The copy in the cache, while it remains clean,
> is just a duplicate of what's already stored elsewhere in the system.

It can be stored in other caches' RAM too on MP systems.
While it remains clean, fine. It is dirty cache line migration I am
talking about.
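
A toy Python model may make the migration argument easier to follow. It
assumes an L1 made of two sets walked in order, and one dirty line held by
another CPU that migrates in right after set 0 has been cleaned; all names
and addresses are invented for the sketch, and real hardware behaviour is
considerably more involved.

```python
def clean_l1(l1_sets, ram, other_cpu, allocate_enabled):
    """Clean and invalidate a toy L1 set by set.

    l1_sets:   list of dicts (one per set), address -> dirty value
    other_cpu: dirty lines held by another CPU, address -> value
    Returns whatever is still dirty in this L1 after the walk.
    """
    for index, lines in enumerate(l1_sets):
        ram.update(lines)   # write this set's dirty lines back
        lines.clear()       # then invalidate them
        if index == 0 and allocate_enabled and other_cpu:
            # A dirty line migrates in and lands in set 0, which the
            # walk has already passed, so it is never written back.
            addr, value = other_cpu.popitem()
            l1_sets[0][addr] = value
    return {a: v for lines in l1_sets for a, v in lines.items()}


def run(allocate_enabled):
    l1_sets = [{0x00: 1}, {0x20: 2}]
    other_cpu = {0x40: 9}
    left = clean_l1(l1_sets, {}, other_cpu, allocate_enabled)
    return left, other_cpu


print(run(allocate_enabled=True))
print(run(allocate_enabled=False))
```

With allocation enabled the migrated line is still dirty in L1 when the
walk ends; with allocation disabled it stays with the other CPU and the
L1 ends up empty.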

Lorenzo


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14 16:21     ` Lorenzo Pieralisi
@ 2012-05-14 16:39       ` Russell King - ARM Linux
  2012-05-14 17:15         ` Lorenzo Pieralisi
  2013-12-24 17:52       ` Antti Miettinen
  1 sibling, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2012-05-14 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 14, 2012 at 05:21:50PM +0100, Lorenzo Pieralisi wrote:
> On Mon, May 14, 2012 at 04:58:59PM +0100, Russell King - ARM Linux wrote:
> > On Mon, May 14, 2012 at 04:50:22PM +0100, Lorenzo Pieralisi wrote:
> > > > 2. L2 disable
> > > > 3. L1 clean & invalidate
> > > 
> > > This is wrong again since while cleaning and invalidating the cache (L1 here)
> > > can still allocate and this must not happen.
> > 
> > No it isn't.  There is never anything wrong with allocating new caches lines
> > into a cache which is going to (eventually) be powered down.  Ever.
> 
> What if the cache allocates a dirty cache line moved from L1 of another
> processor ?
> 
> > What would be wrong is if we end up with dirty cache lines in the cache
> > to be powered down for data which we _care_ about preserving when power
> > is lost.
> > 
> > That's a _very_ _very_ important difference.
> 
> That's exactly the point I am making. dirty cache lines can be migrated across
> processors caches. If we want to shut down a single core we have to be 100%
> sure that dirty cache lines (if we care about that data, we might be not as you
> pointed out) must not be present in L1 when we shut the core down. If the C
> bit in SCTLR is not cleared before cleaning and invalidating this is not
> guaranteed from an architectural point of view.
> 
> Occurences might be rare, but it is still not safe to clean the cache with the
> C bit set.

It's not safe to disable the C bit without first pushing the dirty data out
to RAM either.  It's a catch-22 situation - because turning the C bit off
not only stops the caches allocating new lines but also prevents them being
searched.

That means your view of cacheable memory suddenly changes beneath you when
the C bit is turned off.

From what you're saying - and from my understanding of your cache behaviours,
even the sequence:
- clean cache
- disable C bit
- clean cache
is buggy.

I think what you're effectively saying is that it is not possible to safely
power down a cache on an ARM SMP CPU...


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14 16:39       ` Russell King - ARM Linux
@ 2012-05-14 17:15         ` Lorenzo Pieralisi
  2012-05-15  9:25           ` Murali N
  2012-05-15  9:40           ` Russell King - ARM Linux
  0 siblings, 2 replies; 26+ messages in thread
From: Lorenzo Pieralisi @ 2012-05-14 17:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
> On Mon, May 14, 2012 at 05:21:50PM +0100, Lorenzo Pieralisi wrote:
> > On Mon, May 14, 2012 at 04:58:59PM +0100, Russell King - ARM Linux wrote:
> > > On Mon, May 14, 2012 at 04:50:22PM +0100, Lorenzo Pieralisi wrote:
> > > > > 2. L2 disable
> > > > > 3. L1 clean & invalidate
> > > > 
> > > > This is wrong again since while cleaning and invalidating the cache (L1 here)
> > > > can still allocate and this must not happen.
> > > 
> > > No it isn't.  There is never anything wrong with allocating new caches lines
> > > into a cache which is going to (eventually) be powered down.  Ever.
> > 
> > What if the cache allocates a dirty cache line moved from L1 of another
> > processor ?
> > 
> > > What would be wrong is if we end up with dirty cache lines in the cache
> > > to be powered down for data which we _care_ about preserving when power
> > > is lost.
> > > 
> > > That's a _very_ _very_ important difference.
> > 
> > That's exactly the point I am making. dirty cache lines can be migrated across
> > processors caches. If we want to shut down a single core we have to be 100%
> > sure that dirty cache lines (if we care about that data, we might be not as you
> > pointed out) must not be present in L1 when we shut the core down. If the C
> > bit in SCTLR is not cleared before cleaning and invalidating this is not
> > guaranteed from an architectural point of view.
> > 
> > Occurences might be rare, but it is still not safe to clean the cache with the
> > C bit set.
> 
> It's not safe to disable the C bit without first pushing the dirty data out
> to RAM either.  It's a catch-22 situation - because turning the C bit off
> not only stops the caches allocating new lines but also prevents them being
> searched.

That depends on the processor. On A9 the cache is bypassed; on A15 it is not,
and data accesses might still hit in the cache.

It is "implementation defined" according to the ARM ARM (B2-1265).
But clearing the C bit stops allocation; that is true across all
implementations.

> That means your view of cacheable memory suddenly changes beneath you when
> the C bit is turned off.

Yes, it might (see above), but the cache operations still work so we do
not have any problem (well, as long as we clean and invalidate without
using data that can live in the cache, but that's how it is done in the v7
cache flush ops and it is perfectly fine).

> From what you're saying - and from my understanding of your cache behaviours,
> even the sequence:
> - clean cache
> - disable C bit
> - clean cache
> is buggy.

No, that's correct, works fine on A9 and A15. Second clean is mostly nops.

> 
> I think what you're effectively saying is that it is not possible to safely
> power down a cache on an ARM SMP CPU...

It is possible, but the final clean must be done with C bit cleared. It is
belt and braces, agreed, but that's the only way to do it properly.
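
The clean / clear C bit / clean sequence can be sketched as a toy Python
model. It assumes, as stated above, that maintenance operations work with
the C bit clear and that a cleared C bit stops allocation; the `Cpu` class
and its methods are hypothetical names for this sketch only.

```python
class Cpu:
    def __init__(self):
        self.c_bit = True
        self.dirty = {}  # address -> value held dirty in L1

    def allocate_dirty(self, addr, value):
        """A dirty line arrives (e.g. migrated); ignored once C is clear."""
        if self.c_bit:
            self.dirty[addr] = value

    def clean(self, ram):
        """Maintenance op: works whether or not the C bit is set."""
        ram.update(self.dirty)
        self.dirty.clear()


ram = {}
cpu = Cpu()
cpu.dirty[0x00] = 1
cpu.clean(ram)               # first clean (optional)
cpu.allocate_dirty(0x40, 9)  # allocation is still possible here
cpu.c_bit = False            # clear SCTLR.C
cpu.allocate_dirty(0x80, 3)  # no longer allocated into this L1
cpu.clean(ram)               # final clean, done with the C bit clear
print(ram, cpu.dirty)
```

In this model the line that slipped in after the first clean is caught by
the final clean, and nothing can be allocated after it.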

Lorenzo


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14 17:15         ` Lorenzo Pieralisi
@ 2012-05-15  9:25           ` Murali N
  2012-05-15  9:40           ` Russell King - ARM Linux
  1 sibling, 0 replies; 26+ messages in thread
From: Murali N @ 2012-05-15  9:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 14, 2012 at 10:45 PM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
>> On Mon, May 14, 2012 at 05:21:50PM +0100, Lorenzo Pieralisi wrote:
>> > On Mon, May 14, 2012 at 04:58:59PM +0100, Russell King - ARM Linux wrote:
>> > > On Mon, May 14, 2012 at 04:50:22PM +0100, Lorenzo Pieralisi wrote:
>> > > > > 2. L2 disable
>> > > > > 3. L1 clean & invalidate
>> > > >
>> > > > This is wrong again since while cleaning and invalidating the cache (L1 here)
>> > > > can still allocate and this must not happen.
>> > >
>> > > No it isn't.  There is never anything wrong with allocating new caches lines
>> > > into a cache which is going to (eventually) be powered down.  Ever.
>> >
>> > What if the cache allocates a dirty cache line moved from L1 of another
>> > processor ?
>> >
>> > > What would be wrong is if we end up with dirty cache lines in the cache
>> > > to be powered down for data which we _care_ about preserving when power
>> > > is lost.
>> > >
>> > > That's a _very_ _very_ important difference.
>> >
>> > That's exactly the point I am making. dirty cache lines can be migrated across
>> > processors caches. If we want to shut down a single core we have to be 100%
>> > sure that dirty cache lines (if we care about that data, we might be not as you
>> > pointed out) must not be present in L1 when we shut the core down. If the C
>> > bit in SCTLR is not cleared before cleaning and invalidating this is not
>> > guaranteed from an architectural point of view.
>> >
>> > Occurences might be rare, but it is still not safe to clean the cache with the
>> > C bit set.
>>
>> It's not safe to disable the C bit without first pushing the dirty data out
>> to RAM either.  It's a catch-22 situation - because turning the C bit off
>> not only stops the caches allocating new lines but also prevents them being
>> searched.
>
> That depends on the processor. On A9 cache is bypassed on A15 it is not,
> data access might still hit in the cache.
>
> It is "implementation defined" according to ARM ARM (B2-1265).
> But C bit cleared stops allocation that's true across all implementations.
>
>> That means your view of cacheable memory suddenly changes beneath you when
>> the C bit is turned off.
>
> Yes might be (see above) but the cache operations still work so we do
> not have any problem (well, as long as we clean and invalidate without
> using data that can live in the cache, but that's how it is done on v7 cache
> flush ops and it is perfectly fine).
>
>> From what you're saying - and from my understanding of your cache behaviours,
>> even the sequence:
>> - clean cache
>> - disable C bit
>> - clean cache
>> is buggy.
>
> No, that's correct, works fine on A9 and A15. Second clean is mostly nops.
>
>>
>> I think what you're effectively saying is that it is not possible to safely
>> power down a cache on an ARM SMP CPU...
>
> It is possible, but the final clean must be done with C bit cleared. It is
> belt and braces, agreed, but that's the only way to do it properly.
>
> Lorenzo
>

In my case, when core0 is powered down, core1 is already in the *off*
state and out of coherency.
So, while shutting down CPU0 (CPU1 is already off), do I still need to
follow the steps you have mentioned for an effective power down of the
cache?
My feeling is that while going to shutdown I effectively operate in
single core mode, so it doesn't make any difference which of the
sequences I follow?

Please correct me if i am wrong.

-- 
Regards,
Murali N


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14 17:15         ` Lorenzo Pieralisi
  2012-05-15  9:25           ` Murali N
@ 2012-05-15  9:40           ` Russell King - ARM Linux
  2012-05-15 10:09             ` Lorenzo Pieralisi
  1 sibling, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2012-05-15  9:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 14, 2012 at 06:15:33PM +0100, Lorenzo Pieralisi wrote:
> On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
> > From what you're saying - and from my understanding of your cache behaviours,
> > even the sequence:
> > - clean cache
> > - disable C bit
> > - clean cache
> > is buggy.
> 
> No, that's correct, works fine on A9 and A15. Second clean is mostly nops.

It's racy.  Consider this:

	- clean cache
	- cache speculatively prefetches a dirty cache line from another CPU
	- disable C bit

At this point, you lose access to that dirty data.  If that dirty data is
used in between disabling the C bit and cleaning the cache for the second
time, you have data corruption issues.

Another point which needs to be checked is whether dirty cache lines in
a CPU's cache which has had the C bit disabled still take part in the
coherency protocol with other CPUs.  If the answer is no, then that's a
_major_ problem for the hot unplug code paths.  That effectively means
that we have a window where a CPU going down actively _corrupts_ the
data visible to other CPUs.

As I have said, given what you've mentioned, it is impossible to safely
disable the cache on a SMP system.  In order to do it safely, you need to
have a way to disable new allocations into the cache _without_ disabling
the ability for the cache to be searched.

And if we could do that, then the sequence becomes a simple and race free:

	- disable new allocations
	- clean cache
	- disable cache
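
The window described above can be replayed in a toy Python model. The
flag `lookup_when_alloc_off` is an assumption of this sketch: False models
a C bit that also stops the cache being searched, True models the
wished-for mode where allocations stop but lookups continue.

```python
def replay(lookup_when_alloc_off):
    """Replay the race; return (value read in the window, final RAM value).

    lookup_when_alloc_off=False: switching allocations off also stops lookups.
    lookup_when_alloc_off=True:  lines are still searched after the switch.
    """
    addr = 0x40
    ram = {addr: 1}  # stale copy in memory
    l1 = {}          # address -> dirty value

    ram.update(l1)   # clean cache (nothing dirty yet)
    l1.clear()
    l1[addr] = 2     # dirty line pulled in from another CPU
    # Allocations are now switched off; the CPU reads addr in the window.
    if lookup_when_alloc_off and addr in l1:
        seen = l1[addr]
    else:
        seen = ram[addr]
    ram.update(l1)   # final clean writes the line back
    l1.clear()
    return seen, ram[addr]


print(replay(lookup_when_alloc_off=False))
print(replay(lookup_when_alloc_off=True))
```

In the first case the read inside the window returns the stale memory
value even though the final clean later repairs RAM; in the second case
the read already sees the up-to-date value.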


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15  9:40           ` Russell King - ARM Linux
@ 2012-05-15 10:09             ` Lorenzo Pieralisi
  2012-05-15 10:15               ` Russell King - ARM Linux
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2012-05-15 10:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 15, 2012 at 10:40:10AM +0100, Russell King - ARM Linux wrote:
> On Mon, May 14, 2012 at 06:15:33PM +0100, Lorenzo Pieralisi wrote:
> > On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
> > > From what you're saying - and from my understanding of your cache behaviours,
> > > even the sequence:
> > > - clean cache
> > > - disable C bit
> > > - clean cache
> > > is buggy.
> > 
> > No, that's correct, works fine on A9 and A15. Second clean is mostly nops.
> 
> It's racy.  Consider this:
> 
> 	- clean cache
> 	- cache speculatively prefetches a dirty cache line from another CPU
> 	- disable C bit
	- clean cache

> At this point, you lose access to that dirty data.  If that dirty data is
> used inbetween disabling the C bit and cleaning the cache for the second
> time, you have data corruption issues.

It is not racy. After disabling the C bit the cache clean operations write-back
any dirty cache line to the next cache level. And the CPU is still in coherency
mode so there is not a problem with that either.

> Another point which needs to be checked is whether dirty cache lines in
> a CPUs cache which has had the C bit disabled still take part in the
> coherency protocol with other CPUs.  If the answer is no, then that's a
> _major_ problem for the hot unplug code paths.  That effectively means
> that we have a window where a CPU going down actively _corrupts_ the
> data visible to other CPUs.

See above.

> As I have said, given what you've mentioned, it is impossible to safely
> disable the cache on a SMP system.  In order to do it safely, you need to
> have a way to disable new allocations into the cache _without_ disabling
> the ability for the cache to be searched.

Cache lines can be acted upon with maintenance operations whether the C bit is
set or clear. For instance caches can be invalidated when the MMU is off
and the C bit is clear, eg v7 boot.

Cache cleaning and cache enabling/disabling are two different things, that's
valid for the PL310 as well.

Lorenzo


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15 10:09             ` Lorenzo Pieralisi
@ 2012-05-15 10:15               ` Russell King - ARM Linux
  2012-05-15 16:28                 ` Lorenzo Pieralisi
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2012-05-15 10:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 15, 2012 at 11:09:02AM +0100, Lorenzo Pieralisi wrote:
> On Tue, May 15, 2012 at 10:40:10AM +0100, Russell King - ARM Linux wrote:
> > On Mon, May 14, 2012 at 06:15:33PM +0100, Lorenzo Pieralisi wrote:
> > > On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
> > > > From what you're saying - and from my understanding of your cache behaviours,
> > > > even the sequence:
> > > > - clean cache
> > > > - disable C bit
> > > > - clean cache
> > > > is buggy.
> > > 
> > > No, that's correct, works fine on A9 and A15. Second clean is mostly nops.
> > 
> > It's racy.  Consider this:
> > 
> > 	- clean cache
> > 	- cache speculatively prefetches a dirty cache line from another CPU
> > 	- disable C bit
> 	- clean cache

Thank you for totally missing the point and destroying the example.

> > At this point, you lose access to that dirty data.  If that dirty data is
> > used inbetween disabling the C bit and cleaning the cache for the second
> > time, you have data corruption issues.
> 
> It is not racy. After disabling the C bit the cache clean operations write-back
> any dirty cache line to the next cache level. And the CPU is still in coherency
> mode so there is not a problem with that either.

No.  *THINK* about the exact example I gave you.  Think about what state
the CPU sees between that "disable C bit" and the final cache clean (which
you seem to be insisting is an atomic operation.)

Please, read what I'm saying rather than re-interpreting it, augmenting it
and then answering something entirely different.

> > As I have said, given what you've mentioned, it is impossible to safely
> > disable the cache on a SMP system.  In order to do it safely, you need to
> > have a way to disable new allocations into the cache _without_ disabling
> > the ability for the cache to be searched.
> 
> Cache lines can be acted upon with maintenance operations whether the C bit is
> set or clear. For instance caches can be invalidated when the MMU is off
> and the C bit is clear, eg v7 boot.
> 
> Cache cleaning and cache enabling/disabling are two different things, that's
> valid for the PL310 as well.

Yes, of course I realise that.  That's not what I'm talking about at all.


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15 10:15               ` Russell King - ARM Linux
@ 2012-05-15 16:28                 ` Lorenzo Pieralisi
  2012-05-15 16:36                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2012-05-15 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 15, 2012 at 11:15:05AM +0100, Russell King - ARM Linux wrote:
> On Tue, May 15, 2012 at 11:09:02AM +0100, Lorenzo Pieralisi wrote:
> > On Tue, May 15, 2012 at 10:40:10AM +0100, Russell King - ARM Linux wrote:
> > > On Mon, May 14, 2012 at 06:15:33PM +0100, Lorenzo Pieralisi wrote:
> > > > On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
> > > > > From what you're saying - and from my understanding of your cache behaviours,
> > > > > even the sequence:
> > > > > - clean cache
> > > > > - disable C bit
> > > > > - clean cache
> > > > > is buggy.
> > > > 
> > > > No, that's correct, works fine on A9 and A15. Second clean is mostly nops.
> > > 
> > > It's racy.  Consider this:
> > > 
> > > 	- clean cache
> > > 	- cache speculatively prefetches a dirty cache line from another CPU
> > > 	- disable C bit
> > 	- clean cache
> 
> Thank you for totally missing the point and destroying the example.
> 
> > > At this point, you lose access to that dirty data.  If that dirty data is
> > > used inbetween disabling the C bit and cleaning the cache for the second
> > > time, you have data corruption issues.
> > 
> > It is not racy. After disabling the C bit the cache clean operations write-back
> > any dirty cache line to the next cache level. And the CPU is still in coherency
> > mode so there is not a problem with that either.
> 
> No.  *THINK* about the exact example I gave you.  Think about what state
> the CPU sees between that "disable C bit" and the final cache clean (which
> you seem to be insisting is an atomic operation.)
> 
> Please, read what I'm saying rather than re-interpreting it, augmenting it
> and then answering something entirely different.
> 
> > > As I have said, given what you've mentioned, it is impossible to safely
> > > disable the cache on a SMP system.  In order to do it safely, you need to
> > > have a way to disable new allocations into the cache _without_ disabling
> > > the ability for the cache to be searched.

First off, my apologies: I did not mean to disrupt the discussion, and if
I did, sorry about that. Let's try to sum it up:

1) Hitting in the cache when the SCTLR.C is cleared is CPU specific (eg A9
   does not, A15 does)
2) as long as they are taking part in coherency (SMP bit set in ACTLR), all
   Cortex-A cores in a MP configuration with the SCTLR.C bit set can hit in
   the cache of a CPU that runs with the C bit cleared in SCTLR
3) The sequence:
        - clean cache
        - clear SCTLR.C
        - clean cache

is correct and we must mandate it, with the following remarks:

        - The first cache clean is superfluous (but does no harm)
        - The second cache clean must not rely on any data that might
          sit in the cache
        - clearing SCTLR.C and cleaning the cache must be coded in
          assembly in a function carrying out both operations (to avoid
          stack issues ie cacheable push/pop ops and any global data
          reference)

4) Current vexpress hotplug code clears ACTLR.SMP bit before clearing
   SCTLR.C; this is a bug according to this discussion and we must fix
   it (to avoid copy'n'paste of code that does not follow the standard
   for platforms that have PM capabilities beyond standbywfi)
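
The ordering rules in points 3 and 4 can be expressed as a small checker,
sketched here in Python. The step labels and the function name are
hypothetical; the checks encode only what the summary above states (final
clean after SCTLR.C is cleared, ACTLR.SMP not cleared before SCTLR.C).

```python
def check_order(steps):
    """Return a list of problems found in a CPU power-down step list."""
    problems = []

    def pos(name):
        return steps.index(name) if name in steps else None

    c_off = pos("clear SCTLR.C")
    smp_off = pos("clear ACTLR.SMP")
    cleans = [i for i, s in enumerate(steps) if s == "clean cache"]

    if c_off is None:
        problems.append("SCTLR.C is never cleared")
    elif not cleans or cleans[-1] < c_off:
        problems.append("final clean must run after SCTLR.C is cleared")
    if smp_off is not None and c_off is not None and smp_off < c_off:
        problems.append("ACTLR.SMP cleared before SCTLR.C")
    return problems


good = ["clean cache", "clear SCTLR.C", "clean cache", "clear ACTLR.SMP"]
bad = ["clear ACTLR.SMP", "clear SCTLR.C", "clean cache"]
print(check_order(good))
print(check_order(bad))
```

A list that clears ACTLR.SMP first, as the vexpress hotplug code is said
to do above, is flagged, while the mandated sequence passes.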

Please let me know if I am missing something and thanks for the discussion.

Lorenzo


* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15 16:28                 ` Lorenzo Pieralisi
@ 2012-05-15 16:36                   ` Russell King - ARM Linux
  2012-05-15 17:05                     ` Lorenzo Pieralisi
  2012-05-15 18:17                     ` Will Deacon
  0 siblings, 2 replies; 26+ messages in thread
From: Russell King - ARM Linux @ 2012-05-15 16:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 15, 2012 at 05:28:51PM +0100, Lorenzo Pieralisi wrote:
> On Tue, May 15, 2012 at 11:15:05AM +0100, Russell King - ARM Linux wrote:
> > On Tue, May 15, 2012 at 11:09:02AM +0100, Lorenzo Pieralisi wrote:
> > > On Tue, May 15, 2012 at 10:40:10AM +0100, Russell King - ARM Linux wrote:
> > > > On Mon, May 14, 2012 at 06:15:33PM +0100, Lorenzo Pieralisi wrote:
> > > > > On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
> > > > > > From what you're saying - and from my understanding of your cache behaviours,
> > > > > > even the sequence:
> > > > > > - clean cache
> > > > > > - disable C bit
> > > > > > - clean cache
> > > > > > is buggy.
> > > > > 
> > > > > No, that's correct, works fine on A9 and A15. Second clean is mostly nops.
> > > > 
> > > > It's racy.  Consider this:
> > > > 
> > > > 	- clean cache
> > > > 	- cache speculatively prefetches a dirty cache line from another CPU
> > > > 	- disable C bit
> > > 	- clean cache
> > 
> > Thank you for totally missing the point and destroying the example.
> > 
> > > > At this point, you lose access to that dirty data.  If that dirty data is
> > > > used inbetween disabling the C bit and cleaning the cache for the second
> > > > time, you have data corruption issues.
> > > 
> > > It is not racy. After disabling the C bit the cache clean operations write-back
> > > any dirty cache line to the next cache level. And the CPU is still in coherency
> > > mode so there is not a problem with that either.
> > 
> > No.  *THINK* about the exact example I gave you.  Think about what state
> > the CPU sees between that "disable C bit" and the final cache clean (which
> > you seem to be insisting is an atomic operation.)
> > 
> > Please, read what I'm saying rather than re-interpreting it, augmenting it
> > and then answering something entirely different.
> > 
> > > > As I have said, given what you've mentioned, it is impossible to safely
> > > > disable the cache on a SMP system.  In order to do it safely, you need to
> > > > have a way to disable new allocations into the cache _without_ disabling
> > > > the ability for the cache to be searched.
> 
> First off, my apologies, it was not meant to disrupt the discussion, if
> I did sorry about that. Let's try to sum it up:
> 
> 1) Hitting in the cache when the SCTLR.C is cleared is CPU specific (eg A9
>    does not, A15 does)
> 2) as long as they are taking part in coherency (SMP bit set in ACTLR), all
>    Cortex-A cores in a MP configuration with the SCTLR.C bit set can hit in
>    the cache of a CPU that runs with the C bit cleared in SCTLR
> 3) The sequence:
>         - clean cache
>         - clear SCTLR.C
>         - clean cache
> 
> is correct

I continue to disagree with that assertion.

I repeat: what happens in this situation on A9:

	- clean cache
	- cache speculatively prefetches data from another core
	- clear SCTLR.C
	- _this_ core accesses the address associated with that prefetched
	  data

_That_ is a data corruption issue - as soon as SCTLR.C is cleared, the CPU's
view of data in memory _changes_, and is only restored to what it should
be when the dirty cache lines are finally flushed out of the cache.  And
then, hey presto, the data magically changes again.

So, the above sequence is _not_ safe, and it's _not_ "correct".

It _might_ be the closest thing you can get to given the broken hardware
design, but calling this _correct_ is a silly thing to do when it contains
such a problem.
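
The scenario above can be sketched as a toy model. This is only an
illustration, not a description of real hardware: the Core class, the
hits_with_c_clear flag (False for A9-like behaviour, True for A15-like
behaviour, as described earlier in this thread) and the address 0x1000
are all assumptions made up for the example.

```python
# Toy model only: two "cores" sharing one memory location. Not real hardware.
class Core:
    def __init__(self, memory, hits_with_c_clear):
        self.memory = memory          # shared dict: address -> value
        self.lines = {}               # address -> (value, dirty)
        self.c_bit = True             # models SCTLR.C
        # Hypothetical flag: True = A15-like, False = A9-like.
        self.hits_with_c_clear = hits_with_c_clear

    def write(self, addr, value):
        if self.c_bit:
            self.lines[addr] = (value, True)   # allocate a dirty line
        else:
            self.memory[addr] = value          # non-cached write

    def read(self, addr):
        may_lookup = self.c_bit or self.hits_with_c_clear
        if may_lookup and addr in self.lines:
            return self.lines[addr][0]
        return self.memory[addr]

    def take_dirty_line_from(self, other, addr):
        # Models a speculative prefetch that pulls in another core's dirty line.
        self.lines[addr] = other.lines.pop(addr)

    def clean(self):
        # Write every dirty line back to memory and mark it clean.
        for addr, (value, dirty) in list(self.lines.items()):
            if dirty:
                self.memory[addr] = value
                self.lines[addr] = (value, False)


def russell_scenario(hits_with_c_clear):
    memory = {0x1000: "old"}
    other = Core(memory, hits_with_c_clear)
    this = Core(memory, hits_with_c_clear)
    other.write(0x1000, "new")                 # dirty in the other core's cache
    this.clean()                               # first clean: nothing to write back
    this.take_dirty_line_from(other, 0x1000)   # speculative prefetch
    this.c_bit = False                         # clear SCTLR.C
    seen_before_clean = this.read(0x1000)
    this.clean()                               # second clean
    seen_after_clean = this.read(0x1000)
    return seen_before_clean, seen_after_clean
```

In this model the A9-like core sees the value change between the two
reads, which is the window being argued about; the A15-like core does not.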

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15 16:36                   ` Russell King - ARM Linux
@ 2012-05-15 17:05                     ` Lorenzo Pieralisi
  2012-09-19  8:55                       ` Antti P Miettinen
  2012-05-15 18:17                     ` Will Deacon
  1 sibling, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2012-05-15 17:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 15, 2012 at 05:36:18PM +0100, Russell King - ARM Linux wrote:
> On Tue, May 15, 2012 at 05:28:51PM +0100, Lorenzo Pieralisi wrote:
> > On Tue, May 15, 2012 at 11:15:05AM +0100, Russell King - ARM Linux wrote:
> > > On Tue, May 15, 2012 at 11:09:02AM +0100, Lorenzo Pieralisi wrote:
> > > > On Tue, May 15, 2012 at 10:40:10AM +0100, Russell King - ARM Linux wrote:
> > > > > On Mon, May 14, 2012 at 06:15:33PM +0100, Lorenzo Pieralisi wrote:
> > > > > > On Mon, May 14, 2012 at 05:39:09PM +0100, Russell King - ARM Linux wrote:
> > > > > > > From what you're saying - and from my understanding of your cache behaviours,
> > > > > > > even the sequence:
> > > > > > > - clean cache
> > > > > > > - disable C bit
> > > > > > > - clean cache
> > > > > > > is buggy.
> > > > > > 
> > > > > > No, that's correct, works fine on A9 and A15. Second clean is mostly nops.
> > > > > 
> > > > > It's racy.  Consider this:
> > > > > 
> > > > > 	- clean cache
> > > > > 	- cache speculatively prefetches a dirty cache line from another CPU
> > > > > 	- disable C bit
> > > > 	- clean cache
> > > 
> > > Thank you for totally missing the point and destroying the example.
> > > 
> > > > > At this point, you lose access to that dirty data.  If that dirty data is
> > > > > used inbetween disabling the C bit and cleaning the cache for the second
> > > > > time, you have data corruption issues.
> > > > 
> > > > It is not racy. After disabling the C bit the cache clean operations write-back
> > > > any dirty cache line to the next cache level. And the CPU is still in coherency
> > > > mode so there is not a problem with that either.
> > > 
> > > No.  *THINK* about the exact example I gave you.  Think about what state
> > > the CPU sees between that "disable C bit" and the final cache clean (which
> > > you seem to be insisting is an atomic operation.)
> > > 
> > > Please, read what I'm saying rather than re-interpreting it, augmenting it
> > > and then answering something entirely different.
> > > 
> > > > > As I have said, given what you've mentioned, it is impossible to safely
> > > > > disable the cache on a SMP system.  In order to do it safely, you need to
> > > > > have a way to disable new allocations into the cache _without_ disabling
> > > > > the ability for the cache to be searched.
> > 
> > First off, my apologies, it was not meant to disrupt the discussion, if
> > I did sorry about that. Let's try to sum it up:
> > 
> > 1) Hitting in the cache when the SCTLR.C is cleared is CPU specific (eg A9
> >    does not, A15 does)
> > 2) as long as they are taking part in coherency (SMP bit set in ACTLR), all
> >    Cortex-A cores in a MP configuration with the SCTLR.C bit set can hit in
> >    the cache of a CPU that runs with the C bit cleared in SCTLR
> > 3) The sequence:
> >         - clean cache
> >         - clear SCTLR.C
> >         - clean cache
> > 
> > is correct
> 
> I continue to disagree with that assertion.
> 
> I repeat: what happens in this situation on A9:
> 
> 	- clean cache
> 	- cache speculatively prefetches data from another core
> 	- clear SCTLR.C
> 	- _this_ core accesses the address associated with that prefetched
> 	  data
> 
> _That_ is a data corruption issue - as soon as SCTLR.C is cleared, the CPUs
> view of data in memory _changes_, and is only restored to what it should
> be when the dirty cache lines are finally flushed out of the cache.  And
> then, hey presto, the data magically changes again.

What you are saying is correct, no doubt about that. I think, though, that
in this controlled power-down code path, explicit data accesses after
clearing the C bit but before cleaning the cache must and can be
prevented.

What we should do, as I described, is execute the sequence:

clear SCTLR.C
clean cache
exit coherency

in an uninterruptible way (it is always executed with IRQs disabled) and
with no explicit access to any data whatsoever.
If we code that in assembly (and lots of us have already done that for v7,
e.g. OMAP4) in a controlled code path, I think we can call it safe; that's
my opinion, FWIW.
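
The "no explicit data access" rule can be written down as a small check
over a list of step names. This is only a sketch: the step labels
(including "data access") and the function name are hypothetical, and a
real power-down path would enforce this by construction in assembly, not
by a runtime check.

```python
def window_is_safe(steps):
    """Return True if every 'clear SCTLR.C' is followed by a 'clean cache'
    with no 'data access' step in between (toy check on step labels)."""
    in_window = False
    for step in steps:
        if step == "clear SCTLR.C":
            in_window = True          # window opens: C bit is now clear
        elif step == "clean cache":
            in_window = False         # window closes: dirty lines written back
        elif step == "data access" and in_window:
            return False              # explicit access inside the window
    return not in_window              # the clean must actually happen
```

For example, ["clear SCTLR.C", "clean cache", "exit coherency"] passes,
while the same list with a "data access" between the first two steps
does not.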

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15 17:05                     ` Lorenzo Pieralisi
@ 2012-09-19  8:55                       ` Antti P Miettinen
  2012-09-20  9:54                         ` Lorenzo Pieralisi
  0 siblings, 1 reply; 26+ messages in thread
From: Antti P Miettinen @ 2012-09-19  8:55 UTC (permalink / raw)
  To: linux-arm-kernel

Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> What we should do as I described, is executing the sequence:
>
> clear SCTRL.C
> clean cache
> exit coherency

How does SCTLR.C affect TLB fetches? Especially on A9? It seems that page
table updates do clean_dcache_area(), so it is probably not an issue, but
just out of curiosity: are TLB fetches affected by the C bit on A9?

--
Antti P Miettinen
http://www.iki.fi/~ananaza/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-09-19  8:55                       ` Antti P Miettinen
@ 2012-09-20  9:54                         ` Lorenzo Pieralisi
  2012-09-20 21:17                           ` Antti P Miettinen
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-20  9:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 19, 2012 at 09:55:52AM +0100, Antti P Miettinen wrote:
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> > What we should do as I described, is executing the sequence:
> >
> > clear SCTRL.C
> > clean cache
> > exit coherency
> 
> How does SCTRL.C affect TLB fetches? Especially on A9? Seems that page
> table updates do clean_dcache_area() so probably not an issue but just
> out of curiosity, are TLB fetches affected by the C bit on A9?

Yes, they are. TLB fetches cannot search the D-cache if the C bit in
SCTLR is clear on A9. I do not see any issue with this though, at least
in the power down procedure described above and in previous e-mails in
this thread.

Lorenzo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-09-20  9:54                         ` Lorenzo Pieralisi
@ 2012-09-20 21:17                           ` Antti P Miettinen
  2012-09-23 21:32                             ` Antti P Miettinen
  0 siblings, 1 reply; 26+ messages in thread
From: Antti P Miettinen @ 2012-09-20 21:17 UTC (permalink / raw)
  To: linux-arm-kernel

Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> On Wed, Sep 19, 2012 at 09:55:52AM +0100, Antti P Miettinen wrote:
>> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
>> > What we should do as I described, is executing the sequence:
>> >
>> > clear SCTRL.C
>> > clean cache
>> > exit coherency
>> 
>> How does SCTRL.C affect TLB fetches? Especially on A9? Seems that page
>> table updates do clean_dcache_area() so probably not an issue but just
>> out of curiosity, are TLB fetches affected by the C bit on A9?
>
> Yes, they are. TLB fetches cannot search the D-cache if the C bit in
> SCTLR is clear on A9. I do not see any issue with this though, at least
> in the power down procedure described above and in previous e-mails in
> this thread.
>
> Lorenzo

Hmm.. is the condition for cache coherence protocol then different from
TLB lookups? If C is cleared, is the cache available for snoops by other
cores? What happens if another core needs a dirty line in a cache that
has C cleared?

--
Antti P Miettinen
http://www.iki.fi/~ananaza/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-09-20 21:17                           ` Antti P Miettinen
@ 2012-09-23 21:32                             ` Antti P Miettinen
  2013-02-22  9:04                               ` Antti P Miettinen
  0 siblings, 1 reply; 26+ messages in thread
From: Antti P Miettinen @ 2012-09-23 21:32 UTC (permalink / raw)
  To: linux-arm-kernel

Antti P Miettinen <ananaza@iki.fi> writes:
> Hmm.. is the condition for cache coherence protocol then different from
> TLB lookups? If C is cleared, is the cache available for snoops by other
> cores? What happens if another core needs a dirty line in a cache that
> has C cleared?

Sorry - looks like you already answered this:
> 2) as long as they are taking part in coherency (SMP bit set in ACTLR), all
>    Cortex-A cores in a MP configuration with the SCTLR.C bit set can hit in
>    the cache of a CPU that runs with the C bit cleared in SCTLR

So other cores apparently can search the cache that has C bit
cleared. The only clarification I would still need is whether this
searching also applies to TLB fetches by other cores. So when you say:
> .. TLB fetches cannot search the D-cache if the C bit in
> SCTLR is clear on A9. ..

you meant TLB fetches by the core that has its C bit cleared. The TLB
fetches by other cores will still search the cache just like any other
coherence searches?

--
Antti P Miettinen
http://www.iki.fi/~ananaza/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-09-23 21:32                             ` Antti P Miettinen
@ 2013-02-22  9:04                               ` Antti P Miettinen
  2013-02-22  9:39                                 ` Lorenzo Pieralisi
  0 siblings, 1 reply; 26+ messages in thread
From: Antti P Miettinen @ 2013-02-22  9:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hello, coming back to an old thread:

Antti P Miettinen <ananaza@iki.fi> writes:
> Antti P Miettinen <ananaza@iki.fi> writes:
>> Hmm.. is the condition for cache coherence protocol then different from
>> TLB lookups? If C is cleared, is the cache available for snoops by other
>> cores? What happens if another core needs a dirty line in a cache that
>> has C cleared?
>
> Sorry - looks like you already answered this:
>> 2) as long as they are taking part in coherency (SMP bit set in ACTLR), all
>>    Cortex-A cores in a MP configuration with the SCTLR.C bit set can hit in
>>    the cache of a CPU that runs with the C bit cleared in SCTLR
>
> So other cores apparently can search the cache that has C bit
> cleared. The only clarification I still would need is whether this
> searching applies to also TLB fetches by other cores. So when you say:
>> .. TLB fetches cannot search the D-cache if the C bit in
>> SCTLR is clear on A9. ..
>
> you meant TLB fethes by the core that has it's C bit cleared. The TLB
> fetches by other cores will still search the cache just like any other
> coherence searches?

This did not get answered - are TLB fetches by sibling cores treated in
the same way as cache fetches? If core A has C bit cleared, is the cache
still available for TLB fetches by core B?

--
Antti P Miettinen
http://www.iki.fi/~ananaza/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2013-02-22  9:04                               ` Antti P Miettinen
@ 2013-02-22  9:39                                 ` Lorenzo Pieralisi
  2013-02-23 20:41                                   ` Antti P Miettinen
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2013-02-22  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 22, 2013 at 09:04:04AM +0000, Antti P Miettinen wrote:
> Hello, coming back to an old thread:
> 
> Antti P Miettinen <ananaza@iki.fi> writes:
> > Antti P Miettinen <ananaza@iki.fi> writes:
> >> Hmm.. is the condition for cache coherence protocol then different from
> >> TLB lookups? If C is cleared, is the cache available for snoops by other
> >> cores? What happens if another core needs a dirty line in a cache that
> >> has C cleared?
> >
> > Sorry - looks like you already answered this:
> >> 2) as long as they are taking part in coherency (SMP bit set in ACTLR), all
> >>    Cortex-A cores in a MP configuration with the SCTLR.C bit set can hit in
> >>    the cache of a CPU that runs with the C bit cleared in SCTLR
> >
> > So other cores apparently can search the cache that has C bit
> > cleared. The only clarification I still would need is whether this
> > searching applies to also TLB fetches by other cores. So when you say:
> >> .. TLB fetches cannot search the D-cache if the C bit in
> >> SCTLR is clear on A9. ..
> >
> > you meant TLB fethes by the core that has it's C bit cleared. The TLB
> > fetches by other cores will still search the cache just like any other
> > coherence searches?
> 
> This did not get answered - are TLB fetches by sibling cores treated in
> the same way as cache fetches? If core A has C bit cleared, is the cache
> still available for TLB fetches by core B?

Yes, it is as long as the SMP bit is set in ACTLR.
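
Putting this answer together with the earlier one about A9, the rules can
be summed up as a small truth function. It is a sketch of what was said
in this thread only; the function name, the "A"/"B" labels and the
b_c_bit parameter are assumptions for illustration.

```python
def walk_may_hit_in_cache_of_a(walker, a_c_bit, a_smp_bit, b_c_bit=True):
    """Can a TLB fetch (page table walk) find a line held in core A's D-cache?

    walker is "A" (the core whose C bit may be cleared) or "B" (a sibling).
    """
    if walker == "A":
        # On A9, A's own TLB fetches cannot search its D-cache once
        # SCTLR.C is clear.
        return a_c_bit
    # Fetches by sibling core B: A's cache stays searchable while A takes
    # part in coherency (ACTLR.SMP set), assuming B runs with its C bit set.
    return a_smp_bit and b_c_bit
```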

Lorenzo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2013-02-22  9:39                                 ` Lorenzo Pieralisi
@ 2013-02-23 20:41                                   ` Antti P Miettinen
  2013-02-25 13:36                                     ` Lorenzo Pieralisi
  0 siblings, 1 reply; 26+ messages in thread
From: Antti P Miettinen @ 2013-02-23 20:41 UTC (permalink / raw)
  To: linux-arm-kernel

From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> On Fri, Feb 22, 2013 at 09:04:04AM +0000, Antti P Miettinen wrote:
>> This did not get answered - are TLB fetches by sibling cores treated in
>> the same way as cache fetches? If core A has C bit cleared, is the cache
>> still available for TLB fetches by core B?
> 
> Yes, it is as long as the SMP bit is set in ACTLR.
> 
> Lorenzo

Thanks Lorenzo. Do you know if there are any known errata that would
invalidate any of the assumptions discussed in this thread?

--
Antti P Miettinen
http://www.iki.fi/~ananaza/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2013-02-23 20:41                                   ` Antti P Miettinen
@ 2013-02-25 13:36                                     ` Lorenzo Pieralisi
  0 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Pieralisi @ 2013-02-25 13:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Feb 23, 2013 at 08:41:17PM +0000, Antti P Miettinen wrote:
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > On Fri, Feb 22, 2013 at 09:04:04AM +0000, Antti P Miettinen wrote:
> >> This did not get answered - are TLB fetches by sibling cores treated in
> >> the same way as cache fetches? If core A has C bit cleared, is the cache
> >> still available for TLB fetches by core B?
> > 
> > Yes, it is as long as the SMP bit is set in ACTLR.
> > 
> > Lorenzo
> 
> Thanks Lorenzo. Do you know if there are any known errata that would
> invalidate any of the assumptions disussed in this thread?

If you can provide me with a bit of context I am happy to help you chase
this issue (or issues), since it looks like you are facing some.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15 16:36                   ` Russell King - ARM Linux
  2012-05-15 17:05                     ` Lorenzo Pieralisi
@ 2012-05-15 18:17                     ` Will Deacon
  2012-05-17  5:01                       ` Murali N
  1 sibling, 1 reply; 26+ messages in thread
From: Will Deacon @ 2012-05-15 18:17 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Tue, May 15, 2012 at 05:36:18PM +0100, Russell King - ARM Linux wrote:
> I repeat: what happens in this situation on A9:
> 
> 	- clean cache
> 	- cache speculatively prefetches data from another core

If this prefetching occurs then either:

	(a) The line is clean (no problem)

	(b) Another core has written some data and we end up (speculatively)
	    loading dirty lines

Case (b) is only a problem if we actually commit to using the data later on.

> 	- clear SCTLR.C
> 	- _this_ core accesses the address associated with that prefetched
> 	  data

Yes. At this point it is cpu-specific whether or not we hit our dirty lines
from above. On A9, we will get the stale data from memory. However, this is
exactly the same situation we would find ourselves in if we tried to access
dirty data held in any cache with our SCTLR.C bit cleared. We're no longer
coherent at this stage, so need to avoid accessing shared data.

> _That_ is a data corruption issue - as soon as SCTLR.C is cleared, the CPUs
> view of data in memory _changes_, and is only restored to what it should
> be when the dirty cache lines are finally flushed out of the cache.  And
> then, hey presto, the data magically changes again.

Well we still can't see dirty data in any of the other L1 caches, so our view
of memory is going to be constantly out of date. The tricky bit is ensuring
that we don't rely on data being written by anybody else (and if we write
data ourselves, we need to make sure it's suitably aligned so as not to get
clobbered by evictions from the other caches).
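
One reading of the alignment remark, as a toy sketch: if our non-cached
write shares a cache line with data another core holds dirty, that core's
later write-back of the whole line overwrites our byte. LINE_SIZE = 32
and the helper names are assumptions for illustration only.

```python
LINE_SIZE = 32  # bytes; an assumed line size for illustration


def evict_line(memory, line_base, cached_copy):
    """Write a whole dirty cache line back to memory."""
    memory[line_base:line_base + LINE_SIZE] = cached_copy


def clobber_demo(our_offset):
    memory = bytearray(2 * LINE_SIZE)
    # Another core holds line 0 dirty; it modified byte 0 only.
    other_copy = bytearray(memory[0:LINE_SIZE])
    other_copy[0] = 0xAA
    # We run with SCTLR.C clear and write one byte straight to memory.
    memory[our_offset] = 0x55
    # Later the other core evicts its dirty line, writing all LINE_SIZE bytes.
    evict_line(memory, 0, other_copy)
    return memory[our_offset]
```

With our_offset inside line 0 the byte is lost; with our_offset on the
next line boundary it survives.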

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-15 18:17                     ` Will Deacon
@ 2012-05-17  5:01                       ` Murali N
  2012-05-17  7:30                         ` Shilimkar, Santosh
  0 siblings, 1 reply; 26+ messages in thread
From: Murali N @ 2012-05-17  5:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 15, 2012 at 11:47 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Russell,
>
> On Tue, May 15, 2012 at 05:36:18PM +0100, Russell King - ARM Linux wrote:
>> I repeat: what happens in this situation on A9:
>>
>> 	- clean cache
>> 	- cache speculatively prefetches data from another core
>
> If this prefetching occurs then either:
>
> 	(a) The line is clean (no problem)
>
> 	(b) Another core has written some data and we end up (speculatively)
> 	    loading dirty lines
>
> Case (b) is only a problem if we actually commit to using the data later on.
>
>> 	- clear SCTLR.C
>> 	- _this_ core accesses the address associated with that prefetched
>> 	  data
>
> Yes. At this point it is cpu-specific whether or not we hit our dirty lines
> from above. On A9, we will get the stale data from memory. However, this is
> exactly the same situation we would find ourselves in if we tried to access
> dirty data held in any cache with our SCTLR.C bit cleared. We're no longer
> coherent at this stage, so need to avoid accessing shared data.
>
>> _That_ is a data corruption issue - as soon as SCTLR.C is cleared, the CPUs
>> view of data in memory _changes_, and is only restored to what it should
>> be when the dirty cache lines are finally flushed out of the cache.  And
>> then, hey presto, the data magically changes again.
>
> Well we still can't see dirty data in any of the other L1 caches, so our view
> of memory is going to be constantly out of date. The tricky bit is ensuring
> that we don't rely on data being written by anybody else (and if we write
> data ourself, we need to make sure it's suitably aligned so as not to get
> clobbered by evictions from the other caches).
>
> Will

Would the sequence below still cause any problems?

1. L1 clean & invalidate
2. L2 clean & invalidate
3. Disable L2
4. L1 clean & invalidate
5. Disable "C" bit
6. WFI

-- 
Regards,
Murali N

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-17  5:01                       ` Murali N
@ 2012-05-17  7:30                         ` Shilimkar, Santosh
  0 siblings, 0 replies; 26+ messages in thread
From: Shilimkar, Santosh @ 2012-05-17  7:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 17, 2012 at 10:31 AM, Murali N <nalajala.murali@gmail.com> wrote:
> On Tue, May 15, 2012 at 11:47 PM, Will Deacon <will.deacon@arm.com> wrote:
>> Hi Russell,
>>
>> On Tue, May 15, 2012 at 05:36:18PM +0100, Russell King - ARM Linux wrote:
>>> I repeat: what happens in this situation on A9:
>>>
>>> 	- clean cache
>>> 	- cache speculatively prefetches data from another core
>>
>> If this prefetching occurs then either:
>>
>> 	(a) The line is clean (no problem)
>>
>> 	(b) Another core has written some data and we end up (speculatively)
>> 	    loading dirty lines
>>
>> Case (b) is only a problem if we actually commit to using the data later on.
>>
>>> 	- clear SCTLR.C
>>> 	- _this_ core accesses the address associated with that prefetched
>>> 	  data
>>
>> Yes. At this point it is cpu-specific whether or not we hit our dirty lines
>> from above. On A9, we will get the stale data from memory. However, this is
>> exactly the same situation we would find ourselves in if we tried to access
>> dirty data held in any cache with our SCTLR.C bit cleared. We're no longer
>> coherent at this stage, so need to avoid accessing shared data.
>>
>>> _That_ is a data corruption issue - as soon as SCTLR.C is cleared, the CPUs
>>> view of data in memory _changes_, and is only restored to what it should
>>> be when the dirty cache lines are finally flushed out of the cache.  And
>>> then, hey presto, the data magically changes again.
>>
>> Well we still can't see dirty data in any of the other L1 caches, so our view
>> of memory is going to be constantly out of date. The tricky bit is ensuring
>> that we don't rely on data being written by anybody else (and if we write
>> data ourself, we need to make sure it's suitably aligned so as not to get
>> clobbered by evictions from the other caches).
>>
>> Will
>
> how about following the below sequence still cause any possible problems?
>
> 1. L1 clean & invalidate
> 2. L2 clean & invalidate
> 3. Disable L2
> 4. L1 clean & invalidate
> 5. Disable "C" bit
> 6. WFI
>
This is wrong if the code path is common to CPU and CPU cluster
power down. Given the corner cases Russell pointed out, the sequence
I have working without any issues so far on OMAP is as follows:

- L1 clean & invalidate
- Disable "C" bit
- ISB
- L1 clean & invalidate
- Disable SMP bit
- ISB
- Check for cluster state
if cluster == OFF
- L2 clean & invalidate
isb
dsb
WFI
NOP (to avoid speculative aborts, if any)
NOP
NOP
NOP

The number of NOPs depends on the pipeline depth.
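
The ordering above can be written out as data, which makes the single
conditional step explicit. This is only a sketch: it assumes that just
the L2 clean & invalidate depends on the cluster state (the layout of
the list above does not make that certain), and the function name and
the pipeline_nops parameter are made up for the example.

```python
def power_down_steps(cluster_off, pipeline_nops=4):
    """Return the ordered step labels for a CPU or cluster power down."""
    steps = [
        "L1 clean & invalidate",
        'Disable "C" bit',
        "ISB",
        "L1 clean & invalidate",   # second clean, with the C bit already clear
        "Disable SMP bit",         # leave coherency only after the second clean
        "ISB",
    ]
    if cluster_off:
        # Assumption: only this step is conditional on the cluster state.
        steps.append("L2 clean & invalidate")
    steps += ["ISB", "DSB", "WFI"]
    steps += ["NOP"] * pipeline_nops   # count depends on the pipeline depth
    return steps
```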

Hope it helps

Regards
Santosh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2012-05-14 16:21     ` Lorenzo Pieralisi
  2012-05-14 16:39       ` Russell King - ARM Linux
@ 2013-12-24 17:52       ` Antti Miettinen
  2014-01-06 12:43         ` Lorenzo Pieralisi
  1 sibling, 1 reply; 26+ messages in thread
From: Antti Miettinen @ 2013-12-24 17:52 UTC (permalink / raw)
  To: linux-arm-kernel

Sorry to still bring up an old thread, but this still bothers me..

Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> [..] dirty cache lines can be migrated across
> processors caches. [..]

What are the conditions under which this can happen? Which CPUs in
reality migrate dirty lines between caches? And does C==0 prevent
migrations as well as local allocations?

	--Antti

^ permalink raw reply	[flat|nested] 26+ messages in thread

* L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes
  2013-12-24 17:52       ` Antti Miettinen
@ 2014-01-06 12:43         ` Lorenzo Pieralisi
  0 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Pieralisi @ 2014-01-06 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 24, 2013 at 05:52:48PM +0000, Antti Miettinen wrote:
> Sorry to still bring up an old thread, but this still bothers me..
> 
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> > [..] dirty cache lines can be migrated across
> > processors caches. [..]
> 
> What are the conditions under which this can happen? Which CPUs in
> reality migrate dirty lines between caches? And C==0 does prevent
> migrations as well as local allocations?

It happens when a cache miss is for a cache line that is dirty on another
CPU in the same coherency domain: the line is simply moved from that CPU's
L1 to the local L1.

Yes, clearing the C bit prevents allocations, so it prevents migrations too.
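
As a toy sketch of the two statements above (not a hardware model): the
L1s are plain dicts that track dirty lines only, and the load() helper
and its parameters are assumptions for illustration. With the C bit
clear the model reads memory directly, which is the A9-like behaviour
described earlier in this thread.

```python
def load(local_l1, remote_l1, memory, addr, c_bit_set):
    """Toy load from one core; local_l1/remote_l1 map addr -> dirty value."""
    if not c_bit_set:
        # No allocation, hence no migration: the remote dirty line stays put.
        return memory[addr]
    if addr in local_l1:
        return local_l1[addr]                   # local hit
    if addr in remote_l1:
        # Miss on a line that is dirty in another L1: the line migrates.
        local_l1[addr] = remote_l1.pop(addr)
        return local_l1[addr]
    return memory[addr]                         # miss everywhere
```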

Lorenzo

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-01-06 12:43 UTC | newest]

Thread overview: 26+ messages
2012-05-14  7:03 L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes Murali N
2012-05-14 15:50 ` Lorenzo Pieralisi
2012-05-14 15:58   ` Russell King - ARM Linux
2012-05-14 16:21     ` Lorenzo Pieralisi
2012-05-14 16:39       ` Russell King - ARM Linux
2012-05-14 17:15         ` Lorenzo Pieralisi
2012-05-15  9:25           ` Murali N
2012-05-15  9:40           ` Russell King - ARM Linux
2012-05-15 10:09             ` Lorenzo Pieralisi
2012-05-15 10:15               ` Russell King - ARM Linux
2012-05-15 16:28                 ` Lorenzo Pieralisi
2012-05-15 16:36                   ` Russell King - ARM Linux
2012-05-15 17:05                     ` Lorenzo Pieralisi
2012-09-19  8:55                       ` Antti P Miettinen
2012-09-20  9:54                         ` Lorenzo Pieralisi
2012-09-20 21:17                           ` Antti P Miettinen
2012-09-23 21:32                             ` Antti P Miettinen
2013-02-22  9:04                               ` Antti P Miettinen
2013-02-22  9:39                                 ` Lorenzo Pieralisi
2013-02-23 20:41                                   ` Antti P Miettinen
2013-02-25 13:36                                     ` Lorenzo Pieralisi
2012-05-15 18:17                     ` Will Deacon
2012-05-17  5:01                       ` Murali N
2012-05-17  7:30                         ` Shilimkar, Santosh
2013-12-24 17:52       ` Antti Miettinen
2014-01-06 12:43         ` Lorenzo Pieralisi
