* v7_flush_kern_cache_louis flushes up to L2?
From: Bastian Hecht @ 2013-04-10 10:43 UTC
To: linux-arm-kernel

Hello,

I've got a Cortex-A9 UP with a L2 and want to submit some PM code I've
written. Just to make sure I've made no mistake, it would be very
helpful if you can confirm a hypothesis I use in my code:

v7_flush_kern_cache_louis: Flush the data cache up to Level of
Unification Inner Shareable

This flushes the data out up to the L2, right? The ARM docs say that
the Point of Unification would be my L2. I'm a bit confused by the
term "Level of Unification Inner Shareable" (that states that in an
SMP system L1 coherency is guaranteed and all is flushed to the L2?).

Thanks,

 Bastian

* v7_flush_kern_cache_louis flushes up to L2?
From: Jonathan Austin @ 2013-04-10 11:51 UTC
To: linux-arm-kernel

Hi Bastian,

On 10/04/13 11:43, Bastian Hecht wrote:
> Hello,
>
> I've got a Cortex-A9 UP with a L2 and want to submit some PM code I've

To clarify, is this an MPCore with a single core, or a genuine UP? This can
be established from the 'U' bit of the MPIDR.

> written. Just to make sure I've made no mistake, it would be very
> helpful if you can confirm a hypothesis I use in my code:
>
> v7_flush_kern_cache_louis: Flush the data cache up to Level of
> Unification Inner Shareable
>

Depending on whether you're SMP or UP (bearing in mind that you can be SMP,
but still only have one processor!) the IS is ignored in the
v7_flush_dcache_louis operation:
(from cache-v7.S)

        mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
        ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
        ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
        ALT_SMP(mov     r3, r3, lsr #20)        @ r3 = LoUIS * 2
        ALT_UP(mov      r3, r3, lsr #26)        @ r3 = LoUU * 2
        ...
        flush levels based on value in r3

> This flushes the data out up to the L2, right? The ARM docs say that
> the Point of Unification would be my L2. I'm a bit confused by the
> term "Level of Unification Inner Shareable" (that states that in an
> SMP system L1 coherency is guaranteed and all is flushed to the L2?).
>

As you say, for the A9 (from the TRM) the CLIDR reports LoUIS is the same as
LoUU and both specify L2.

Does that make things clearer, or are you still unsure about something?

Jonny

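The field extraction done by the cache-v7.S excerpt above can be sketched
in plain C, which may make the LoUIS/LoUU distinction easier to follow.
This is only an illustration under stated assumptions: the helper names
(clidr_field, levels_to_flush, louis_times_two, mpidr_is_uniprocessor)
are hypothetical and not kernel API, the CLIDR and MPIDR values are
passed in as plain integers rather than read from CP15, and the position
of the MPIDR 'U' bit (bit 30) is taken from the ARMv7-A architecture
rather than from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CLIDR field offsets as used by the excerpt above; each field is 3 bits. */
#define CLIDR_LOUIS_SHIFT 21u	/* Level of Unification Inner Shareable */
#define CLIDR_LOC_SHIFT   24u	/* Level of Coherency */
#define CLIDR_LOUU_SHIFT  27u	/* Level of Unification Uniprocessor */

/* MPIDR 'U' bit: assumed to be bit 30, set on a uniprocessor system. */
#define MPIDR_U_BIT (UINT32_C(1) << 30)

/* Hypothetical helper: extract one 3-bit level field from a CLIDR value. */
static unsigned int clidr_field(uint32_t clidr, unsigned int shift)
{
	assert(shift <= 29u);	/* the highest 3-bit field starts at bit 27 */
	return (clidr >> shift) & 0x7u;
}

/*
 * Hypothetical helper mirroring the ALT_SMP/ALT_UP choice: an SMP kernel
 * looks at LoUIS, a UP kernel looks at LoUU.
 */
static unsigned int levels_to_flush(uint32_t clidr, bool smp)
{
	return clidr_field(clidr, smp ? CLIDR_LOUIS_SHIFT : CLIDR_LOUU_SHIFT);
}

/* Same arithmetic as "ands r3, r0, #(7 << 21)" then "mov r3, r3, lsr #20". */
static uint32_t louis_times_two(uint32_t clidr)
{
	return (clidr & (UINT32_C(7) << 21)) >> 20;
}

/* Hypothetical helper: true when the MPIDR value reports a uniprocessor. */
static bool mpidr_is_uniprocessor(uint32_t mpidr)
{
	return (mpidr & MPIDR_U_BIT) != 0;
}
```

With an example CLIDR value in which LoUIS and LoUU are both 1 (chosen
here to match what the thread says about the A9, not read from hardware),
levels_to_flush() gives the same answer for SMP and UP: one cache level
has to be cleaned, so the data ends up at the level beyond L1.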
* v7_flush_kern_cache_louis flushes up to L2?
From: Bastian Hecht @ 2013-04-10 12:16 UTC
To: linux-arm-kernel

Hi Jonny!

2013/4/10 Jonathan Austin <jonathan.austin@arm.com>:
> Hi Bastian,
>
>
> On 10/04/13 11:43, Bastian Hecht wrote:
>>
>> Hello,
>>
>> I've got a Cortex-A9 UP with a L2 and want to submit some PM code I've
>
>
> To clarify, is this an MPCore with a single core, or a genuine UP? This can
> be established from the 'U' bit of the MPIDR.
>

I didn't actually read out the U bit, but I'm sure I've got no SCU, so
I bet high that it's a genuine UP system.

>> written. Just to make sure I've made no mistake, it would be very
>> helpful if you can confirm a hypothesis I use in my code:
>>
>> v7_flush_kern_cache_louis: Flush the data cache up to Level of
>> Unification Inner Shareable
>>
>
> Depending on whether you're SMP or UP (bearing in mind that you can be SMP,
> but still only have one processor!) the IS is ignored in the
> v7_flush_dcache_louis operation:
> (from cache-v7.S)
>
>         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
>         ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
>         ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
>         ALT_SMP(mov     r3, r3, lsr #20)        @ r3 = LoUIS * 2
>         ALT_UP(mov      r3, r3, lsr #26)        @ r3 = LoUU * 2
>         ...
>         flush levels based on value in r3
>
>
>> This flushes the data out up to the L2, right? The ARM docs say that
>> the Point of Unification would be my L2. I'm a bit confused by the
>> term "Level of Unification Inner Shareable" (that states that in an
>> SMP system L1 coherency is guaranteed and all is flushed to the L2?).
>>
>
> As you say, for the A9 (from the TRM) the CLIDR reports LoUIS is the same as
> LoUU and both specify L2.

Ok, this is the golden info I was looking for. So after cpu_suspend()
I am good with the following sequence?
flush L2 (outer_flush_all)
disable L2 (outer_disable)
Clear the SCTLR.C bit and issue an "isb"
flush L1 (v7_flush_dcache_all)
cpu_do_idle

and for resume:
invalidate L1
(trust cpu_resume to resume the L1 and enable the SCTLR.C bit)
resume L2 (outer_resume)

> Does that make things clearer, or are you still unsure about something?

If you could confirm the above sequence, I'm perfectly fine.
Thanks for the quick and exhaustive support.

Cheers,

 Bastian

* v7_flush_kern_cache_louis flushes up to L2?
From: Jonathan Austin @ 2013-04-10 13:35 UTC
To: linux-arm-kernel

Hi again Bastian,

On 10/04/13 13:16, Bastian Hecht wrote:
[...]
>>> This flushes the data out up to the L2, right? The ARM docs say that
>>> the Point of Unification would be my L2. I'm a bit confused by the
>>> term "Level of Unification Inner Shareable" (that states that in an
>>> SMP system L1 coherency is guaranteed and all is flushed to the L2?).
>>>
>>
>> As you say, for the A9 (from the TRM) the CLIDR reports LoUIS is the same as
>> LoUU and both specify L2.
>
> Ok, this is the golden info I was looking for. So after cpu_suspend()
> I am good with the following sequence?
> flush L2 (outer_flush_all)
> disable L2 (outer_disable)
> Clear the SCTLR.C bit and issue an "isb"
> flush L1 (v7_flush_dcache_all)

This looks good but I think there's a minor modification to be made to the
sequence here:

After you turn off the caches with the SCTLR.C bit, you can no longer hit in
the L1 cache:
SCTLR.C = 0:
"The Cortex-A9 L1 Data Cache is not enabled. All memory accesses to Normal
Memory Cacheable regions are treated as Normal Memory Non-Cacheable, without
lookup and without allocation in the L1 Data Cache."

If you had dirty data in the L1 cache you will lose it at that point.

So, you should flush L1 before turning off caches.

Note that on SMP, you would also need to clean/flush again after turning off
the caches, as read-speculation could have sucked dirty data from another
cache into your cache, which would need to be written back before sleeping.

> cpu_do_idle
>
> and for resume:
> invalidate L1

How do you plan to do this? I notice that there's an increasing tendency to
call v7_invalidate_l1 which isn't actually part of the cacheflush API (see
arch/arm/include/asm/cacheflush.h arch/arm/include/asm/glue-cache.h...)

Does your cache definitely come up in an undefined state? If not, could you
use v7_flush_dcache_all? (see the comment above v7_invalidate_l1 in
arch/arm/mm/cache-v7.S)

Hope that helps,

Jonny

* v7_flush_kern_cache_louis flushes up to L2?
From: Bastian Hecht @ 2013-04-10 14:49 UTC
To: linux-arm-kernel

Heyho!

2013/4/10 Jonathan Austin <jonathan.austin@arm.com>:
> Hi again Bastian,
>
> On 10/04/13 13:16, Bastian Hecht wrote:
> [...]
>
>>>> This flushes the data out up to the L2, right? The ARM docs say that
>>>> the Point of Unification would be my L2. I'm a bit confused by the
>>>> term "Level of Unification Inner Shareable" (that states that in an
>>>> SMP system L1 coherency is guaranteed and all is flushed to the L2?).
>>>>
>>>
>>> As you say, for the A9 (from the TRM) the CLIDR reports LoUIS is the same as
>>> LoUU and both specify L2.
>>
>>
>> Ok, this is the golden info I was looking for. So after cpu_suspend()
>> I am good with the following sequence?
>> flush L2 (outer_flush_all)
>> disable L2 (outer_disable)
>> Clear the SCTLR.C bit and issue an "isb"
>> flush L1 (v7_flush_dcache_all)
>
>
> This looks good but I think there's a minor modification to be made to the
> sequence here:
>
> After you turn off the caches with the SCTLR.C bit, you can no longer hit in
> the L1 cache:
> SCTLR.C = 0:
> "The Cortex-A9 L1 Data Cache is not enabled. All memory accesses to Normal
> Memory Cacheable regions are treated as Normal Memory Non-Cacheable, without
> lookup and without allocation in the L1 Data Cache."
>
> If you had dirty data in the L1 cache you will lose it at that point.
>
> So, you should flush L1 before turning off caches.

Hmm, I wondered if I need that, as cpu_suspend flushes everything out to
the L2 as we discussed. So if I don't write anything to RAM after
cpu_suspend, why do I need the extra flush before turning off
L1 caching?

> Note that on SMP, you would also need to clean/flush again after turning off
> the caches, as read-speculation could have sucked dirty data from another
> cache into your cache, which would need to be written back before sleeping.
>

Aaaaaah, thanks! I always wondered why this is needed, but that makes sense.

>> cpu_do_idle
>>
>> and for resume:
>> invalidate L1
>
>
> How do you plan to do this? I notice that there's an increasing tendency to
> call v7_invalidate_l1 which isn't actually part of the cacheflush API (see
> arch/arm/include/asm/cacheflush.h arch/arm/include/asm/glue-cache.h...)

I used to have my gazillionth version of v7_invalidate_l1 in my
assembly code but as it is in cache-v7.S now, I use this.

> Does your cache definitely come up in an undefined state? If not, could you
> use v7_flush_dcache_all? (see the comment above v7_invalidate_l1 in
> arch/arm/mm/cache-v7.S)

To be honest, I have no clue if my cache comes up in an undefined
state. But the comment about random data getting flushed to RAM scared
me so much that I blindly go with what other Cortex-A9 users do.

> Hope that helps,

It does!

Thanks,

 Bastian

* v7_flush_kern_cache_louis flushes up to L2?
From: Russell King - ARM Linux @ 2013-04-19 10:21 UTC
To: linux-arm-kernel

On Wed, Apr 10, 2013 at 02:35:05PM +0100, Jonathan Austin wrote:
> Note that on SMP, you would also need to clean/flush again after turning
> off the caches, as read-speculation could have sucked dirty data from
> another cache into your cache, which would need to be written back
> before sleeping.

Err, no. If you read data into a cache, it is always populated in a clean
state. You never suck dirty data out of one cache, remove it there and
place it immediately in a dirty state in an upper level cache.

So, if we start from the idea that reads will populate the upper level
cache with clean data, that cache is populated merely with a _copy_ of
the data held elsewhere in the system.

In that case, there is no need to clean or flush that cache after turning
it off - if the cache only contains _copies_ of data elsewhere in the
system, then it's perfectly fine for that data to be discarded in any
way - be that by turning the power off to the cache or invalidating it.

There is a step further in this: if the cache is being dirtied by the
local CPU, and that data will only ever be used by the local CPU, again,
in the CPU shutdown path, you really do not need to write that data out -
because the only CPU which cares about it is the local CPU which is going
to be shut down. So, loss of that data is not a problem.

In other words, shutdown of a CPU is a process which involves pushing out
data which the rest of the system requires (that's done by an initial
flush). Once that flush is done, provided the dying CPU does not _write_
to any shared data (this includes spinlocks and such like too), there is
no need to flush the cache anymore.

* v7_flush_kern_cache_louis flushes up to L2?
From: Russell King - ARM Linux @ 2013-04-19 10:48 UTC
To: linux-arm-kernel

On Fri, Apr 19, 2013 at 11:21:42AM +0100, Russell King - ARM Linux wrote:
> On Wed, Apr 10, 2013 at 02:35:05PM +0100, Jonathan Austin wrote:
> > Note that on SMP, you would also need to clean/flush again after turning
> > off the caches, as read-speculation could have sucked dirty data from
> > another cache into your cache, which would need to be written back
> > before sleeping.
>
> Err, no. If you read data into a cache, it is always populated in a clean
> state. You never suck dirty data out of one cache, remove it there and
> place it immediately in a dirty state in an upper level cache.
>
> So, if we start from the idea that reads will populate the upper level
> cache with clean data, that cache is populated merely with a _copy_ of
> the data held elsewhere in the system.

Hmm, Will tells me that later ARM cores can migrate dirty cache lines up
the cache tree, which means that we're potentially into deep problems
with CPU hotplug and suspend paths, because it means there's potentially
no way to ensure that we cleanly push data out of the caches and shut the
caches down before the CPU goes down.

Combine this cache behaviour with the implementation specific details
about SCTLR.C (whether caches are just prevented from being allocated,
or whether it also affects the caches being searched) and you have a
recipe for multiple chunks of complex platform specific and CPU specific
code - especially when combined with things like removal of CPU power
and synchronisation of that in the hotplug case.

I do hope that ARM Ltd have thought about this, and have a way that we
can shut CPU caches down sanely, but I'm fearing that they haven't really
considered that to be something anyone really wants to do in a generic
way.

* v7_flush_kern_cache_louis flushes up to L2?
From: Lorenzo Pieralisi @ 2013-04-10 13:51 UTC
To: linux-arm-kernel

On Wed, Apr 10, 2013 at 01:16:03PM +0100, Bastian Hecht wrote:
> Hi Jonny!
>
> 2013/4/10 Jonathan Austin <jonathan.austin@arm.com>:
> > Hi Bastian,
> >
> >
> > On 10/04/13 11:43, Bastian Hecht wrote:
> >>
> >> Hello,
> >>
> >> I've got a Cortex-A9 UP with a L2 and want to submit some PM code I've
> >
> >
> > To clarify, is this an MPCore with a single core, or a genuine UP? This can
> > be established from the 'U' bit of the MPIDR.
> >
>
> I didn't actually read out the U bit, but I'm sure I've got no SCU, so
> I bet high that it's a genuine UP system.
>
> >> written. Just to make sure I've made no mistake, it would be very
> >> helpful if you can confirm a hypothesis I use in my code:
> >>
> >> v7_flush_kern_cache_louis: Flush the data cache up to Level of
> >> Unification Inner Shareable
> >>
> >
> > Depending on whether you're SMP or UP (bearing in mind that you can be SMP,
> > but still only have one processor!) the IS is ignored in the
> > v7_flush_dcache_louis operation:
> > (from cache-v7.S)
> >
> >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> >         ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> >         ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> >         ALT_SMP(mov     r3, r3, lsr #20)        @ r3 = LoUIS * 2
> >         ALT_UP(mov      r3, r3, lsr #26)        @ r3 = LoUU * 2
> >         ...
> >         flush levels based on value in r3
> >
> >
> >> This flushes the data out up to the L2, right? The ARM docs say that
> >> the Point of Unification would be my L2. I'm a bit confused by the
> >> term "Level of Unification Inner Shareable" (that states that in an
> >> SMP system L1 coherency is guaranteed and all is flushed to the L2?).
> >>
> >
> > As you say, for the A9 (from the TRM) the CLIDR reports LoUIS is the same as
> > LoUU and both specify L2.
>
> Ok, this is the golden info I was looking for. So after cpu_suspend()
> I am good with the following sequence?

Is L2 RAM retained on power down?

> flush L2 (outer_flush_all)
> disable L2 (outer_disable)

Disabling L2 is not mandatory. And the code above (if L2 RAM is not
retained) can be simply

outer_disable

the cleaning is done in the PL310 disable function properly.

> Clear the SCTLR.C bit and issue an "isb"
> flush L1 (v7_flush_dcache_all)

Two steps above are ok, as long as flush L1 does not push data on the
stack on entry and L1 clean routine does not need any dirty data present
in L1. Clearing C bit on A9 stops the core from searching the
cache so data writes should not be executed with C bit cleared and a
data cache still containing dirty lines. For the same reason the cache
cleaning routine should not require any data to execute since the data
can be sitting in the cache that is not searched anymore when the C bit
is cleared.

The sequence:

clear C bit
bl v7_flush_dcache_all

is better coded in assembly (in the cpu_suspend finisher, i.e. the
function you pass as cpu_suspend's second argument) to control what you
are doing.

The cache cleaning routine (v7_flush_dcache_all) does not require any
data to execute, so running it with C bit cleared is fine.

> cpu_do_idle
>
> and for resume:
> invalidate L1

Use v7_invalidate_l1 here.

> (trust cpu_resume to resume the L1 and enable the SCTLR.C bit)
> resume L2 (outer_resume)

Again, it depends on L2 behaviour on shutdown, if it is retained or not.
If L2 RAM content is lost on power down the sequence above seems ok.

Post the code, happy to have a look.

Lorenzo

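The hazard described above (data writes issued with the C bit cleared
while dirty lines are still sitting in L1) can be illustrated with a toy
model in C. This is a sketch only, under loud assumptions: it models a
single word and a single write-back cache line, the struct and function
names (toy_system, toy_write, toy_read, toy_flush) are hypothetical, and
"C bit cleared" is modelled as "no lookup, no allocation" as in the TRM
text quoted earlier in the thread. It says nothing about real Cortex-A9
timing, speculation, or the outer cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one word of memory behind one write-back cache line. */
struct toy_system {
	uint32_t memory;	/* backing store for the modelled word */
	uint32_t line_data;	/* cached copy of the word */
	bool line_valid;
	bool line_dirty;
	bool c_bit;		/* models SCTLR.C: lookup/allocate only when set */
};

/* A store: allocates a dirty line when caching is on, else goes to memory. */
static void toy_write(struct toy_system *s, uint32_t value)
{
	assert(s);
	if (s->c_bit) {
		s->line_data = value;
		s->line_valid = true;
		s->line_dirty = true;
	} else {
		s->memory = value;	/* cache bypassed, line left untouched */
	}
}

/* A load: hits (or allocates a clean line) only when caching is on. */
static uint32_t toy_read(struct toy_system *s)
{
	assert(s);
	if (!s->c_bit)
		return s->memory;	/* dirty line, if any, is not searched */
	if (!s->line_valid) {
		s->line_data = s->memory;
		s->line_valid = true;
		s->line_dirty = false;
	}
	return s->line_data;
}

/* Clean and invalidate by set/way: works regardless of the C bit. */
static void toy_flush(struct toy_system *s)
{
	assert(s);
	if (s->line_valid && s->line_dirty)
		s->memory = s->line_data;	/* write back the dirty line */
	s->line_valid = false;
	s->line_dirty = false;
}
```

In this model, writing with the cache on, clearing c_bit, writing again
and only then calling toy_flush() lets the stale dirty line overwrite the
newer value in memory, which is the reason given above for not executing
data writes (stack pushes included) between clearing the C bit and the
flush. Flushing first and writing afterwards leaves memory with the
newer value.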
* v7_flush_kern_cache_louis flushes up to L2?
From: Bastian Hecht @ 2013-04-10 15:08 UTC
To: linux-arm-kernel

Hi Lorenzo,

2013/4/10 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>:
> On Wed, Apr 10, 2013 at 01:16:03PM +0100, Bastian Hecht wrote:
>> Hi Jonny!
>>
>> 2013/4/10 Jonathan Austin <jonathan.austin@arm.com>:
>> > Hi Bastian,
>> >
>> >
>> > On 10/04/13 11:43, Bastian Hecht wrote:
>> >>
>> >> Hello,
>> >>
>> >> I've got a Cortex-A9 UP with a L2 and want to submit some PM code I've
>> >
>> >
>> > To clarify, is this an MPCore with a single core, or a genuine UP? This can
>> > be established from the 'U' bit of the MPIDR.
>> >
>>
>> I didn't actually read out the U bit, but I'm sure I've got no SCU, so
>> I bet high that it's a genuine UP system.
>>
>> >> written. Just to make sure I've made no mistake, it would be very
>> >> helpful if you can confirm a hypothesis I use in my code:
>> >>
>> >> v7_flush_kern_cache_louis: Flush the data cache up to Level of
>> >> Unification Inner Shareable
>> >>
>> >
>> > Depending on whether you're SMP or UP (bearing in mind that you can be SMP,
>> > but still only have one processor!) the IS is ignored in the
>> > v7_flush_dcache_louis operation:
>> > (from cache-v7.S)
>> >
>> >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
>> >         ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
>> >         ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
>> >         ALT_SMP(mov     r3, r3, lsr #20)        @ r3 = LoUIS * 2
>> >         ALT_UP(mov      r3, r3, lsr #26)        @ r3 = LoUU * 2
>> >         ...
>> >         flush levels based on value in r3
>> >
>> >
>> >> This flushes the data out up to the L2, right? The ARM docs say that
>> >> the Point of Unification would be my L2. I'm a bit confused by the
>> >> term "Level of Unification Inner Shareable" (that states that in an
>> >> SMP system L1 coherency is guaranteed and all is flushed to the L2?).
>> >>
>> >
>> > As you say, for the A9 (from the TRM) the CLIDR reports LoUIS is the same as
>> > LoUU and both specify L2.
>>
>> Ok, this is the golden info I was looking for. So after cpu_suspend()
>> I am good with the following sequence?
>
> Is L2 RAM retained on power down?

I have two different versions of powering down the SoC. Currently I'm
focusing on a shutdown mode that includes powering off the L2.

>> flush L2 (outer_flush_all)
>> disable L2 (outer_disable)
>
> Disabling L2 is not mandatory. And the code above (if L2 RAM is not
> retained) can be simply
>
> outer_disable
>
> the cleaning is done in the PL310 disable function properly.

The problem I see with this approach is: What advantage do we get at
all if we have to flush the L2 (which is done in the PL310 disable
routine)? Isn't this exactly the part we want to save? Not to have to
flush the L2.

>> Clear the SCTLR.C bit and issue an "isb"
>> flush L1 (v7_flush_dcache_all)
>
> Two steps above are ok, as long as flush L1 does not push data on the
> stack on entry and L1 clean routine does not need any dirty data present
> in L1. Clearing C bit on A9 stops the core from searching the
> cache so data writes should not be executed with C bit cleared and a
> data cache still containing dirty lines. For the same reason the cache
> cleaning routine should not require any data to execute since the data
> can be sitting in the cache that is not searched anymore when the C bit
> is cleared.

Ah true! When I enter my assembly code there are of course stack
modifications.

And here comes a point I've just realized recently: Sometimes the WFI
instruction doesn't enter the low power state I've requested. I
haven't observed this with Suspend-To-RAM, only with CPUIdle. I've
seen code from the OMAP people that checks for the case that WFI
doesn't succeed. Probably I need to do the same. And for this I need
the stack to be synced.

> The sequence:
>
> clear C bit
> bl v7_flush_dcache_all
>
> is better coded in assembly (in the cpu_suspend finisher, i.e. the
> function you pass as cpu_suspend's second argument) to control what you
> are doing.
>
> The cache cleaning routine (v7_flush_dcache_all) does not require any
> data to execute, so running it with C bit cleared is fine.
>
>> cpu_do_idle
>>
>> and for resume:
>> invalidate L1
>
> Use v7_invalidate_l1 here.
>
>> (trust cpu_resume to resume the L1 and enable the SCTLR.C bit)
>> resume L2 (outer_resume)
>
> Again, it depends on L2 behaviour on shutdown, if it is retained or not.
> If L2 RAM content is lost on power down the sequence above seems ok.
>
> Post the code, happy to have a look.
>

Ok great, I'm quite blown away by the support here, should have
contacted you earlier!

 Bastian

end of thread (newest message: 2013-04-19 10:48 UTC)

Thread overview: 9+ messages
2013-04-10 10:43 v7_flush_kern_cache_louis flushes up to L2? Bastian Hecht
2013-04-10 11:51 ` Jonathan Austin
2013-04-10 12:16   ` Bastian Hecht
2013-04-10 13:35     ` Jonathan Austin
2013-04-10 14:49       ` Bastian Hecht
2013-04-19 10:21       ` Russell King - ARM Linux
2013-04-19 10:48         ` Russell King - ARM Linux
2013-04-10 13:51     ` Lorenzo Pieralisi
2013-04-10 15:08       ` Bastian Hecht