linux-arm-kernel.lists.infradead.org archive mirror
* [RFC PATCH v2 0/5] ARM: augment cache flushing API
@ 2012-09-18 16:35 Lorenzo Pieralisi
  2012-09-18 16:35 ` [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops Lorenzo Pieralisi
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-18 16:35 UTC
  To: linux-arm-kernel

This patch series provides an update of a previous posting:

http://www.spinics.net/lists/arm-kernel/msg194946.html

v2 updates:
 - Dropped v7 dcache level patch
 - Refactored the set to make it work on all processors, with both
   MULTI_CACHE and !MULTI_CACHE
 - Factored out label redefinition in v7_flush_dcache_all
 - Updated some comments

The v7 ARM architecture introduced the concept of cache levels and related
control registers to manage them. Cache maintenance operations by set/way
require the cache level on which they operate to be specified through
coprocessor registers.
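
As an illustration of how the cache level travels with a set/way operation,
the sketch below assembles the operand of a clean & invalidate by set/way from
a CCSIDR value. It assumes the arithmetic of the set/way loop in mainline
v7_flush_dcache_all (line length from CCSIDR[2:0], ways from CCSIDR[12:3],
sets from CCSIDR[27:13], level in bits [3:1]); the helper name and the cache
geometry used below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch: build the operand for a clean & invalidate by set/way from a
 * CCSIDR value, mirroring the arithmetic in v7_flush_dcache_all:
 *   - cache level (0-based) in bits [3:1]
 *   - set index shifted left by log2(line length in bytes)
 *   - way number left-aligned in the top bits of the register
 * __builtin_clz is a GCC/Clang builtin standing in for the CLZ instruction.
 */
static uint32_t setway_operand(unsigned level, uint32_t ccsidr,
			       uint32_t way, uint32_t set)
{
	unsigned line_shift = (ccsidr & 7) + 4;     /* log2(bytes per line) */
	uint32_t max_way = (ccsidr >> 3) & 0x3ff;   /* associativity - 1 */
	uint32_t max_set = (ccsidr >> 13) & 0x7fff; /* number of sets - 1 */
	uint32_t op = (uint32_t)level << 1;

	assert(level < 7 && way <= max_way && set <= max_set);
	if (max_way)            /* a direct-mapped cache has no way field */
		op |= way << __builtin_clz(max_way);
	op |= set << line_shift;
	return op;
}
```

For a hypothetical 32 KiB, 2-way cache with 64-byte lines (CCSIDR
0x001FE00A), setway_operand(0, 0x001FE00A, 1, 255) yields 0x80003FC0: way 1
lands in bit 31 and set 255 is shifted left by 6.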

Processors like the A7/A15 integrate a unified L2 that is part of the cache
level hierarchy. This implies that cache operations acting on all levels
also end up cleaning the unified L2 cache, which is a very time-consuming
operation and is not needed for some power-down operations, such as
single-CPU shutdown.

For v7, flush_kern_all() cleans all cache levels up to the Level of
Coherency, which includes the L2. This is suboptimal for code paths that end
up shutting down a single processor, like CPU hotplug and CPU idle, where only
per-CPU cache state (i.e. the integrated L1 cache) has to be cleaned and
invalidated.

To fix this performance issue, this patchset introduces cache LoUIS (Level of
Unification Inner Shareable) maintenance operations in the kernel.

A new cache operation pointer is added to struct cpu_cache_fns:

void (*flush_kern_cache_louis)(void);

which cleans and invalidates all data cache levels up to the LoUIS and
invalidates the instruction cache. This new API should be sufficiently
optimized for use in generic kernel C code for power management operations
on most v7 systems.

For architecture versions prior to v7, flush_kern_cache_louis() falls back
to flush_kern_all(), leaving the current behaviour unchanged.
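
The dispatch and the pre-v7 fallback can be modelled in a few lines of C.
This is a minimal sketch assuming a reduced two-entry table: struct
cache_ops, the demo_* functions and the call counters are hypothetical
stand-ins for struct cpu_cache_fns and the real assembly routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct cpu_cache_fns. */
struct cache_ops {
	void (*flush_kern_all)(void);
	void (*flush_kern_louis)(void);
};

/* Call counters stand in for real cache maintenance. */
static int flush_all_calls, flush_louis_calls;

static void demo_flush_kern_cache_all(void)   { flush_all_calls++; }
static void demo_flush_kern_cache_louis(void) { flush_louis_calls++; }

/* Pre-v7 table: the LoUIS slot aliases flush_kern_all, as the .equ
 * directives in cache-*.S and proc-*.S do. */
static const struct cache_ops pre_v7_cache_ops = {
	.flush_kern_all   = demo_flush_kern_cache_all,
	.flush_kern_louis = demo_flush_kern_cache_all,
};

/* v7 table: the LoUIS slot has its own, cheaper implementation. */
static const struct cache_ops v7_cache_ops = {
	.flush_kern_all   = demo_flush_kern_cache_all,
	.flush_kern_louis = demo_flush_kern_cache_louis,
};

/* Model of flush_cache_louis() with MULTI_CACHE: an indirect call
 * through whichever table was selected for the running CPU. */
static void demo_flush_cache_louis(const struct cache_ops *ops)
{
	assert(ops != NULL && ops->flush_kern_louis != NULL);
	ops->flush_kern_louis();
}
```

Without MULTI_CACHE the same selection happens at compile time: glue-cache.h
pastes the single configured cache type into the symbol name, so for v7
__cpuc_flush_kern_louis becomes a direct call to v7_flush_kern_cache_louis.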

For A9/A5 processors, the Level of Unification Inner Shareable and the Level
of Coherency are equivalent, hence this patch should not affect current kernel
behaviour in any way when run on A9/A5 based systems; it should nonetheless
be thoroughly tested on them.
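
Whether LoUIS and LoC coincide can be read directly from CLIDR. The C sketch
below decodes the relevant fields and also shows the pre-doubled form the
assembly uses. The two CLIDR values in the usage note are illustrative
assumptions, chosen to resemble an A15-like layout (separate L1 I/D, unified
L2) and an A9-like layout (L1 only, L2 outer); they are not values read from
hardware.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7 CLIDR fields: Ctype<n> is 3 bits per level from bit 0,
 * LoUIS is bits [23:21] and LoC is bits [26:24]. */
static unsigned clidr_louis(uint32_t clidr) { return (clidr >> 21) & 7; }
static unsigned clidr_loc(uint32_t clidr)   { return (clidr >> 24) & 7; }

/* Cache type of level n (1-based): 0 none, 1 I-only, 2 D-only,
 * 3 separate I and D, 4 unified. */
static unsigned clidr_ctype(uint32_t clidr, unsigned level)
{
	assert(level >= 1 && level <= 7);
	return (clidr >> (3 * (level - 1))) & 7;
}

/* The assembly keeps "level * 2" in its loop counter, so it extracts the
 * limits pre-doubled with a mask and a single shift. */
static unsigned louis_x2(uint32_t clidr) { return (clidr & 0xe00000) >> 20; }
static unsigned loc_x2(uint32_t clidr)   { return (clidr & 0x7000000) >> 23; }
```

With the A15-like value 0x0A200023, LoUIS (1) is below LoC (2), so a LoUIS
flush skips the unified L2; with the A9-like value 0x09200003 both fields are
1 and the two operations cover the same levels.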

Tested on:
  - OMAP4 (S2R, cpuidle and hotplug)
  - OMAP5 (out of tree code) (S2R, cpuidle and hotplug)
  - TC2 big.LITTLE testchip (out of tree code) (cpuidle, on both A7 and A15
    clusters)

Lorenzo Pieralisi (4):
  ARM: mm: implement LoUIS API for cache maintenance ops
  ARM: mm: rename jump labels in v7_flush_dcache_all function
  ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API

Santosh Shilimkar (1):
  ARM: mm: update __v7_setup() to the new LoUIS cache maintenance API

 arch/arm/include/asm/cacheflush.h | 15 ++++++++++++
 arch/arm/include/asm/glue-cache.h |  1 +
 arch/arm/kernel/smp.c             |  5 +++-
 arch/arm/kernel/suspend.c         | 17 +++++++++++++-
 arch/arm/mm/cache-fa.S            |  3 +++
 arch/arm/mm/cache-v3.S            |  3 +++
 arch/arm/mm/cache-v4.S            |  3 +++
 arch/arm/mm/cache-v4wb.S          |  3 +++
 arch/arm/mm/cache-v4wt.S          |  3 +++
 arch/arm/mm/cache-v6.S            |  3 +++
 arch/arm/mm/cache-v7.S            | 48 ++++++++++++++++++++++++++++++++++-----
 arch/arm/mm/proc-arm1020.S        |  3 +++
 arch/arm/mm/proc-arm1020e.S       |  3 +++
 arch/arm/mm/proc-arm1022.S        |  3 +++
 arch/arm/mm/proc-arm1026.S        |  3 +++
 arch/arm/mm/proc-arm920.S         |  3 +++
 arch/arm/mm/proc-arm922.S         |  3 +++
 arch/arm/mm/proc-arm925.S         |  3 +++
 arch/arm/mm/proc-arm926.S         |  3 +++
 arch/arm/mm/proc-arm940.S         |  3 +++
 arch/arm/mm/proc-arm946.S         |  3 +++
 arch/arm/mm/proc-feroceon.S       |  3 +++
 arch/arm/mm/proc-macros.S         |  1 +
 arch/arm/mm/proc-mohawk.S         |  3 +++
 arch/arm/mm/proc-v7.S             |  2 +-
 arch/arm/mm/proc-xsc3.S           |  3 +++
 arch/arm/mm/proc-xscale.S         |  3 +++
 27 files changed, 140 insertions(+), 9 deletions(-)

-- 
1.7.12

* [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops
  2012-09-18 16:35 [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
@ 2012-09-18 16:35 ` Lorenzo Pieralisi
  2012-09-18 18:12   ` Nicolas Pitre
  2012-09-18 16:35 ` [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function Lorenzo Pieralisi
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-18 16:35 UTC
  To: linux-arm-kernel

The ARMv7 architecture introduced the concept of cache levels and related
control registers. New processors like the A7 and A15 embed an L2 unified
cache controller that becomes part of the cache level hierarchy. Some
operations in the kernel, like cpu_suspend and __cpu_disable, do not require
a flush of the entire cache hierarchy to DRAM, but only of the cache levels
belonging to the Level of Unification Inner Shareable (LoUIS), which on most
ARMv7 systems corresponds to L1.

The current cache flushing API used in cpu_suspend and __cpu_disable,
flush_cache_all(), ends up flushing the whole cache hierarchy, since for
v7 it cleans and invalidates all cache levels up to the Level of Coherency
(LoC), which cripples system performance when used in hot paths like hotplug
and cpuidle.

Therefore, a new kernel cache maintenance API must be added to cope with the
latest ARM system requirements.

This patch adds flush_cache_louis() to the ARM kernel cache maintenance API.

This function cleans and invalidates all data cache levels up to the
Level of Unification Inner Shareable (LoUIS) and invalidates the instruction
cache for processors that support it (v7 and later).

This patch also creates an alias of the cache LoUIS function to flush_kern_all
for all processor versions prior to v7, so that the current cache flushing
behaviour is unchanged for those processors.

v7 cache maintenance code implements a cache LoUIS function that cleans and
invalidates the D-cache up to LoUIS and invalidates the I-cache, according
to the new API.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/include/asm/cacheflush.h | 15 +++++++++++++++
 arch/arm/include/asm/glue-cache.h |  1 +
 arch/arm/mm/cache-fa.S            |  3 +++
 arch/arm/mm/cache-v3.S            |  3 +++
 arch/arm/mm/cache-v4.S            |  3 +++
 arch/arm/mm/cache-v4wb.S          |  3 +++
 arch/arm/mm/cache-v4wt.S          |  3 +++
 arch/arm/mm/cache-v6.S            |  3 +++
 arch/arm/mm/cache-v7.S            | 36 ++++++++++++++++++++++++++++++++++++
 arch/arm/mm/proc-arm1020.S        |  3 +++
 arch/arm/mm/proc-arm1020e.S       |  3 +++
 arch/arm/mm/proc-arm1022.S        |  3 +++
 arch/arm/mm/proc-arm1026.S        |  3 +++
 arch/arm/mm/proc-arm920.S         |  3 +++
 arch/arm/mm/proc-arm922.S         |  3 +++
 arch/arm/mm/proc-arm925.S         |  3 +++
 arch/arm/mm/proc-arm926.S         |  3 +++
 arch/arm/mm/proc-arm940.S         |  3 +++
 arch/arm/mm/proc-arm946.S         |  3 +++
 arch/arm/mm/proc-feroceon.S       |  3 +++
 arch/arm/mm/proc-macros.S         |  1 +
 arch/arm/mm/proc-mohawk.S         |  3 +++
 arch/arm/mm/proc-xsc3.S           |  3 +++
 arch/arm/mm/proc-xscale.S         |  3 +++
 24 files changed, 113 insertions(+)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index c6e2ed9..4e8217b 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -50,6 +50,13 @@
  *
  *		Unconditionally clean and invalidate the entire cache.
  *
+ *     flush_kern_louis()
+ *
+ *             Flush data cache levels up to the level of unification
+ *             inner shareable and invalidate the I-cache.
+ *             Only needed from v7 onwards, falls back to flush_cache_all()
+ *             for all other processor versions.
+ *
  *	flush_user_all()
  *
  *		Clean and invalidate all user space cache entries
@@ -98,6 +105,7 @@
 struct cpu_cache_fns {
 	void (*flush_icache_all)(void);
 	void (*flush_kern_all)(void);
+	void (*flush_kern_louis)(void);
 	void (*flush_user_all)(void);
 	void (*flush_user_range)(unsigned long, unsigned long, unsigned int);
 
@@ -120,6 +128,7 @@ extern struct cpu_cache_fns cpu_cache;
 
 #define __cpuc_flush_icache_all		cpu_cache.flush_icache_all
 #define __cpuc_flush_kern_all		cpu_cache.flush_kern_all
+#define __cpuc_flush_kern_louis		cpu_cache.flush_kern_louis
 #define __cpuc_flush_user_all		cpu_cache.flush_user_all
 #define __cpuc_flush_user_range		cpu_cache.flush_user_range
 #define __cpuc_coherent_kern_range	cpu_cache.coherent_kern_range
@@ -140,6 +149,7 @@ extern struct cpu_cache_fns cpu_cache;
 
 extern void __cpuc_flush_icache_all(void);
 extern void __cpuc_flush_kern_all(void);
+extern void __cpuc_flush_kern_louis(void);
 extern void __cpuc_flush_user_all(void);
 extern void __cpuc_flush_user_range(unsigned long, unsigned long, unsigned int);
 extern void __cpuc_coherent_kern_range(unsigned long, unsigned long);
@@ -205,6 +215,11 @@ static inline void __flush_icache_all(void)
 	__flush_icache_preferred();
 }
 
+/*
+ * Flush caches up to Level of Unification Inner Shareable
+ */
+#define flush_cache_louis()		__cpuc_flush_kern_louis()
+
 #define flush_cache_all()		__cpuc_flush_kern_all()
 
 static inline void vivt_flush_cache_mm(struct mm_struct *mm)
diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h
index 7e30874..2d6a7de 100644
--- a/arch/arm/include/asm/glue-cache.h
+++ b/arch/arm/include/asm/glue-cache.h
@@ -132,6 +132,7 @@
 #ifndef MULTI_CACHE
 #define __cpuc_flush_icache_all		__glue(_CACHE,_flush_icache_all)
 #define __cpuc_flush_kern_all		__glue(_CACHE,_flush_kern_cache_all)
+#define __cpuc_flush_kern_louis		__glue(_CACHE,_flush_kern_cache_louis)
 #define __cpuc_flush_user_all		__glue(_CACHE,_flush_user_cache_all)
 #define __cpuc_flush_user_range		__glue(_CACHE,_flush_user_cache_range)
 #define __cpuc_coherent_kern_range	__glue(_CACHE,_coherent_kern_range)
diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
index 0720163..e505bef 100644
--- a/arch/arm/mm/cache-fa.S
+++ b/arch/arm/mm/cache-fa.S
@@ -240,6 +240,9 @@ ENTRY(fa_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(fa_dma_unmap_area)
 
+	.globl	fa_flush_kern_cache_louis
+	.equ	fa_flush_kern_cache_louis, fa_flush_kern_cache_all
+
 	__INITDATA
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v3.S b/arch/arm/mm/cache-v3.S
index 52e35f3..8a3fade 100644
--- a/arch/arm/mm/cache-v3.S
+++ b/arch/arm/mm/cache-v3.S
@@ -128,6 +128,9 @@ ENTRY(v3_dma_map_area)
 ENDPROC(v3_dma_unmap_area)
 ENDPROC(v3_dma_map_area)
 
+	.globl	v3_flush_kern_cache_louis
+	.equ	v3_flush_kern_cache_louis, v3_flush_kern_cache_all
+
 	__INITDATA
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
index 022135d..43e5d77 100644
--- a/arch/arm/mm/cache-v4.S
+++ b/arch/arm/mm/cache-v4.S
@@ -140,6 +140,9 @@ ENTRY(v4_dma_map_area)
 ENDPROC(v4_dma_unmap_area)
 ENDPROC(v4_dma_map_area)
 
+	.globl	v4_flush_kern_cache_louis
+	.equ	v4_flush_kern_cache_louis, v4_flush_kern_cache_all
+
 	__INITDATA
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
index 8f1eeae..cd49453 100644
--- a/arch/arm/mm/cache-v4wb.S
+++ b/arch/arm/mm/cache-v4wb.S
@@ -251,6 +251,9 @@ ENTRY(v4wb_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(v4wb_dma_unmap_area)
 
+	.globl	v4wb_flush_kern_cache_louis
+	.equ	v4wb_flush_kern_cache_louis, v4wb_flush_kern_cache_all
+
 	__INITDATA
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
index b34a5f9..11e5e58 100644
--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -196,6 +196,9 @@ ENTRY(v4wt_dma_map_area)
 ENDPROC(v4wt_dma_unmap_area)
 ENDPROC(v4wt_dma_map_area)
 
+	.globl	v4wt_flush_kern_cache_louis
+	.equ	v4wt_flush_kern_cache_louis, v4wt_flush_kern_cache_all
+
 	__INITDATA
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index f4e6027..7a3d3d8 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -343,6 +343,9 @@ ENTRY(v6_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(v6_dma_unmap_area)
 
+	.globl	v6_flush_kern_cache_louis
+	.equ	v6_flush_kern_cache_louis, v6_flush_kern_cache_all
+
 	__INITDATA
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 39e3fb3..d1fa2f6 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -33,6 +33,24 @@ ENTRY(v7_flush_icache_all)
 	mov	pc, lr
 ENDPROC(v7_flush_icache_all)
 
+ /*
+ *     v7_flush_dcache_louis()
+ *
+ *     Flush the D-cache up to the Level of Unification Inner Shareable
+ *
+ *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
+ */
+
+ENTRY(v7_flush_dcache_louis)
+	dmb					@ ensure ordering with previous memory accesses
+	mrc	p15, 1, r0, c0, c0, 1		@ read clidr, r0 = clidr
+	ands	r3, r0, #0xe00000		@ extract LoUIS from clidr
+	mov	r3, r3, lsr #20			@ r3 = LoUIS * 2
+	moveq	pc, lr				@ return if level == 0
+	mov	r10, #0				@ r10 (starting level) = 0
+	b	loop1				@ start flushing cache levels
+ENDPROC(v7_flush_dcache_louis)
+
 /*
  *	v7_flush_dcache_all()
  *
@@ -120,6 +138,24 @@ ENTRY(v7_flush_kern_cache_all)
 	mov	pc, lr
 ENDPROC(v7_flush_kern_cache_all)
 
+ /*
+ *     v7_flush_kern_cache_louis(void)
+ *
+ *     Flush the data cache up to Level of Unification Inner Shareable.
+ *     Invalidate the I-cache to the point of unification.
+ */
+ENTRY(v7_flush_kern_cache_louis)
+ ARM(	stmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
+ THUMB(	stmfd	sp!, {r4-r7, r9-r11, lr}	)
+	bl	v7_flush_dcache_louis
+	mov	r0, #0
+	ALT_SMP(mcr	p15, 0, r0, c7, c1, 0)	@ invalidate I-cache inner shareable
+	ALT_UP(mcr	p15, 0, r0, c7, c5, 0)	@ I+BTB cache invalidate
+ ARM(	ldmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
+ THUMB(	ldmfd	sp!, {r4-r7, r9-r11, lr}	)
+	mov	pc, lr
+ENDPROC(v7_flush_kern_cache_louis)
+
 /*
  *	v7_flush_cache_all()
  *
diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S
index 0650bb8..2bb61e7 100644
--- a/arch/arm/mm/proc-arm1020.S
+++ b/arch/arm/mm/proc-arm1020.S
@@ -368,6 +368,9 @@ ENTRY(arm1020_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm1020_dma_unmap_area)
 
+	.globl	arm1020_flush_kern_cache_louis
+	.equ	arm1020_flush_kern_cache_louis, arm1020_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm1020
 
diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S
index 4188478..8f96aa4 100644
--- a/arch/arm/mm/proc-arm1020e.S
+++ b/arch/arm/mm/proc-arm1020e.S
@@ -354,6 +354,9 @@ ENTRY(arm1020e_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm1020e_dma_unmap_area)
 
+	.globl	arm1020e_flush_kern_cache_louis
+	.equ	arm1020e_flush_kern_cache_louis, arm1020e_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm1020e
 
diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S
index 33c6882..8ebe4a4 100644
--- a/arch/arm/mm/proc-arm1022.S
+++ b/arch/arm/mm/proc-arm1022.S
@@ -343,6 +343,9 @@ ENTRY(arm1022_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm1022_dma_unmap_area)
 
+	.globl	arm1022_flush_kern_cache_louis
+	.equ	arm1022_flush_kern_cache_louis, arm1022_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm1022
 
diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S
index fbc1d5f..093fc7e 100644
--- a/arch/arm/mm/proc-arm1026.S
+++ b/arch/arm/mm/proc-arm1026.S
@@ -337,6 +337,9 @@ ENTRY(arm1026_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm1026_dma_unmap_area)
 
+	.globl	arm1026_flush_kern_cache_louis
+	.equ	arm1026_flush_kern_cache_louis, arm1026_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm1026
 
diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S
index 1a8c138..2c3b942 100644
--- a/arch/arm/mm/proc-arm920.S
+++ b/arch/arm/mm/proc-arm920.S
@@ -319,6 +319,9 @@ ENTRY(arm920_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm920_dma_unmap_area)
 
+	.globl	arm920_flush_kern_cache_louis
+	.equ	arm920_flush_kern_cache_louis, arm920_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm920
 #endif
diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S
index 4c44d7e..4464c49 100644
--- a/arch/arm/mm/proc-arm922.S
+++ b/arch/arm/mm/proc-arm922.S
@@ -321,6 +321,9 @@ ENTRY(arm922_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm922_dma_unmap_area)
 
+	.globl	arm922_flush_kern_cache_louis
+	.equ	arm922_flush_kern_cache_louis, arm922_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm922
 #endif
diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S
index ec5b118..281eb9b 100644
--- a/arch/arm/mm/proc-arm925.S
+++ b/arch/arm/mm/proc-arm925.S
@@ -376,6 +376,9 @@ ENTRY(arm925_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm925_dma_unmap_area)
 
+	.globl	arm925_flush_kern_cache_louis
+	.equ	arm925_flush_kern_cache_louis, arm925_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm925
 
diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S
index c31e62c..f1803f7 100644
--- a/arch/arm/mm/proc-arm926.S
+++ b/arch/arm/mm/proc-arm926.S
@@ -339,6 +339,9 @@ ENTRY(arm926_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm926_dma_unmap_area)
 
+	.globl	arm926_flush_kern_cache_louis
+	.equ	arm926_flush_kern_cache_louis, arm926_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm926
 
diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S
index a613a7d..8da189d 100644
--- a/arch/arm/mm/proc-arm940.S
+++ b/arch/arm/mm/proc-arm940.S
@@ -267,6 +267,9 @@ ENTRY(arm940_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm940_dma_unmap_area)
 
+	.globl	arm940_flush_kern_cache_louis
+	.equ	arm940_flush_kern_cache_louis, arm940_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm940
 
diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S
index 9f4f299..f666cf3 100644
--- a/arch/arm/mm/proc-arm946.S
+++ b/arch/arm/mm/proc-arm946.S
@@ -310,6 +310,9 @@ ENTRY(arm946_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(arm946_dma_unmap_area)
 
+	.globl	arm946_flush_kern_cache_louis
+	.equ	arm946_flush_kern_cache_louis, arm946_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions arm946
 
diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S
index 23a8e4c..85e5e3b 100644
--- a/arch/arm/mm/proc-feroceon.S
+++ b/arch/arm/mm/proc-feroceon.S
@@ -415,6 +415,9 @@ ENTRY(feroceon_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(feroceon_dma_unmap_area)
 
+	.globl	feroceon_flush_kern_cache_louis
+	.equ	feroceon_flush_kern_cache_louis, feroceon_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions feroceon
 
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 2d8ff3a..b29a226 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -299,6 +299,7 @@ ENTRY(\name\()_processor_functions)
 ENTRY(\name\()_cache_fns)
 	.long	\name\()_flush_icache_all
 	.long	\name\()_flush_kern_cache_all
+	.long   \name\()_flush_kern_cache_louis
 	.long	\name\()_flush_user_cache_all
 	.long	\name\()_flush_user_cache_range
 	.long	\name\()_coherent_kern_range
diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S
index fbb2124..82f9cdc 100644
--- a/arch/arm/mm/proc-mohawk.S
+++ b/arch/arm/mm/proc-mohawk.S
@@ -303,6 +303,9 @@ ENTRY(mohawk_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(mohawk_dma_unmap_area)
 
+	.globl	mohawk_flush_kern_cache_louis
+	.equ	mohawk_flush_kern_cache_louis, mohawk_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions mohawk
 
diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S
index b0d5786..eb93d64 100644
--- a/arch/arm/mm/proc-xsc3.S
+++ b/arch/arm/mm/proc-xsc3.S
@@ -337,6 +337,9 @@ ENTRY(xsc3_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(xsc3_dma_unmap_area)
 
+	.globl	xsc3_flush_kern_cache_louis
+	.equ	xsc3_flush_kern_cache_louis, xsc3_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions xsc3
 
diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S
index 4ffebaa..b5ea31d 100644
--- a/arch/arm/mm/proc-xscale.S
+++ b/arch/arm/mm/proc-xscale.S
@@ -410,6 +410,9 @@ ENTRY(xscale_dma_unmap_area)
 	mov	pc, lr
 ENDPROC(xscale_dma_unmap_area)
 
+	.globl	xscale_flush_kern_cache_louis
+	.equ	xscale_flush_kern_cache_louis, xscale_flush_kern_cache_all
+
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions xscale
 
-- 
1.7.12

* [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function
  2012-09-18 16:35 [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
  2012-09-18 16:35 ` [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops Lorenzo Pieralisi
@ 2012-09-18 16:35 ` Lorenzo Pieralisi
  2012-09-18 18:13   ` Nicolas Pitre
  2012-09-19 13:51   ` Dave Martin
  2012-09-18 16:35 ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Lorenzo Pieralisi
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-18 16:35 UTC
  To: linux-arm-kernel

This patch renames the jump labels in v7_flush_dcache_all in order to define
a dedicated entry point for flushing cache levels.

TODO: factor out the level flushing loop if considered worthwhile and
      define its input register requirements.
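
The control flow that the renamed flush_levels label enables can be sketched
in C: two entry points compute different limits and share one level-walking
loop, skipping levels without a data cache as the assembly does. This is a
hypothetical model, not kernel code; it returns a bitmask of the levels it
would flush instead of issuing set/way operations, and the CLIDR values used
below are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of the control flow after the rename: both
 * entry points compute a limit as "level * 2" and then share a single
 * level-walking loop. Instead of issuing set/way operations, the model
 * returns a bitmask of the levels it would flush (bit n = level n + 1).
 */
static uint32_t flush_levels(uint32_t clidr, unsigned limit_x2)
{
	uint32_t flushed = 0;

	assert(limit_x2 <= 14);                 /* LoC/LoUIS are 3-bit fields */
	for (unsigned lvl_x2 = 0; lvl_x2 < limit_x2; lvl_x2 += 2) {
		/* 3 * level, computed as in the assembly: r10 + (r10 >> 1) */
		unsigned ctype = (clidr >> (lvl_x2 + (lvl_x2 >> 1))) & 7;

		if (ctype < 2)                  /* no cache or I-cache only */
			continue;
		flushed |= 1u << (lvl_x2 >> 1);
	}
	return flushed;
}

/* v7_flush_dcache_all: walk every level up to LoC. */
static uint32_t model_flush_dcache_all(uint32_t clidr)
{
	return flush_levels(clidr, (clidr & 0x7000000) >> 23);
}

/* v7_flush_dcache_louis: same loop, stopping at LoUIS. */
static uint32_t model_flush_dcache_louis(uint32_t clidr)
{
	return flush_levels(clidr, (clidr & 0xe00000) >> 20);
}
```

For the hypothetical A15-like CLIDR 0x0A200023 the model returns 0x3 (L1 and
L2) for the LoC walk and 0x1 (L1 only) for the LoUIS walk.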

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/mm/cache-v7.S | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index d1fa2f6..140b294 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -48,7 +48,7 @@ ENTRY(v7_flush_dcache_louis)
 	mov	r3, r3, lsr #20			@ r3 = LoUIS * 2
 	moveq	pc, lr				@ return if level == 0
 	mov	r10, #0				@ r10 (starting level) = 0
-	b	loop1				@ start flushing cache levels
+	b	flush_levels			@ start flushing cache levels
 ENDPROC(v7_flush_dcache_louis)
 
 /*
@@ -67,7 +67,7 @@ ENTRY(v7_flush_dcache_all)
 	mov	r3, r3, lsr #23			@ left align loc bit field
 	beq	finished			@ if loc is 0, then no need to clean
 	mov	r10, #0				@ start clean at cache level 0
-loop1:
+flush_levels:
 	add	r2, r10, r10, lsr #1		@ work out 3x current cache level
 	mov	r1, r0, lsr r2			@ extract cache type bits from clidr
 	and	r1, r1, #7			@ mask of the bits for current cache only
@@ -89,9 +89,9 @@ loop1:
 	clz	r5, r4				@ find bit position of way size increment
 	ldr	r7, =0x7fff
 	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
-loop2:
+loop1:
 	mov	r9, r4				@ create working copy of max way size
-loop3:
+loop2:
  ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
  THUMB(	lsl	r6, r9, r5		)
  THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
@@ -100,13 +100,13 @@ loop3:
  THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
 	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
 	subs	r9, r9, #1			@ decrement the way
-	bge	loop3
-	subs	r7, r7, #1			@ decrement the index
 	bge	loop2
+	subs	r7, r7, #1			@ decrement the index
+	bge	loop1
 skip:
 	add	r10, r10, #2			@ increment cache number
 	cmp	r3, r10
-	bgt	loop1
+	bgt	flush_levels
 finished:
 	mov	r10, #0				@ swith back to cache level 0
 	mcr	p15, 2, r10, c0, c0, 0		@ select current cache level in cssr
-- 
1.7.12

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-09-18 16:35 [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
  2012-09-18 16:35 ` [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops Lorenzo Pieralisi
  2012-09-18 16:35 ` [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function Lorenzo Pieralisi
@ 2012-09-18 16:35 ` Lorenzo Pieralisi
  2012-09-18 18:18   ` Nicolas Pitre
  2012-09-19 13:46   ` Dave Martin
  2012-09-18 16:35 ` [RFC PATCH v2 4/5] ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API Lorenzo Pieralisi
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-18 16:35 UTC
  To: linux-arm-kernel

On processors like the A15/A7, the L2 cache is unified and integrated within
the processor cache hierarchy, so it is no longer considered an outer cache.
On these processors flush_cache_all() ends up cleaning all cache levels up to
the Level of Coherency (LoC), which includes the unified L2 cache.

When a single CPU is suspended (CPU idle), a complete L2 clean is not
required, so the generic cpu_suspend code must clean the data cache using the
newly introduced cache LoUIS function.

The saved context and the context pointer (save_ptr) are cleaned to main
memory using cache area functions, which operate on MVA and guarantee that
the data is written back to main memory (i.e. they clean up to the Point of
Coherency, PoC), so that the processor can fetch the context when the MMU is
off in the cpu_resume code path.

Outer cache management (outer_clean_range()) remains unchanged.
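
Why both calls are needed can be shown with a deliberately simplified model,
assuming a single context word, LoUIS at L1 and a unified L2 inside LoC.
struct toy_word and the toy_* helpers are hypothetical; only write-back
(cleaning) is modelled, and invalidation is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code: one word of the suspend context that can be
 * dirty in L1 (inside LoUIS) and in a unified L2 (beyond LoUIS, inside
 * LoC). Only cleaning (write-back) is modelled; invalidation is ignored.
 */
struct toy_word {
	int l1, l2, dram;
	bool l1_dirty, l2_dirty;
};

static void toy_cpu_write(struct toy_word *w, int v)
{
	w->l1 = v;
	w->l1_dirty = true;
}

/* Like flush_cache_louis() with LoUIS == 1: write back L1 into L2 only. */
static void toy_flush_louis(struct toy_word *w)
{
	if (w->l1_dirty) {
		w->l2 = w->l1;
		w->l2_dirty = true;
		w->l1_dirty = false;
	}
}

/* Like a clean by MVA to the Point of Coherency: every level written back. */
static void toy_clean_to_poc(struct toy_word *w)
{
	toy_flush_louis(w);
	if (w->l2_dirty) {
		w->dram = w->l2;
		w->l2_dirty = false;
	}
	assert(!w->l1_dirty && !w->l2_dirty);
}

/* cpu_resume runs with the MMU off, so in this model it reads only DRAM. */
static int toy_read_mmu_off(const struct toy_word *w)
{
	return w->dram;
}
```

After toy_flush_louis() alone the value is still missing from DRAM; only the
clean to PoC makes it visible to a reader running with the MMU off, which is
why the patch adds __cpuc_flush_dcache_area() calls for the context and for
save_ptr.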

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/kernel/suspend.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/suspend.c b/arch/arm/kernel/suspend.c
index 1794cc3..358bca3 100644
--- a/arch/arm/kernel/suspend.c
+++ b/arch/arm/kernel/suspend.c
@@ -17,6 +17,8 @@ extern void cpu_resume_mmu(void);
  */
 void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
 {
+	u32 *ctx = ptr;
+
 	*save_ptr = virt_to_phys(ptr);
 
 	/* This must correspond to the LDM in cpu_resume() assembly */
@@ -26,7 +28,20 @@ void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
 
 	cpu_do_suspend(ptr);
 
-	flush_cache_all();
+	flush_cache_louis();
+
+	/*
+	 * flush_cache_louis does not guarantee that
+	 * save_ptr and ptr are cleaned to main memory,
+	 * just up to the Level of Unification Inner Shareable.
+	 * Since the context pointer and context itself
+	 * are to be retrieved with the MMU off that
+	 * data must be cleaned from all cache levels
+	 * to main memory using "area" cache primitives.
+	*/
+	__cpuc_flush_dcache_area(ctx, ptrsz);
+	__cpuc_flush_dcache_area(save_ptr, sizeof(*save_ptr));
+
 	outer_clean_range(*save_ptr, *save_ptr + ptrsz);
 	outer_clean_range(virt_to_phys(save_ptr),
 			  virt_to_phys(save_ptr) + sizeof(*save_ptr));
-- 
1.7.12

* [RFC PATCH v2 4/5] ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API
  2012-09-18 16:35 [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
                   ` (2 preceding siblings ...)
  2012-09-18 16:35 ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Lorenzo Pieralisi
@ 2012-09-18 16:35 ` Lorenzo Pieralisi
  2012-09-18 18:19   ` Nicolas Pitre
  2012-09-18 16:35 ` [RFC PATCH v2 5/5] ARM: mm: update __v7_setup() to the new LoUIS cache " Lorenzo Pieralisi
  2012-09-20 11:27 ` [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
  5 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-18 16:35 UTC
  To: linux-arm-kernel

When a CPU is hotplugged out, the caches that reside in its power domain
lose their contents and so must be cleaned to the next memory level.

Currently, __cpu_disable calls flush_cache_all(), which on new-generation
processors like the A15/A7 ends up cleaning and invalidating all cache levels
up to the Level of Coherency, including the unified L2.

This wastes cycles, since the L2 cache contents are not lost when a single
CPU is powered down.

This patch updates __cpu_disable to use the new LoUIS API cache operations.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/kernel/smp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index d3eb222..f44e9cd 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -136,8 +136,11 @@ int __cpu_disable(void)
 	/*
 	 * Flush user cache and TLB mappings, and then remove this CPU
 	 * from the vm mask set of all processes.
+	 *
+	 * Caches are flushed to the Level of Unification Inner Shareable
+	 * to write-back dirty lines to unified caches shared by all CPUs.
 	 */
-	flush_cache_all();
+	flush_cache_louis();
 	local_flush_tlb_all();
 
 	clear_tasks_mm_cpumask(cpu);
-- 
1.7.12

* [RFC PATCH v2 5/5] ARM: mm: update __v7_setup() to the new LoUIS cache maintenance API
  2012-09-18 16:35 [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
                   ` (3 preceding siblings ...)
  2012-09-18 16:35 ` [RFC PATCH v2 4/5] ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API Lorenzo Pieralisi
@ 2012-09-18 16:35 ` Lorenzo Pieralisi
  2012-09-18 18:20   ` Nicolas Pitre
  2012-09-20 11:27 ` [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
  5 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-18 16:35 UTC
  To: linux-arm-kernel

From: Santosh Shilimkar <santosh.shilimkar@ti.com>

The ARMv7 processor setup function __v7_setup() cleans and invalidates the
CPU cache before enabling the MMU, so that the CPU starts with a clean local
cache.

But on ARMv7 processors like the Cortex-A15/A8, this code ends up flushing
the L2 cache (up to the Level of Coherency), which is undesirable and
expensive. The setup functions are used in the CPU hotplug scenario too,
hence flushing all cache levels should be avoided.

This patch replaces the cache flushing call with the newly introduced
v7 dcache LoUIS API, so that only cache levels up to LoUIS are cleaned and
invalidated when a processor executes __v7_setup, which is the expected
behavior.

For processors like the A9 and A5, where the L2 cache is an outer one, the
behavior should be unchanged.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/mm/proc-v7.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index c2e2b66..846d279 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -172,7 +172,7 @@ __v7_ca15mp_setup:
 __v7_setup:
 	adr	r12, __v7_setup_stack		@ the local stack
 	stmia	r12, {r0-r5, r7, r9, r11, lr}
-	bl	v7_flush_dcache_all
+	bl	v7_flush_dcache_louis
 	ldmia	r12, {r0-r5, r7, r9, r11, lr}
 
 	mrc	p15, 0, r0, c0, c0, 0		@ read main ID register
-- 
1.7.12
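A minimal C sketch of the loop bound that the assembly derives from CLIDR may help show why __v7_setup() now does less work. The helper names below are hypothetical, and the example CLIDR value (split L1, unified L2, LoUIS = 1, LoC = 2) is illustrative rather than read from a specific part. Per the ARMv7 CLIDR layout, LoUIS sits in bits [23:21] and LoC in bits [26:24]; v7_flush_dcache_louis() and v7_flush_dcache_all() both keep "level * 2" in r3 as the loop limit.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the CLIDR arithmetic in cache-v7.S. */

/* LoUIS: CLIDR bits [23:21] */
static unsigned int clidr_louis(uint32_t clidr)
{
	return (clidr >> 21) & 0x7;
}

/* LoC: CLIDR bits [26:24] */
static unsigned int clidr_loc(uint32_t clidr)
{
	return (clidr >> 24) & 0x7;
}

/* r3 in v7_flush_dcache_louis: ands #0xe00000, then lsr #20 -> LoUIS * 2 */
static unsigned int louis_bound(uint32_t clidr)
{
	return (clidr & 0xe00000u) >> 20;
}

/* r3 in v7_flush_dcache_all: ands #0x7000000, then lsr #23 -> LoC * 2 */
static unsigned int loc_bound(uint32_t clidr)
{
	return (clidr & 0x7000000u) >> 23;
}

/* The level loop runs while level * 2 < bound, i.e. bound / 2 levels. */
static unsigned int levels_walked(unsigned int bound)
{
	return bound / 2;
}
```

With the example value 0x0A200023, the LoUIS-bounded walk stops after L1 (r3 = 2), while the LoC-bounded walk also covers L2 (r3 = 4). A LoUIS of 0 gives a bound of 0, matching the early "moveq pc, lr" return.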


* [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops
  2012-09-18 16:35 ` [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops Lorenzo Pieralisi
@ 2012-09-18 18:12   ` Nicolas Pitre
  2012-09-19 12:30     ` Lorenzo Pieralisi
  0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2012-09-18 18:12 UTC
  To: linux-arm-kernel

On Tue, 18 Sep 2012, Lorenzo Pieralisi wrote:

> The ARM v7 architecture introduced the concept of cache levels and related
> control registers. New processors like A7 and A15 embed a unified L2 cache
> controller that becomes part of the cache level hierarchy. Some operations in
> the kernel, like cpu_suspend and __cpu_disable, do not require a flush of the
> entire cache hierarchy to DRAM, but only of the cache levels up to the
> Level of Unification Inner Shareable (LoUIS), which in most ARM v7 systems
> corresponds to L1.
> 
> The cache flushing API currently used in cpu_suspend and __cpu_disable,
> flush_cache_all(), ends up flushing the whole cache hierarchy, since on
> v7 it cleans and invalidates all cache levels up to the Level of Coherency
> (LoC). This cripples system performance when used in hot paths like hotplug
> and cpuidle.
> 
> Therefore, a new kernel cache maintenance API must be added to cope with
> the requirements of the latest ARM systems.
> 
> This patch adds flush_cache_louis() to the ARM kernel cache maintenance API.
> 
> This function cleans and invalidates all data cache levels up to the
> Level of Unification Inner Shareable (LoUIS) and invalidates the instruction
> cache for processors that support it (v7 onwards).
> 
> This patch also creates an alias of the cache LoUIS function to flush_kern_all
> for all processor versions prior to v7, so that the current cache flushing
> behaviour is unchanged for those processors.
> 
> v7 cache maintenance code implements a cache LoUIS function that cleans and
> invalidates the D-cache up to LoUIS and invalidates the I-cache, according
> to the new API.
> 
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Reviewed-by: Nicolas Pitre <nico@linaro.org>


> ---
>  arch/arm/include/asm/cacheflush.h | 15 +++++++++++++++
>  arch/arm/include/asm/glue-cache.h |  1 +
>  arch/arm/mm/cache-fa.S            |  3 +++
>  arch/arm/mm/cache-v3.S            |  3 +++
>  arch/arm/mm/cache-v4.S            |  3 +++
>  arch/arm/mm/cache-v4wb.S          |  3 +++
>  arch/arm/mm/cache-v4wt.S          |  3 +++
>  arch/arm/mm/cache-v6.S            |  3 +++
>  arch/arm/mm/cache-v7.S            | 36 ++++++++++++++++++++++++++++++++++++
>  arch/arm/mm/proc-arm1020.S        |  3 +++
>  arch/arm/mm/proc-arm1020e.S       |  3 +++
>  arch/arm/mm/proc-arm1022.S        |  3 +++
>  arch/arm/mm/proc-arm1026.S        |  3 +++
>  arch/arm/mm/proc-arm920.S         |  3 +++
>  arch/arm/mm/proc-arm922.S         |  3 +++
>  arch/arm/mm/proc-arm925.S         |  3 +++
>  arch/arm/mm/proc-arm926.S         |  3 +++
>  arch/arm/mm/proc-arm940.S         |  3 +++
>  arch/arm/mm/proc-arm946.S         |  3 +++
>  arch/arm/mm/proc-feroceon.S       |  3 +++
>  arch/arm/mm/proc-macros.S         |  1 +
>  arch/arm/mm/proc-mohawk.S         |  3 +++
>  arch/arm/mm/proc-xsc3.S           |  3 +++
>  arch/arm/mm/proc-xscale.S         |  3 +++
>  24 files changed, 113 insertions(+)
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index c6e2ed9..4e8217b 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -50,6 +50,13 @@
>   *
>   *		Unconditionally clean and invalidate the entire cache.
>   *
> + *	flush_kern_louis()
> + *
> + *		Flush data cache levels up to the level of unification
> + *		inner shareable and invalidate the I-cache.
> + *		Only needed from v7 onwards; falls back to flush_cache_all()
> + *		for all other processor versions.
> + *
>   *	flush_user_all()
>   *
>   *		Clean and invalidate all user space cache entries
> @@ -98,6 +105,7 @@
>  struct cpu_cache_fns {
>  	void (*flush_icache_all)(void);
>  	void (*flush_kern_all)(void);
> +	void (*flush_kern_louis)(void);
>  	void (*flush_user_all)(void);
>  	void (*flush_user_range)(unsigned long, unsigned long, unsigned int);
>  
> @@ -120,6 +128,7 @@ extern struct cpu_cache_fns cpu_cache;
>  
>  #define __cpuc_flush_icache_all		cpu_cache.flush_icache_all
>  #define __cpuc_flush_kern_all		cpu_cache.flush_kern_all
> +#define __cpuc_flush_kern_louis		cpu_cache.flush_kern_louis
>  #define __cpuc_flush_user_all		cpu_cache.flush_user_all
>  #define __cpuc_flush_user_range		cpu_cache.flush_user_range
>  #define __cpuc_coherent_kern_range	cpu_cache.coherent_kern_range
> @@ -140,6 +149,7 @@ extern struct cpu_cache_fns cpu_cache;
>  
>  extern void __cpuc_flush_icache_all(void);
>  extern void __cpuc_flush_kern_all(void);
> +extern void __cpuc_flush_kern_louis(void);
>  extern void __cpuc_flush_user_all(void);
>  extern void __cpuc_flush_user_range(unsigned long, unsigned long, unsigned int);
>  extern void __cpuc_coherent_kern_range(unsigned long, unsigned long);
> @@ -205,6 +215,11 @@ static inline void __flush_icache_all(void)
>  	__flush_icache_preferred();
>  }
>  
> +/*
> + * Flush caches up to Level of Unification Inner Shareable
> + */
> +#define flush_cache_louis()		__cpuc_flush_kern_louis()
> +
>  #define flush_cache_all()		__cpuc_flush_kern_all()
>  
>  static inline void vivt_flush_cache_mm(struct mm_struct *mm)
> diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h
> index 7e30874..2d6a7de 100644
> --- a/arch/arm/include/asm/glue-cache.h
> +++ b/arch/arm/include/asm/glue-cache.h
> @@ -132,6 +132,7 @@
>  #ifndef MULTI_CACHE
>  #define __cpuc_flush_icache_all		__glue(_CACHE,_flush_icache_all)
>  #define __cpuc_flush_kern_all		__glue(_CACHE,_flush_kern_cache_all)
> +#define __cpuc_flush_kern_louis		__glue(_CACHE,_flush_kern_cache_louis)
>  #define __cpuc_flush_user_all		__glue(_CACHE,_flush_user_cache_all)
>  #define __cpuc_flush_user_range		__glue(_CACHE,_flush_user_cache_range)
>  #define __cpuc_coherent_kern_range	__glue(_CACHE,_coherent_kern_range)
> diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
> index 0720163..e505bef 100644
> --- a/arch/arm/mm/cache-fa.S
> +++ b/arch/arm/mm/cache-fa.S
> @@ -240,6 +240,9 @@ ENTRY(fa_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(fa_dma_unmap_area)
>  
> +	.globl	fa_flush_kern_cache_louis
> +	.equ	fa_flush_kern_cache_louis, fa_flush_kern_cache_all
> +
>  	__INITDATA
>  
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
> diff --git a/arch/arm/mm/cache-v3.S b/arch/arm/mm/cache-v3.S
> index 52e35f3..8a3fade 100644
> --- a/arch/arm/mm/cache-v3.S
> +++ b/arch/arm/mm/cache-v3.S
> @@ -128,6 +128,9 @@ ENTRY(v3_dma_map_area)
>  ENDPROC(v3_dma_unmap_area)
>  ENDPROC(v3_dma_map_area)
>  
> +	.globl	v3_flush_kern_cache_louis
> +	.equ	v3_flush_kern_cache_louis, v3_flush_kern_cache_all
> +
>  	__INITDATA
>  
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
> diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
> index 022135d..43e5d77 100644
> --- a/arch/arm/mm/cache-v4.S
> +++ b/arch/arm/mm/cache-v4.S
> @@ -140,6 +140,9 @@ ENTRY(v4_dma_map_area)
>  ENDPROC(v4_dma_unmap_area)
>  ENDPROC(v4_dma_map_area)
>  
> +	.globl	v4_flush_kern_cache_louis
> +	.equ	v4_flush_kern_cache_louis, v4_flush_kern_cache_all
> +
>  	__INITDATA
>  
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
> diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
> index 8f1eeae..cd49453 100644
> --- a/arch/arm/mm/cache-v4wb.S
> +++ b/arch/arm/mm/cache-v4wb.S
> @@ -251,6 +251,9 @@ ENTRY(v4wb_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(v4wb_dma_unmap_area)
>  
> +	.globl	v4wb_flush_kern_cache_louis
> +	.equ	v4wb_flush_kern_cache_louis, v4wb_flush_kern_cache_all
> +
>  	__INITDATA
>  
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
> diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
> index b34a5f9..11e5e58 100644
> --- a/arch/arm/mm/cache-v4wt.S
> +++ b/arch/arm/mm/cache-v4wt.S
> @@ -196,6 +196,9 @@ ENTRY(v4wt_dma_map_area)
>  ENDPROC(v4wt_dma_unmap_area)
>  ENDPROC(v4wt_dma_map_area)
>  
> +	.globl	v4wt_flush_kern_cache_louis
> +	.equ	v4wt_flush_kern_cache_louis, v4wt_flush_kern_cache_all
> +
>  	__INITDATA
>  
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
> diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
> index f4e6027..7a3d3d8 100644
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -343,6 +343,9 @@ ENTRY(v6_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(v6_dma_unmap_area)
>  
> +	.globl	v6_flush_kern_cache_louis
> +	.equ	v6_flush_kern_cache_louis, v6_flush_kern_cache_all
> +
>  	__INITDATA
>  
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index 39e3fb3..d1fa2f6 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -33,6 +33,24 @@ ENTRY(v7_flush_icache_all)
>  	mov	pc, lr
>  ENDPROC(v7_flush_icache_all)
>  
> +/*
> + *     v7_flush_dcache_louis()
> + *
> + *     Flush the D-cache up to the Level of Unification Inner Shareable
> + *
> + *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
> + */
> +
> +ENTRY(v7_flush_dcache_louis)
> +	dmb					@ ensure ordering with previous memory accesses
> +	mrc	p15, 1, r0, c0, c0, 1		@ read clidr, r0 = clidr
> +	ands	r3, r0, #0xe00000		@ extract LoUIS from clidr
> +	mov	r3, r3, lsr #20			@ r3 = LoUIS * 2
> +	moveq	pc, lr				@ return if level == 0
> +	mov	r10, #0				@ r10 (starting level) = 0
> +	b	loop1				@ start flushing cache levels
> +ENDPROC(v7_flush_dcache_louis)
> +
>  /*
>   *	v7_flush_dcache_all()
>   *
> @@ -120,6 +138,24 @@ ENTRY(v7_flush_kern_cache_all)
>  	mov	pc, lr
>  ENDPROC(v7_flush_kern_cache_all)
>  
> +/*
> + *     v7_flush_kern_cache_louis(void)
> + *
> + *     Flush the data cache up to Level of Unification Inner Shareable.
> + *     Invalidate the I-cache to the point of unification.
> + */
> +ENTRY(v7_flush_kern_cache_louis)
> + ARM(	stmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
> + THUMB(	stmfd	sp!, {r4-r7, r9-r11, lr}	)
> +	bl	v7_flush_dcache_louis
> +	mov	r0, #0
> +	ALT_SMP(mcr	p15, 0, r0, c7, c1, 0)	@ invalidate I-cache inner shareable
> +	ALT_UP(mcr	p15, 0, r0, c7, c5, 0)	@ I+BTB cache invalidate
> + ARM(	ldmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
> + THUMB(	ldmfd	sp!, {r4-r7, r9-r11, lr}	)
> +	mov	pc, lr
> +ENDPROC(v7_flush_kern_cache_louis)
> +
>  /*
>   *	v7_flush_cache_all()
>   *
> diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S
> index 0650bb8..2bb61e7 100644
> --- a/arch/arm/mm/proc-arm1020.S
> +++ b/arch/arm/mm/proc-arm1020.S
> @@ -368,6 +368,9 @@ ENTRY(arm1020_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm1020_dma_unmap_area)
>  
> +	.globl	arm1020_flush_kern_cache_louis
> +	.equ	arm1020_flush_kern_cache_louis, arm1020_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm1020
>  
> diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S
> index 4188478..8f96aa4 100644
> --- a/arch/arm/mm/proc-arm1020e.S
> +++ b/arch/arm/mm/proc-arm1020e.S
> @@ -354,6 +354,9 @@ ENTRY(arm1020e_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm1020e_dma_unmap_area)
>  
> +	.globl	arm1020e_flush_kern_cache_louis
> +	.equ	arm1020e_flush_kern_cache_louis, arm1020e_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm1020e
>  
> diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S
> index 33c6882..8ebe4a4 100644
> --- a/arch/arm/mm/proc-arm1022.S
> +++ b/arch/arm/mm/proc-arm1022.S
> @@ -343,6 +343,9 @@ ENTRY(arm1022_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm1022_dma_unmap_area)
>  
> +	.globl	arm1022_flush_kern_cache_louis
> +	.equ	arm1022_flush_kern_cache_louis, arm1022_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm1022
>  
> diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S
> index fbc1d5f..093fc7e 100644
> --- a/arch/arm/mm/proc-arm1026.S
> +++ b/arch/arm/mm/proc-arm1026.S
> @@ -337,6 +337,9 @@ ENTRY(arm1026_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm1026_dma_unmap_area)
>  
> +	.globl	arm1026_flush_kern_cache_louis
> +	.equ	arm1026_flush_kern_cache_louis, arm1026_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm1026
>  
> diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S
> index 1a8c138..2c3b942 100644
> --- a/arch/arm/mm/proc-arm920.S
> +++ b/arch/arm/mm/proc-arm920.S
> @@ -319,6 +319,9 @@ ENTRY(arm920_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm920_dma_unmap_area)
>  
> +	.globl	arm920_flush_kern_cache_louis
> +	.equ	arm920_flush_kern_cache_louis, arm920_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm920
>  #endif
> diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S
> index 4c44d7e..4464c49 100644
> --- a/arch/arm/mm/proc-arm922.S
> +++ b/arch/arm/mm/proc-arm922.S
> @@ -321,6 +321,9 @@ ENTRY(arm922_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm922_dma_unmap_area)
>  
> +	.globl	arm922_flush_kern_cache_louis
> +	.equ	arm922_flush_kern_cache_louis, arm922_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm922
>  #endif
> diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S
> index ec5b118..281eb9b 100644
> --- a/arch/arm/mm/proc-arm925.S
> +++ b/arch/arm/mm/proc-arm925.S
> @@ -376,6 +376,9 @@ ENTRY(arm925_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm925_dma_unmap_area)
>  
> +	.globl	arm925_flush_kern_cache_louis
> +	.equ	arm925_flush_kern_cache_louis, arm925_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm925
>  
> diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S
> index c31e62c..f1803f7 100644
> --- a/arch/arm/mm/proc-arm926.S
> +++ b/arch/arm/mm/proc-arm926.S
> @@ -339,6 +339,9 @@ ENTRY(arm926_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm926_dma_unmap_area)
>  
> +	.globl	arm926_flush_kern_cache_louis
> +	.equ	arm926_flush_kern_cache_louis, arm926_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm926
>  
> diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S
> index a613a7d..8da189d 100644
> --- a/arch/arm/mm/proc-arm940.S
> +++ b/arch/arm/mm/proc-arm940.S
> @@ -267,6 +267,9 @@ ENTRY(arm940_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm940_dma_unmap_area)
>  
> +	.globl	arm940_flush_kern_cache_louis
> +	.equ	arm940_flush_kern_cache_louis, arm940_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm940
>  
> diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S
> index 9f4f299..f666cf3 100644
> --- a/arch/arm/mm/proc-arm946.S
> +++ b/arch/arm/mm/proc-arm946.S
> @@ -310,6 +310,9 @@ ENTRY(arm946_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(arm946_dma_unmap_area)
>  
> +	.globl	arm946_flush_kern_cache_louis
> +	.equ	arm946_flush_kern_cache_louis, arm946_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions arm946
>  
> diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S
> index 23a8e4c..85e5e3b 100644
> --- a/arch/arm/mm/proc-feroceon.S
> +++ b/arch/arm/mm/proc-feroceon.S
> @@ -415,6 +415,9 @@ ENTRY(feroceon_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(feroceon_dma_unmap_area)
>  
> +	.globl	feroceon_flush_kern_cache_louis
> +	.equ	feroceon_flush_kern_cache_louis, feroceon_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions feroceon
>  
> diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
> index 2d8ff3a..b29a226 100644
> --- a/arch/arm/mm/proc-macros.S
> +++ b/arch/arm/mm/proc-macros.S
> @@ -299,6 +299,7 @@ ENTRY(\name\()_processor_functions)
>  ENTRY(\name\()_cache_fns)
>  	.long	\name\()_flush_icache_all
>  	.long	\name\()_flush_kern_cache_all
> +	.long	\name\()_flush_kern_cache_louis
>  	.long	\name\()_flush_user_cache_all
>  	.long	\name\()_flush_user_cache_range
>  	.long	\name\()_coherent_kern_range
> diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S
> index fbb2124..82f9cdc 100644
> --- a/arch/arm/mm/proc-mohawk.S
> +++ b/arch/arm/mm/proc-mohawk.S
> @@ -303,6 +303,9 @@ ENTRY(mohawk_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(mohawk_dma_unmap_area)
>  
> +	.globl	mohawk_flush_kern_cache_louis
> +	.equ	mohawk_flush_kern_cache_louis, mohawk_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions mohawk
>  
> diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S
> index b0d5786..eb93d64 100644
> --- a/arch/arm/mm/proc-xsc3.S
> +++ b/arch/arm/mm/proc-xsc3.S
> @@ -337,6 +337,9 @@ ENTRY(xsc3_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(xsc3_dma_unmap_area)
>  
> +	.globl	xsc3_flush_kern_cache_louis
> +	.equ	xsc3_flush_kern_cache_louis, xsc3_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions xsc3
>  
> diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S
> index 4ffebaa..b5ea31d 100644
> --- a/arch/arm/mm/proc-xscale.S
> +++ b/arch/arm/mm/proc-xscale.S
> @@ -410,6 +410,9 @@ ENTRY(xscale_dma_unmap_area)
>  	mov	pc, lr
>  ENDPROC(xscale_dma_unmap_area)
>  
> +	.globl	xscale_flush_kern_cache_louis
> +	.equ	xscale_flush_kern_cache_louis, xscale_flush_kern_cache_all
> +
>  	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
>  	define_cache_functions xscale
>  
> -- 
> 1.7.12
> 
> 
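The dispatch this patch adds can be modelled in a few lines of C: pre-v7 cache tables point flush_kern_louis at the same routine as flush_kern_all (the .equ aliases), while v7 provides a dedicated routine. This is a simplified sketch, not the kernel's struct cpu_cache_fns; every name in it is hypothetical, and the enum only records which flush ran.

```c
/*
 * Hypothetical, trimmed-down model of struct cpu_cache_fns with only
 * the two kernel-wide flush hooks relevant to this patch.
 */
enum flush_kind { FLUSH_NONE, FLUSH_V6_ALL, FLUSH_V7_TO_LOC, FLUSH_V7_TO_LOUIS };

static enum flush_kind last_flush = FLUSH_NONE;

struct cache_fns_model {
	void (*flush_kern_all)(void);
	void (*flush_kern_louis)(void);
};

static void v6_flush_kern_cache_all_model(void)   { last_flush = FLUSH_V6_ALL; }
static void v7_flush_kern_cache_all_model(void)   { last_flush = FLUSH_V7_TO_LOC; }
static void v7_flush_kern_cache_louis_model(void) { last_flush = FLUSH_V7_TO_LOUIS; }

/* Pre-v7: the LoUIS hook aliases flush_kern_all, like the .equ lines. */
static const struct cache_fns_model v6_cache_fns_model = {
	.flush_kern_all   = v6_flush_kern_cache_all_model,
	.flush_kern_louis = v6_flush_kern_cache_all_model,
};

/* v7: a dedicated routine that stops at LoUIS. */
static const struct cache_fns_model v7_cache_fns_model = {
	.flush_kern_all   = v7_flush_kern_cache_all_model,
	.flush_kern_louis = v7_flush_kern_cache_louis_model,
};

/* MULTI_CACHE-style indirection: the table is selected at boot. */
static const struct cache_fns_model *cpu_cache_model = &v7_cache_fns_model;

/* Counterpart of flush_cache_louis() -> __cpuc_flush_kern_louis(). */
static void flush_cache_louis_model(void)
{
	cpu_cache_model->flush_kern_louis();
}
```

Callers use the same entry point on every CPU; only the table decides whether the call stops at LoUIS or degrades to a full flush, which is why pre-v7 behaviour stays unchanged.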


* [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function
  2012-09-18 16:35 ` [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function Lorenzo Pieralisi
@ 2012-09-18 18:13   ` Nicolas Pitre
  2012-09-19 13:51   ` Dave Martin
  1 sibling, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2012-09-18 18:13 UTC
  To: linux-arm-kernel

On Tue, 18 Sep 2012, Lorenzo Pieralisi wrote:

> This patch renames the jump labels in v7_flush_dcache_all in order to
> define a specific entry point for the cache level flushing loop.
> 
> TODO: factor out the level flushing loop if considered worthwhile and
>       define its input register requirements.
> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Acked-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/mm/cache-v7.S | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index d1fa2f6..140b294 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -48,7 +48,7 @@ ENTRY(v7_flush_dcache_louis)
>  	mov	r3, r3, lsr #20			@ r3 = LoUIS * 2
>  	moveq	pc, lr				@ return if level == 0
>  	mov	r10, #0				@ r10 (starting level) = 0
> -	b	loop1				@ start flushing cache levels
> +	b	flush_levels			@ start flushing cache levels
>  ENDPROC(v7_flush_dcache_louis)
>  
>  /*
> @@ -67,7 +67,7 @@ ENTRY(v7_flush_dcache_all)
>  	mov	r3, r3, lsr #23			@ left align loc bit field
>  	beq	finished			@ if loc is 0, then no need to clean
>  	mov	r10, #0				@ start clean at cache level 0
> -loop1:
> +flush_levels:
>  	add	r2, r10, r10, lsr #1		@ work out 3x current cache level
>  	mov	r1, r0, lsr r2			@ extract cache type bits from clidr
>  	and	r1, r1, #7			@ mask of the bits for current cache only
> @@ -89,9 +89,9 @@ loop1:
>  	clz	r5, r4				@ find bit position of way size increment
>  	ldr	r7, =0x7fff
>  	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
> -loop2:
> +loop1:
>  	mov	r9, r4				@ create working copy of max way size
> -loop3:
> +loop2:
>   ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
>   THUMB(	lsl	r6, r9, r5		)
>   THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
> @@ -100,13 +100,13 @@ loop3:
>   THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
>  	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
>  	subs	r9, r9, #1			@ decrement the way
> -	bge	loop3
> -	subs	r7, r7, #1			@ decrement the index
>  	bge	loop2
> +	subs	r7, r7, #1			@ decrement the index
> +	bge	loop1
>  skip:
>  	add	r10, r10, #2			@ increment cache number
>  	cmp	r3, r10
> -	bgt	loop1
> +	bgt	flush_levels
>  finished:
>  	mov	r10, #0				@ swith back to cache level 0
>  	mcr	p15, 2, r10, c0, c0, 0		@ select current cache level in cssr
> -- 
> 1.7.12
> 
> 
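After the rename, flush_levels walks the cache levels, loop1 walks the set (index) and loop2 the way. The C sketch below shows how each iteration assembles the operand (r11) for the clean & invalidate by set/way MCR. The function names are hypothetical, and the CCSIDR value in the usage note describes an imaginary 32 KB, 4-way cache with 64-byte lines.

```c
#include <stdint.h>

/* Count leading zeros; returns 32 for 0, like the ARM CLZ instruction. */
static unsigned int clz32(uint32_t x)
{
	unsigned int n = 0;

	if (x == 0)
		return 32;
	while (!(x & 0x80000000u)) {
		x <<= 1;
		n++;
	}
	return n;
}

/*
 * Build the set/way operand as the loop body does:
 * level2 is the cache level times two (r10), ccsidr is the CCSIDR value.
 */
static uint32_t setway_operand(uint32_t level2, uint32_t ccsidr,
			       uint32_t way, uint32_t set)
{
	uint32_t line_shift = (ccsidr & 7) + 4;		/* r2: log2(line bytes) */
	uint32_t max_way = (ccsidr >> 3) & 0x3ff;	/* r4: ways - 1 */
	uint32_t way_shift = clz32(max_way);		/* r5 */
	/* Direct-mapped cache: LSL by 32 gives 0 on ARM but is UB in C. */
	uint32_t way_bits = way_shift < 32 ? way << way_shift : 0;

	return level2 | way_bits | (set << line_shift);
}

/* Number of set/way operations loop1/loop2 issue for one level. */
static uint32_t setway_ops(uint32_t ccsidr)
{
	uint32_t ways = ((ccsidr >> 3) & 0x3ff) + 1;
	uint32_t sets = ((ccsidr >> 13) & 0x7fff) + 1;

	return ways * sets;
}
```

For CCSIDR 0x000FE01A the line shift is 6 and the way shift is 30, so way 3, set 127 at L1 (level2 = 0) yields 0xC0001FC0, and the whole level takes 4 x 128 = 512 operations. That per-line cost, repeated for a large L2, is what the LoUIS variant avoids.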


* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-09-18 16:35 ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Lorenzo Pieralisi
@ 2012-09-18 18:18   ` Nicolas Pitre
  2012-09-19 13:46   ` Dave Martin
  1 sibling, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2012-09-18 18:18 UTC
  To: linux-arm-kernel

On Tue, 18 Sep 2012, Lorenzo Pieralisi wrote:

> In processors like A15/A7, the L2 cache is unified and integrated within
> the processor cache hierarchy, so it is no longer considered an outer
> cache. On these processors, flush_cache_all() ends up cleaning all cache
> levels up to the Level of Coherency (LoC), which includes the unified L2
> cache.
> 
> When a single CPU is suspended (CPU idle), a complete L2 clean is not
> required, so the generic cpu_suspend code must clean the data cache using
> the newly introduced cache LoUIS function.
> 
> The context and the stack pointer (context pointer) are cleaned to main
> memory using cache area functions that operate on MVA and guarantee that
> the data is written back to main memory (i.e. they clean up to the Point
> of Coherency, PoC), so that the processor can fetch the context when the
> MMU is off in the cpu_resume code path.
> 
> outer_cache management remains unchanged.
> 
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Reviewed-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/kernel/suspend.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kernel/suspend.c b/arch/arm/kernel/suspend.c
> index 1794cc3..358bca3 100644
> --- a/arch/arm/kernel/suspend.c
> +++ b/arch/arm/kernel/suspend.c
> @@ -17,6 +17,8 @@ extern void cpu_resume_mmu(void);
>   */
>  void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
>  {
> +	u32 *ctx = ptr;
> +
>  	*save_ptr = virt_to_phys(ptr);
>  
>  	/* This must correspond to the LDM in cpu_resume() assembly */
> @@ -26,7 +28,20 @@ void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
>  
>  	cpu_do_suspend(ptr);
>  
> -	flush_cache_all();
> +	flush_cache_louis();
> +
> +	/*
> +	 * flush_cache_louis does not guarantee that
> +	 * save_ptr and ptr are cleaned to main memory,
> +	 * only up to the Level of Unification Inner Shareable.
> +	 * Since the context pointer and the context itself
> +	 * are retrieved with the MMU off, that data must be
> +	 * cleaned from all cache levels to main memory
> +	 * using "area" cache primitives.
> +	 */
> +	__cpuc_flush_dcache_area(ctx, ptrsz);
> +	__cpuc_flush_dcache_area(save_ptr, sizeof(*save_ptr));
> +
>  	outer_clean_range(*save_ptr, *save_ptr + ptrsz);
>  	outer_clean_range(virt_to_phys(save_ptr),
>  			  virt_to_phys(save_ptr) + sizeof(*save_ptr));
> -- 
> 1.7.12
> 
> 
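To make the reasoning in the new comment concrete, here is a toy C model (not kernel code; all names are hypothetical) that tracks where the latest copy of one word of the suspend context lives. It assumes a per-CPU L1 inside LoUIS, a shared L2 beyond LoUIS but within LoC, and no outer cache (outer_clean_range() is not modelled); with the MMU off, cpu_resume() is assumed to observe only DRAM.

```c
#include <stdbool.h>

/*
 * Toy model: L1 is per-CPU (inside LoUIS), L2 is shared (beyond LoUIS,
 * within LoC), DRAM is the Point of Coherency.
 */
enum ctx_state { CTX_DIRTY_IN_L1, CTX_DIRTY_IN_L2, CTX_IN_DRAM };

/* flush_cache_louis(): dirty L1 data is written back only as far as L2. */
static enum ctx_state model_flush_louis(enum ctx_state s)
{
	return s == CTX_DIRTY_IN_L1 ? CTX_DIRTY_IN_L2 : s;
}

/* __cpuc_flush_dcache_area(): clean and invalidate by MVA to the PoC. */
static enum ctx_state model_flush_area_to_poc(enum ctx_state s)
{
	(void)s;
	return CTX_IN_DRAM;
}

/* cpu_resume() runs with the MMU off, so only DRAM contents are visible. */
static bool model_visible_to_resume(enum ctx_state s)
{
	return s == CTX_IN_DRAM;
}
```

After the LoUIS flush alone, the model still leaves the context dirty in L2, which is why the patch adds the two __cpuc_flush_dcache_area() calls for ctx and save_ptr.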


* [RFC PATCH v2 4/5] ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API
  2012-09-18 16:35 ` [RFC PATCH v2 4/5] ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API Lorenzo Pieralisi
@ 2012-09-18 18:19   ` Nicolas Pitre
  0 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2012-09-18 18:19 UTC
  To: linux-arm-kernel

On Tue, 18 Sep 2012, Lorenzo Pieralisi wrote:

> When a CPU is hotplugged out, the caches that reside in its power domain
> lose their contents and must therefore be cleaned to the next memory level.
> 
> Currently, __cpu_disable calls flush_cache_all(), which on new-generation
> processors like A15/A7 ends up cleaning and invalidating all cache levels
> up to the Level of Coherency, including the unified L2.
> 
> This is a waste of cycles, since the L2 cache contents are not lost when
> a single CPU is powered down.
> 
> This patch updates __cpu_disable to use the new LoUIS API cache operations.
> 
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Acked-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/kernel/smp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index d3eb222..f44e9cd 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -136,8 +136,11 @@ int __cpu_disable(void)
>  	/*
>  	 * Flush user cache and TLB mappings, and then remove this CPU
>  	 * from the vm mask set of all processes.
> +	 *
> +	 * Caches are flushed to the Level of Unification Inner Shareable
> +	 * to write back dirty lines to unified caches shared by all CPUs.
>  	 */
> -	flush_cache_all();
> +	flush_cache_louis();
>  	local_flush_tlb_all();
>  
>  	clear_tasks_mm_cpumask(cpu);
> -- 
> 1.7.12
> 
> 


* [RFC PATCH v2 5/5] ARM: mm: update __v7_setup() to the new LoUIS cache maintenance API
  2012-09-18 16:35 ` [RFC PATCH v2 5/5] ARM: mm: update __v7_setup() to the new LoUIS cache " Lorenzo Pieralisi
@ 2012-09-18 18:20   ` Nicolas Pitre
  0 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2012-09-18 18:20 UTC
  To: linux-arm-kernel

On Tue, 18 Sep 2012, Lorenzo Pieralisi wrote:

> From: Santosh Shilimkar <santosh.shilimkar@ti.com>
> 
> The ARMv7 processor setup function __v7_setup() cleans and invalidates the
> CPU cache before enabling the MMU, so that the CPU starts with a clean local cache.
> 
> However, on ARMv7 processors like Cortex-A15 and Cortex-A8, this code ends
> up flushing the L2 cache as well (up to the Level of Coherency), which is
> undesirable and expensive. The setup functions are also used in the CPU
> hotplug path, so flushing all cache levels should be avoided.
> 
> This patch replaces the cache flushing call with the newly introduced
> v7 dcache LoUIS API, so that only the cache levels up to LoUIS are cleaned
> and invalidated when a processor executes __v7_setup, which is the expected
> behavior.
> 
> For processors like A9 and A5, where the L2 cache is an outer cache, the
> behavior should be unchanged.
> 
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Reviewed-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/mm/proc-v7.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index c2e2b66..846d279 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -172,7 +172,7 @@ __v7_ca15mp_setup:
>  __v7_setup:
>  	adr	r12, __v7_setup_stack		@ the local stack
>  	stmia	r12, {r0-r5, r7, r9, r11, lr}
> -	bl	v7_flush_dcache_all
> +	bl	v7_flush_dcache_louis
>  	ldmia	r12, {r0-r5, r7, r9, r11, lr}
>  
>  	mrc	p15, 0, r0, c0, c0, 0		@ read main ID register
> -- 
> 1.7.12
> 
> 


* [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops
  2012-09-18 18:12   ` Nicolas Pitre
@ 2012-09-19 12:30     ` Lorenzo Pieralisi
  0 siblings, 0 replies; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-19 12:30 UTC
  To: linux-arm-kernel

On Tue, Sep 18, 2012 at 07:12:52PM +0100, Nicolas Pitre wrote:
> On Tue, 18 Sep 2012, Lorenzo Pieralisi wrote:
> 
> > The ARM v7 architecture introduced the concept of cache levels and related
> > control registers. New processors like A7 and A15 embed a unified L2 cache
> > controller that becomes part of the cache level hierarchy. Some operations in
> > the kernel, like cpu_suspend and __cpu_disable, do not require a flush of the
> > entire cache hierarchy to DRAM, but only of the cache levels up to the
> > Level of Unification Inner Shareable (LoUIS), which in most ARM v7 systems
> > corresponds to L1.
> >
> > The cache flushing API currently used in cpu_suspend and __cpu_disable,
> > flush_cache_all(), ends up flushing the whole cache hierarchy, since on
> > v7 it cleans and invalidates all cache levels up to the Level of Coherency
> > (LoC). This cripples system performance when used in hot paths like hotplug
> > and cpuidle.
> >
> > Therefore, a new kernel cache maintenance API must be added to cope with
> > the requirements of the latest ARM systems.
> >
> > This patch adds flush_cache_louis() to the ARM kernel cache maintenance API.
> >
> > This function cleans and invalidates all data cache levels up to the
> > Level of Unification Inner Shareable (LoUIS) and invalidates the instruction
> > cache for processors that support it (v7 onwards).
> >
> > This patch also creates an alias of the cache LoUIS function to flush_kern_all
> > for all processor versions prior to v7, so that the current cache flushing
> > behaviour is unchanged for those processors.
> >
> > v7 cache maintenance code implements a cache LoUIS function that cleans and
> > invalidates the D-cache up to LoUIS and invalidates the I-cache, according
> > to the new API.
> >
> > Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> 
> Reviewed-by: Nicolas Pitre <nico@linaro.org>

Thanks a lot for reviewing the series.

Lorenzo


* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-09-18 16:35 ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Lorenzo Pieralisi
  2012-09-18 18:18   ` Nicolas Pitre
@ 2012-09-19 13:46   ` Dave Martin
  2012-09-20 10:25     ` Lorenzo Pieralisi
  1 sibling, 1 reply; 39+ messages in thread
From: Dave Martin @ 2012-09-19 13:46 UTC
  To: linux-arm-kernel

On Tue, Sep 18, 2012 at 05:35:33PM +0100, Lorenzo Pieralisi wrote:
> In processors like A15/A7, the L2 cache is unified and integrated within
> the processor cache hierarchy, so it is no longer considered an outer
> cache. On these processors, flush_cache_all() ends up cleaning all cache
> levels up to the Level of Coherency (LoC), which includes the unified L2
> cache.
> 
> When a single CPU is suspended (CPU idle), a complete L2 clean is not
> required, so the generic cpu_suspend code must clean the data cache using
> the newly introduced cache LoUIS function.

For patches 3-5 in this series, we know that the assumption that
flushing LoUIS is sufficient for safely powering the CPU down is not
valid in the general case, though we've agreed it's a sensible
compromise for the CPU variants we know about today.

I think we do need to document this assumption, though.

At this point I don't mind whether it appears in code comments or in the
commit messages.

Cheers
---Dave

> 
> The context and stack pointer (context pointer) are cleaned to main memory
> using cache area functions that operate on MVA and guarantee that the data
> is written back to main memory (perform cache cleaning up to the Point of
> Coherency - PoC) so that the processor can fetch the context when the MMU
> is off in the cpu_resume code path.
> 
> outer_cache management remains unchanged.
> 
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  arch/arm/kernel/suspend.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kernel/suspend.c b/arch/arm/kernel/suspend.c
> index 1794cc3..358bca3 100644
> --- a/arch/arm/kernel/suspend.c
> +++ b/arch/arm/kernel/suspend.c
> @@ -17,6 +17,8 @@ extern void cpu_resume_mmu(void);
>   */
>  void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
>  {
> +	u32 *ctx = ptr;
> +
>  	*save_ptr = virt_to_phys(ptr);
>  
>  	/* This must correspond to the LDM in cpu_resume() assembly */
> @@ -26,7 +28,20 @@ void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
>  
>  	cpu_do_suspend(ptr);
>  
> -	flush_cache_all();
> +	flush_cache_louis();
> +
> +	/*
> +	 * flush_cache_louis does not guarantee that
> +	 * save_ptr and ptr are cleaned to main memory,
> +	 * just up to the Level of Unification Inner Shareable.
> +	 * Since the context pointer and context itself
> +	 * are to be retrieved with the MMU off that
> +	 * data must be cleaned from all cache levels
> +	 * to main memory using "area" cache primitives.
> +	*/
> +	__cpuc_flush_dcache_area(ctx, ptrsz);
> +	__cpuc_flush_dcache_area(save_ptr, sizeof(*save_ptr));
> +
>  	outer_clean_range(*save_ptr, *save_ptr + ptrsz);
>  	outer_clean_range(virt_to_phys(save_ptr),
>  			  virt_to_phys(save_ptr) + sizeof(*save_ptr));
> -- 
> 1.7.12
> 
> 
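A toy three-tier model shows why patch 3 keeps the two __cpuc_flush_dcache_area() calls after flush_cache_louis(). The model is a simplification under stated assumptions: one word per line, no invalidation or eviction, and no outer cache (the outer_clean_range() calls are not modelled). The "outer" tier stands in for any levels between LoUIS and LoC, and none of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: one data word per "line", three tiers. "inner" stands for the
 * levels up to LoUIS (e.g. L1), "outer" for levels between LoUIS and LoC
 * (e.g. an integrated L2), "mem" for main memory (PoC).
 */
enum { TOY_WORDS = 8 };

struct toy_hier {
	uint32_t mem[TOY_WORDS];
	uint32_t outer[TOY_WORDS];
	bool outer_dirty[TOY_WORDS];
	uint32_t inner[TOY_WORDS];
	bool inner_dirty[TOY_WORDS];
};

/* A cacheable store lands in the inner level and marks it dirty. */
static void toy_store(struct toy_hier *h, int i, uint32_t v)
{
	h->inner[i] = v;
	h->inner_dirty[i] = true;
}

/* Write one dirty inner line back to the outer level. */
static void toy_push_inner(struct toy_hier *h, int i)
{
	if (h->inner_dirty[i]) {
		h->outer[i] = h->inner[i];
		h->outer_dirty[i] = true;
		h->inner_dirty[i] = false;
	}
}

/* Like flush_cache_louis(): dirty data only moves as far as LoUIS. */
static void toy_flush_louis(struct toy_hier *h)
{
	for (int i = 0; i < TOY_WORDS; i++)
		toy_push_inner(h, i);
}

/* Like an area clean to PoC: words [start, start + n) reach main memory. */
static void toy_flush_area(struct toy_hier *h, int start, int n)
{
	for (int i = start; i < start + n; i++) {
		toy_push_inner(h, i);
		if (h->outer_dirty[i]) {
			h->mem[i] = h->outer[i];
			h->outer_dirty[i] = false;
		}
	}
}

/* With the MMU off, data accesses are non-cacheable: only memory is seen. */
static uint32_t toy_mmu_off_load(const struct toy_hier *h, int i)
{
	return h->mem[i];
}
```

In the model, a context word saved before suspend is still invisible to an MMU-off reader after toy_flush_louis(). It becomes visible only after the area clean, which is the reason for the comment in the diff.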

* [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function
  2012-09-18 16:35 ` [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function Lorenzo Pieralisi
  2012-09-18 18:13   ` Nicolas Pitre
@ 2012-09-19 13:51   ` Dave Martin
  2012-09-20 10:32     ` Lorenzo Pieralisi
  1 sibling, 1 reply; 39+ messages in thread
From: Dave Martin @ 2012-09-19 13:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 18, 2012 at 05:35:32PM +0100, Lorenzo Pieralisi wrote:
> This patch renames jump labels in v7_flush_dcache_all in order to define
> a specific flush cache levels entry point.
> 
> TODO: factor out the level flushing loop if considered worthwhile and
>       define the input registers requirements.

In the context of this series, this patch seems to do nothing at all (?)

Maybe it would make sense to defer this patch until you post something
that uses it.

Given that I have a fair expectation that you will build something useful
on top of this in the near future, I don't have a strong feeling about
it, though.

Cheers
---Dave

> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  arch/arm/mm/cache-v7.S | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index d1fa2f6..140b294 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -48,7 +48,7 @@ ENTRY(v7_flush_dcache_louis)
>  	mov	r3, r3, lsr #20			@ r3 = LoUIS * 2
>  	moveq	pc, lr				@ return if level == 0
>  	mov	r10, #0				@ r10 (starting level) = 0
> -	b	loop1				@ start flushing cache levels
> +	b	flush_levels			@ start flushing cache levels
>  ENDPROC(v7_flush_dcache_louis)
>  
>  /*
> @@ -67,7 +67,7 @@ ENTRY(v7_flush_dcache_all)
>  	mov	r3, r3, lsr #23			@ left align loc bit field
>  	beq	finished			@ if loc is 0, then no need to clean
>  	mov	r10, #0				@ start clean at cache level 0
> -loop1:
> +flush_levels:
>  	add	r2, r10, r10, lsr #1		@ work out 3x current cache level
>  	mov	r1, r0, lsr r2			@ extract cache type bits from clidr
>  	and	r1, r1, #7			@ mask of the bits for current cache only
> @@ -89,9 +89,9 @@ loop1:
>  	clz	r5, r4				@ find bit position of way size increment
>  	ldr	r7, =0x7fff
>  	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
> -loop2:
> +loop1:
>  	mov	r9, r4				@ create working copy of max way size
> -loop3:
> +loop2:
>   ARM(	orr	r11, r10, r9, lsl r5	)	@ factor way and cache number into r11
>   THUMB(	lsl	r6, r9, r5		)
>   THUMB(	orr	r11, r10, r6		)	@ factor way and cache number into r11
> @@ -100,13 +100,13 @@ loop3:
>   THUMB(	orr	r11, r11, r6		)	@ factor index number into r11
>  	mcr	p15, 0, r11, c7, c14, 2		@ clean & invalidate by set/way
>  	subs	r9, r9, #1			@ decrement the way
> -	bge	loop3
> -	subs	r7, r7, #1			@ decrement the index
>  	bge	loop2
> +	subs	r7, r7, #1			@ decrement the index
> +	bge	loop1
>  skip:
>  	add	r10, r10, #2			@ increment cache number
>  	cmp	r3, r10
> -	bgt	loop1
> +	bgt	flush_levels
>  finished:
>  	mov	r10, #0				@ swith back to cache level 0
>  	mcr	p15, 2, r10, c0, c0, 0		@ select current cache level in cssr
> -- 
> 1.7.12
> 
> 
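After the rename, the labels map onto three nested loops. flush_levels walks cache levels, loop1 walks set indices and loop2 walks ways, issuing one clean-and-invalidate by set/way per (set, way) pair. A C sketch of the operand computation for a single level follows. It mirrors the register arithmetic in the diff: the line-length, associativity and set-count fields come from CCSIDR, and the way position comes from CLZ. The function names are hypothetical, and so are the example CCSIDR values: a 32 KiB 4-way L1 and a 256 KiB 8-way L2, both with 64-byte lines.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros, returning 32 for 0 like the ARM CLZ instruction. */
static unsigned clz32(uint32_t x)
{
	unsigned n = 0;

	if (x == 0)
		return 32;
	while (!(x & 0x80000000u)) {
		x <<= 1;
		n++;
	}
	return n;
}

/*
 * Walk one cache level like loop1 (set index, r7) and loop2 (way, r9),
 * both counting down. 'level' is 0-based; the operand carries it as
 * level << 1, matching r10 = level * 2 in the assembly. Returns the number
 * of set/way operands generated and stores the first one in *first.
 */
static unsigned walk_level(uint32_t ccsidr, unsigned level, uint32_t *first)
{
	unsigned line_shift = (ccsidr & 7) + 4;     /* log2(line size in bytes) */
	uint32_t max_way = (ccsidr >> 3) & 0x3ff;   /* ways - 1 */
	uint32_t max_set = (ccsidr >> 13) & 0x7fff; /* sets - 1 */
	unsigned way_shift = clz32(max_way);        /* 32 for a 1-way cache */
	unsigned count = 0;

	for (int32_t set = (int32_t)max_set; set >= 0; set--) {
		for (int32_t way = (int32_t)max_way; way >= 0; way--) {
			/* Shifting by 32 is undefined in C; way is 0 then anyway. */
			uint32_t way_bits =
				way_shift < 32 ? (uint32_t)way << way_shift : 0;
			uint32_t op = (level << 1) | way_bits |
				      ((uint32_t)set << line_shift);

			if (count == 0)
				*first = op;
			count++;
		}
	}
	return count;
}
```

For the assumed 32 KiB L1 this gives 128 sets times 4 ways, that is 512 operations, and the first operand is 0xC0001FC0.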

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-09-19 13:46   ` Dave Martin
@ 2012-09-20 10:25     ` Lorenzo Pieralisi
  2012-09-20 11:04       ` Dave Martin
  0 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-20 10:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 19, 2012 at 02:46:58PM +0100, Dave Martin wrote:
> On Tue, Sep 18, 2012 at 05:35:33PM +0100, Lorenzo Pieralisi wrote:
> > In processors like A15/A7 L2 cache is unified and integrated within the
> > processor cache hierarchy, so that it is not considered an outer cache
> > anymore. For processors like A15/A7 flush_cache_all() ends up cleaning
> > all cache levels up to Level of Coherency (LoC) that includes
> > the L2 unified cache.
> > 
> > When a single CPU is suspended (CPU idle) a complete L2 clean is not
> > required, so generic cpu_suspend code must clean the data cache using the
> > newly introduced cache LoUIS function.
> 
> For patches 3-5 in this series, we know that the assumption that
> flushing LoUIS is sufficient for safely powering the CPU down is not
> valid in the general case, though we've agreed it's a sensible
> compromise for the CPU variants we know about today.

I agree, but we should also keep in mind that there are suspend and
hotplug finishers where platform specific code can (and should sometimes)
carry out the required operations, if flushing to LoUIS is not sufficient.

Patches 3-5 are there to avoid carrying out heavy cache operations that
are not needed, not to define LoUIS as a sufficient cache level for
powering down a CPU.

Your concern is shared, though.

> 
> I think we do need to document this assumption, though.
> 
> At this point I don't mind whether it appears in code comments or in the
> commit messages.

It is a fair point. I will improve comments in the code and commit logs
for next version.

Lorenzo
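The point about finishers can be sketched as follows. Generic code does the cheap LoUIS flush, and the platform finisher adds a full flush only for states where it knows LoUIS is not enough. The sketch is loosely modelled on the finisher callback passed to cpu_suspend(). Every name here is hypothetical, including the power-state flag, and a real finisher also performs platform-specific power-down steps that this ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel helpers; they only count calls. */
static int louis_flushes, full_flushes;
static void stub_flush_cache_louis(void) { louis_flushes++; }
static void stub_flush_cache_all(void)   { full_flushes++; }

/* Hypothetical platform description of a low-power state. */
struct toy_power_state {
	bool loses_levels_beyond_louis; /* e.g. whole cluster and L2 powered down */
};

/* Generic path (as after patch 3): always flush to LoUIS, then finish. */
static int toy_cpu_suspend(const struct toy_power_state *st,
			   int (*finisher)(const struct toy_power_state *))
{
	stub_flush_cache_louis();
	return finisher(st);
}

/* Platform finisher: adds the heavier flush only when the state needs it. */
static int toy_platform_finisher(const struct toy_power_state *st)
{
	if (st->loses_levels_beyond_louis)
		stub_flush_cache_all();
	/* platform-specific power-down would follow here */
	return 0;
}
```

This keeps the common single-CPU case cheap while leaving the decision about deeper states with the code that knows the hardware.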

* [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function
  2012-09-19 13:51   ` Dave Martin
@ 2012-09-20 10:32     ` Lorenzo Pieralisi
  2012-09-20 11:01       ` Dave Martin
  0 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-20 10:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 19, 2012 at 02:51:56PM +0100, Dave Martin wrote:
> On Tue, Sep 18, 2012 at 05:35:32PM +0100, Lorenzo Pieralisi wrote:
> > This patch renames jump labels in v7_flush_dcache_all in order to define
> > a specific flush cache levels entry point.
> > 
> > TODO: factor out the level flushing loop if considered worthwhile and
> >       define the input registers requirements.
> 
> In the context of this series, this patch seems to do nothing at all (?)

Agreed, it is just replacing some labels. I thought that something like:

b flush_levels

is clearer than:

b loop1

If I manage to factor out the cache level flushing loop, I think things
are even better; I just did not want to change v7_flush_dcache_all, and I would
rather avoid doing that unless, as I mentioned, it is considered worthwhile.

> Maybe it would make sense to defer this patch until you post something
> that uses it.

v7_flush_dcache_louis uses it, I have no problem in deferring it though.

Lorenzo

* [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function
  2012-09-20 10:32     ` Lorenzo Pieralisi
@ 2012-09-20 11:01       ` Dave Martin
  0 siblings, 0 replies; 39+ messages in thread
From: Dave Martin @ 2012-09-20 11:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 20, 2012 at 11:32:12AM +0100, Lorenzo Pieralisi wrote:
> On Wed, Sep 19, 2012 at 02:51:56PM +0100, Dave Martin wrote:
> > On Tue, Sep 18, 2012 at 05:35:32PM +0100, Lorenzo Pieralisi wrote:
> > > This patch renames jump labels in v7_flush_dcache_all in order to define
> > > a specific flush cache levels entry point.
> > > 
> > > TODO: factor out the level flushing loop if considered worthwhile and
> > >       define the input registers requirements.
> > 
> > In the context of this series, this patch seems to do nothing at all (?)
> 
> Agreed, it is just replacing some labels. I thought that something like:
> 
> b flush_levels
> 
> is clearer than:
> 
> b loop1
> 
> If I manage to factor out the cache level flushing loop I think things
> are even better, I just did not want to change v7_flush_dcache_all, I would
> avoid doing that, unless, as I mentioned, it is considered worthwhile.
> 
> > Maybe it would make sense to defer this patch until you post something
> > that uses it.
> 
> v7_flush_dcache_louis uses it, I have no problem in deferring it though.

I don't think it's necessary to defer it -- I just wanted to understand
whether there was some context here I wasn't aware of.

Cheers
---Dave

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-09-20 10:25     ` Lorenzo Pieralisi
@ 2012-09-20 11:04       ` Dave Martin
  2012-12-11 16:07         ` Guennadi Liakhovetski
  0 siblings, 1 reply; 39+ messages in thread
From: Dave Martin @ 2012-09-20 11:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 20, 2012 at 11:25:14AM +0100, Lorenzo Pieralisi wrote:
> On Wed, Sep 19, 2012 at 02:46:58PM +0100, Dave Martin wrote:
> > On Tue, Sep 18, 2012 at 05:35:33PM +0100, Lorenzo Pieralisi wrote:
> > > In processors like A15/A7 L2 cache is unified and integrated within the
> > > processor cache hierarchy, so that it is not considered an outer cache
> > > anymore. For processors like A15/A7 flush_cache_all() ends up cleaning
> > > all cache levels up to Level of Coherency (LoC) that includes
> > > the L2 unified cache.
> > > 
> > > When a single CPU is suspended (CPU idle) a complete L2 clean is not
> > > required, so generic cpu_suspend code must clean the data cache using the
> > > newly introduced cache LoUIS function.
> > 
> > For patches 3-5 in this series, we know that the assumption that
> > flushing LoUIS is sufficient for safely powering the CPU down is not
> > valid in the general case, though we've agreed it's a sensible
> > compromise for the CPU variants we know about today.
> 
> I agree, but we should also keep in mind that there are suspend and
> hotplug finishers where platform specific code can (and should sometimes)
> carry out the required operations, if flushing to LoUIS is not sufficient.
> 
> Patch 3-5 are there to avoid carrying out heavy cache operations that
> are not needed, not to define LoUIS as a sufficient cache level for
> powering down a CPU.
> 
> Your concern is shared, though.
> 
> > 
> > I think we do need to document this assumption, though.
> > 
> > At this point I don't mind whether it appears in code comments or in the
> > commit messages.
> 
> It is a fair point. I will improve comments in the code and commit logs
> for next version.

That should be fine.

Since the commit messages use quite precise terminology, I was worried
that they could be misinterpreted as stating the correct architectural
solution unless we point out that platform code maintainers still need
to pay attention to ensure that the correct levels are flushed for their
hardware.

Cheers
---Dave

* [RFC PATCH v2 0/5] ARM: augment cache flushing API
  2012-09-18 16:35 [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
                   ` (4 preceding siblings ...)
  2012-09-18 16:35 ` [RFC PATCH v2 5/5] ARM: mm: update __v7_setup() to the new LoUIS cache " Lorenzo Pieralisi
@ 2012-09-20 11:27 ` Lorenzo Pieralisi
  2012-09-21  8:07   ` Shawn Guo
  5 siblings, 1 reply; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-09-20 11:27 UTC (permalink / raw)
  To: linux-arm-kernel

[adding CCs: Kukjin Kim, Shawn Guo, Magnus Damm, Rob Herring]

Pushed out a branch containing the set @

git://linux-arm.org/linux-2.6-lp.git cache-louis

for testing purposes.

Thanks a lot,
Lorenzo

On Tue, Sep 18, 2012 at 05:35:30PM +0100, Lorenzo Pieralisi wrote:
> This patch series provides an update of a previous posting:
> 
> http://www.spinics.net/lists/arm-kernel/msg194946.html
> 
> v2 updates:
>  - Dropped v7 dcache level patch
>  - Refactor the set to make it work on all processors with MULTI_CACHE
>    and !MULTI_CACHE
>  - Factor out label redefinition in v7_flush_dcache_all
>  - Updated some comments
> 
> The v7 ARM architecture introduced the concept of cache levels and related
> control registers to manage them. Cache operations that operate by set/way
> require the cache level at which maintenance is carried out to be selected
> through coprocessor registers.
> 
> Processors like A7/A15 integrate a unified L2 that is part of the cache
> level hierarchy; this implies that cache operations operating on all levels
> also end up cleaning the unified L2 cache, which is a very time-consuming
> operation and is not needed for some power-down operations such as single-CPU
> shutdown.
> 
> For v7, flush_kern_all() cleans all the cache levels up to the Level of
> Coherency, which includes L2. This is suboptimal for code paths that end
> up shutting down a single processor, like CPU hotplug and CPU idle, where only
> per-CPU cache state (i.e. the integrated L1 cache) has to be cleaned and invalidated.
> 
> To fix this performance issue this patchset introduces cache LoUIS (Level of
> Unification Inner Shareable) maintenance operations in the kernel.
> 
> A new cache operations pointer is added to cpu_cache_fns
> 
> void (*flush_kern_cache_louis)(void);
> 
> that cleans and invalidates all data cache levels up to the LoUIS and
> invalidates the instruction cache. This new API should be sufficiently
> optimized to be used in generic C code in the kernel for power management
> operations on most v7 systems.
> 
> For architecture versions prior to v7, flush_kern_cache_louis() falls back
> to flush_kern_all(), leaving the current behaviour unchanged.
> 
> For A9/A5 processors, the Level of Unification Inner Shareable and the Level
> of Coherency are equivalent, hence this patch should not affect current
> kernel behaviour in any way when run on A9/A5 based systems, but it should
> nonetheless be thoroughly tested on them.
> 
> Tested on:
>   - OMAP4 (S2R, cpuidle and hotplug)
>   - OMAP5 (out of tree code) (S2R, cpuidle and hotplug)
>   - TC2 big.LITTLE testchip (out of tree code) (cpuidle, on both A7 and A15
>     clusters)
> 
> Lorenzo Pieralisi (4):
>   ARM: mm: implement LoUIS API for cache maintenance ops
>   ARM: mm: rename jump labels in v7_flush_dcache_all function
>   ARM: kernel: update cpu_suspend code to use cache LoUIS operations
>   ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API
> 
> Santosh Shilimkar (1):
>   ARM: mm: update __v7_setup() to the new LoUIS cache maintenance API
> 
>  arch/arm/include/asm/cacheflush.h | 15 ++++++++++++
>  arch/arm/include/asm/glue-cache.h |  1 +
>  arch/arm/kernel/smp.c             |  5 +++-
>  arch/arm/kernel/suspend.c         | 17 +++++++++++++-
>  arch/arm/mm/cache-fa.S            |  3 +++
>  arch/arm/mm/cache-v3.S            |  3 +++
>  arch/arm/mm/cache-v4.S            |  3 +++
>  arch/arm/mm/cache-v4wb.S          |  3 +++
>  arch/arm/mm/cache-v4wt.S          |  3 +++
>  arch/arm/mm/cache-v6.S            |  3 +++
>  arch/arm/mm/cache-v7.S            | 48 ++++++++++++++++++++++++++++++++++-----
>  arch/arm/mm/proc-arm1020.S        |  3 +++
>  arch/arm/mm/proc-arm1020e.S       |  3 +++
>  arch/arm/mm/proc-arm1022.S        |  3 +++
>  arch/arm/mm/proc-arm1026.S        |  3 +++
>  arch/arm/mm/proc-arm920.S         |  3 +++
>  arch/arm/mm/proc-arm922.S         |  3 +++
>  arch/arm/mm/proc-arm925.S         |  3 +++
>  arch/arm/mm/proc-arm926.S         |  3 +++
>  arch/arm/mm/proc-arm940.S         |  3 +++
>  arch/arm/mm/proc-arm946.S         |  3 +++
>  arch/arm/mm/proc-feroceon.S       |  3 +++
>  arch/arm/mm/proc-macros.S         |  1 +
>  arch/arm/mm/proc-mohawk.S         |  3 +++
>  arch/arm/mm/proc-v7.S             |  2 +-
>  arch/arm/mm/proc-xsc3.S           |  3 +++
>  arch/arm/mm/proc-xscale.S         |  3 +++
>  27 files changed, 140 insertions(+), 9 deletions(-)
> 
> -- 
> 1.7.12
> 

* [RFC PATCH v2 0/5] ARM: augment cache flushing API
  2012-09-20 11:27 ` [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
@ 2012-09-21  8:07   ` Shawn Guo
  0 siblings, 0 replies; 39+ messages in thread
From: Shawn Guo @ 2012-09-21  8:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 20, 2012 at 12:27:55PM +0100, Lorenzo Pieralisi wrote:
> [adding CCs: Kukjin Kim, Shawn Guo, Magnus Damm, Rob Herring]
> 
On imx6q with suspend/hotplug,

Tested-by: Shawn Guo <shawn.guo@linaro.org>

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-09-20 11:04       ` Dave Martin
@ 2012-12-11 16:07         ` Guennadi Liakhovetski
  2012-12-11 16:33           ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-11 16:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all

On Thu, 20 Sep 2012, Dave Martin wrote:

> On Thu, Sep 20, 2012 at 11:25:14AM +0100, Lorenzo Pieralisi wrote:
> > On Wed, Sep 19, 2012 at 02:46:58PM +0100, Dave Martin wrote:
> > > On Tue, Sep 18, 2012 at 05:35:33PM +0100, Lorenzo Pieralisi wrote:
> > > > In processors like A15/A7 L2 cache is unified and integrated within the
> > > > processor cache hierarchy, so that it is not considered an outer cache
> > > > anymore. For processors like A15/A7 flush_cache_all() ends up cleaning
> > > > all cache levels up to Level of Coherency (LoC) that includes
> > > > the L2 unified cache.
> > > > 
> > > > When a single CPU is suspended (CPU idle) a complete L2 clean is not
> > > > required, so generic cpu_suspend code must clean the data cache using the
> > > > newly introduced cache LoUIS function.

Git bisect identified this patch, in the mainline as

commit dbee0c6fb4c1269b2dfc8b0b7a29907ea7fed560
Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Date:   Fri Sep 7 11:06:57 2012 +0530

    ARM: kernel: update cpu_suspend code to use cache LoUIS operations

as the culprit of the broken wake-up from STR on mackerel, a board based on the
sh7372 (Cortex-A8) SoC. .config attached.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
-------------- next part --------------
# CONFIG_ARM_PATCH_PHYS_VIRT is not set
CONFIG_EXPERIMENTAL=y
CONFIG_CROSS_COMPILE="arm-none-linux-gnueabi-"
CONFIG_LOCALVERSION="-ap4"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS_ALL=y
CONFIG_EMBEDDED=y
CONFIG_SLAB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_LBDAF is not set
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
CONFIG_ARCH_SHMOBILE=y
CONFIG_ARCH_SH7372=y
CONFIG_MACH_AP4EVB=y
CONFIG_MACH_MACKEREL=y
CONFIG_MEMORY_SIZE=0x20000000
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_ARM_THUMB is not set
CONFIG_AEABI=y
CONFIG_FORCE_MAX_ZONEORDER=12
CONFIG_USE_OF=y
CONFIG_ZBOOT_ROM_TEXT=0x0
CONFIG_ZBOOT_ROM_BSS=0x0
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_CMDLINE="console=ttySC0,115200 console=tty1 earlyprintk=sh-sci.0,115200"
CONFIG_KEXEC=y
CONFIG_VFP=y
CONFIG_NEON=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_PM_RUNTIME=y
CONFIG_NET=y
CONFIG_PACKET=m
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_DIAG=m
CONFIG_INET_UDP_DIAG=m
# CONFIG_INET6_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_BEET is not set
# CONFIG_IPV6_SIT is not set
CONFIG_CFG80211=m
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=m
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_PROC_DEVICETREE=y
# CONFIG_BLK_DEV is not set
CONFIG_NETDEVICES=y
CONFIG_NETCONSOLE=y
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CIRRUS is not set
# CONFIG_NET_VENDOR_FARADAY is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_SEEQ is not set
CONFIG_SMSC911X=y
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_WIZNET is not set
CONFIG_MDIO_BITBANG=y
# CONFIG_WLAN is not set
CONFIG_INPUT_MOUSEDEV=m
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_KEYBOARD_ATKBD is not set
CONFIG_KEYBOARD_TCA6416=y
CONFIG_KEYBOARD_SH_KEYSC=y
# CONFIG_INPUT_MOUSE is not set
CONFIG_INPUT_TOUCHSCREEN=y
CONFIG_TOUCHSCREEN_TSC2007=y
CONFIG_TOUCHSCREEN_ST1232=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_ADXL34X=m
# CONFIG_SERIO is not set
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_SH_SCI=y
CONFIG_SERIAL_SH_SCI_NR_UARTS=8
CONFIG_SERIAL_SH_SCI_CONSOLE=y
CONFIG_SERIAL_SH_SCI_DMA=y
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_SH_MOBILE=y
CONFIG_SPI=y
CONFIG_SPI_GPIO=m
CONFIG_GPIO_SYSFS=y
CONFIG_POWER_SUPPLY=y
# CONFIG_HWMON is not set
CONFIG_SSB=m
CONFIG_SSB_SDIOHOST=y
CONFIG_REGULATOR=y
CONFIG_REGULATOR_DUMMY=y
CONFIG_MEDIA_SUPPORT=m
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_CONTROLLER=y
CONFIG_VIDEO_V4L2_SUBDEV_API=y
CONFIG_VIDEO_ADV_DEBUG=y
CONFIG_VIDEO_OV7670=m
CONFIG_VIDEO_VIVI=m
CONFIG_V4L_PLATFORM_DRIVERS=y
CONFIG_VIDEO_SH_VOU=m
CONFIG_SOC_CAMERA=m
CONFIG_SOC_CAMERA_IMX074=m
CONFIG_SOC_CAMERA_MT9M111=m
CONFIG_SOC_CAMERA_MT9T112=m
CONFIG_SOC_CAMERA_MT9V022=m
CONFIG_SOC_CAMERA_PLATFORM=m
CONFIG_SOC_CAMERA_OV5642=m
CONFIG_VIDEO_SH_MOBILE_CSI2=m
CONFIG_VIDEO_SH_MOBILE_CEU=m
CONFIG_V4L_MEM2MEM_DRIVERS=y
CONFIG_VIDEO_MEM2MEM_TESTDEV=m
CONFIG_FB=y
CONFIG_FB_SH_MOBILE_LCDC=y
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_SOUND=m
CONFIG_SND=m
# CONFIG_SND_SUPPORT_OLD_API is not set
# CONFIG_SND_DRIVERS is not set
# CONFIG_SND_ARM is not set
# CONFIG_SND_SPI is not set
CONFIG_SND_SOC=m
CONFIG_SND_SOC_SH4_FSI=m
# CONFIG_USB_SUPPORT is not set
CONFIG_MMC=m
CONFIG_MMC_CLKGATE=y
CONFIG_MMC_SDHI=m
CONFIG_MMC_SH_MMCIF=m
CONFIG_RTC_CLASS=y
# CONFIG_RTC_HCTOSYS is not set
CONFIG_DMADEVICES=y
CONFIG_SH_DMAE=m
CONFIG_DMATEST=m
# CONFIG_IOMMU_SUPPORT is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_DNOTIFY is not set
CONFIG_FANOTIFY=y
CONFIG_VFAT_FS=m
CONFIG_TMPFS=y
# CONFIG_MISC_FILESYSTEMS is not set
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_15=m
CONFIG_PRINTK_TIME=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_LOCKUP_DETECTOR=y
# CONFIG_SCHED_DEBUG is not set
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_FTRACE is not set
# CONFIG_ARM_UNWIND is not set
CONFIG_DEBUG_USER=y
CONFIG_CRYPTO=y
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_DES=m

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 16:07         ` Guennadi Liakhovetski
@ 2012-12-11 16:33           ` Will Deacon
  2012-12-11 16:38             ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2012-12-11 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 11, 2012 at 04:07:56PM +0000, Guennadi Liakhovetski wrote:
> Hi all
> 
> On Thu, 20 Sep 2012, Dave Martin wrote:
> 
> > On Thu, Sep 20, 2012 at 11:25:14AM +0100, Lorenzo Pieralisi wrote:
> > > On Wed, Sep 19, 2012 at 02:46:58PM +0100, Dave Martin wrote:
> > > > On Tue, Sep 18, 2012 at 05:35:33PM +0100, Lorenzo Pieralisi wrote:
> > > > > In processors like A15/A7 L2 cache is unified and integrated within the
> > > > > processor cache hierarchy, so that it is not considered an outer cache
> > > > > anymore. For processors like A15/A7 flush_cache_all() ends up cleaning
> > > > > all cache levels up to Level of Coherency (LoC) that includes
> > > > > the L2 unified cache.
> > > > > 
> > > > > When a single CPU is suspended (CPU idle) a complete L2 clean is not
> > > > > required, so generic cpu_suspend code must clean the data cache using the
> > > > > newly introduced cache LoUIS function.
> 
> Git bisect identified this patch, in the mainline as
> 
> commit dbee0c6fb4c1269b2dfc8b0b7a29907ea7fed560
> Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Date:   Fri Sep 7 11:06:57 2012 +0530
> 
>     ARM: kernel: update cpu_suspend code to use cache LoUIS operations
> 
> as the culprit of the broken wake up from STR on mackerel, based on an 
> sh7372 A8 SoC. .config attached.

My guess is that because Cortex-A8 does not implement the MP extensions,
the LoUIS field of the CLIDR reads as zero, and the cache isn't flushed at
all (I can see an early exit in v7_flush_dcache_louis).

Lorenzo -- how is this supposed to work for uniprocessor CPUs?

Will
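Will's diagnosis is about which CLIDR field the LoUIS routine reads. A C sketch that decodes the ARMv7 CLIDR fields shows the effect. The fields are Ctype1-7 in bits [20:0], LoUIS in [23:21], LoC in [26:24] and LoUU in [29:27]. The value 0x0A000023 is an assumed, illustrative value for a uniprocessor Cortex-A8-like part with a split L1, a unified L2 and LoUIS = 0, so check what your own CPU reports. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ctype for a 1-based cache level (1..7): 0 none, 1 I only, 2 D only, 3 split, 4 unified. */
static unsigned clidr_ctype(uint32_t clidr, unsigned level)
{
	return (clidr >> (3 * (level - 1))) & 7;
}

static unsigned clidr_louis(uint32_t clidr) { return (clidr >> 21) & 7; }
static unsigned clidr_loc(uint32_t clidr)   { return (clidr >> 24) & 7; }
static unsigned clidr_louu(uint32_t clidr)  { return (clidr >> 27) & 7; }

/*
 * Bitmask of 0-based levels a flush "up to 'limit' levels" would visit,
 * skipping levels with no cache or an I-cache only (Ctype < 2), as the
 * flush_levels loop does.
 */
static unsigned levels_flushed(uint32_t clidr, unsigned limit)
{
	unsigned mask = 0;

	for (unsigned level = 0; level < limit && level < 7; level++)
		if (clidr_ctype(clidr, level + 1) >= 2)
			mask |= 1u << level;
	return mask;
}
```

With LoUIS reading as zero, the LoUIS-bounded flush visits no level at all, which matches the early exit Will points to. Bounding the flush by LoUU instead would at least cover L1.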

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 16:33           ` Will Deacon
@ 2012-12-11 16:38             ` Will Deacon
  2012-12-11 17:07               ` Guennadi Liakhovetski
                                 ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Will Deacon @ 2012-12-11 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 11, 2012 at 04:33:13PM +0000, Will Deacon wrote:
> On Tue, Dec 11, 2012 at 04:07:56PM +0000, Guennadi Liakhovetski wrote:
> > Git bisect identified this patch, in the mainline as
> > 
> > commit dbee0c6fb4c1269b2dfc8b0b7a29907ea7fed560
> > Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Date:   Fri Sep 7 11:06:57 2012 +0530
> > 
> >     ARM: kernel: update cpu_suspend code to use cache LoUIS operations
> > 
> > as the culprit of the broken wake up from STR on mackerel, based on an 
> > sh7372 A8 SoC. .config attached.
> 
> My guess is that because Cortex-A8 does not implement the MP extensions,
> the LoUIS field of the CLIDR reads as zero, and the cache isn't flushed at
> all (I can see an early exit in v7_flush_dcache_louis).
> 
> Lorenzo -- how is this supposed to work for uniprocessor CPUs?

Bah, forgot to ask you if the following patch helps...

Will

--->8

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index cd95664..f58248f 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
 ENTRY(v7_flush_dcache_louis)
        dmb                                     @ ensure ordering with previous memory accesses
        mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
-       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
+       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
+       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
        mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
        moveq   pc, lr                          @ return if level == 0
        mov     r10, #0                         @ r10 (starting level) = 0
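
The two ALT_SMP/ALT_UP variants in the diff above can be checked arithmetically in C. This is a sketch: the helper names are hypothetical, and 0x0A000023 is again an assumed Cortex-A8-like CLIDR value. By this arithmetic, the SMP path gives LoUIS * 2, as its comment says. On the UP path, however, the mask selects bits [29:27], so the shared `lsr #20` yields LoUU * 128 rather than LoUU * 2, while a shift of 26 would give LoUU * 2. With the posted shift, the flush_levels loop would make 64 passes instead of one. Some of those extra passes would read CLIDR bits that are not Ctype fields.

```c
#include <assert.h>
#include <stdint.h>

/* SMP path: ands r3, r0, #(7 << 21) ; mov r3, r3, lsr #20 */
static uint32_t smp_r3(uint32_t clidr) { return (clidr & (7u << 21)) >> 20; }

/* UP path as posted: ands r3, r0, #(7 << 27) ; mov r3, r3, lsr #20 */
static uint32_t up_r3_as_posted(uint32_t clidr) { return (clidr & (7u << 27)) >> 20; }

/* UP path with the shift matched to the field position (lsr #26). */
static uint32_t up_r3_lsr26(uint32_t clidr) { return (clidr & (7u << 27)) >> 26; }

/* Passes of the flush_levels loop: r10 = 0, 2, 4, ... while r3 > r10. */
static unsigned level_passes(uint32_t r3)
{
	unsigned passes = 0;

	for (uint32_t r10 = 0; r3 > r10; r10 += 2)
		passes++;
	return passes;
}
```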

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 16:38             ` Will Deacon
@ 2012-12-11 17:07               ` Guennadi Liakhovetski
  2012-12-11 17:47                 ` Will Deacon
  2012-12-11 17:55               ` Guennadi Liakhovetski
  2012-12-11 23:27               ` Stephen Boyd
  2 siblings, 1 reply; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-11 17:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will

On Tue, 11 Dec 2012, Will Deacon wrote:

> On Tue, Dec 11, 2012 at 04:33:13PM +0000, Will Deacon wrote:
> > On Tue, Dec 11, 2012 at 04:07:56PM +0000, Guennadi Liakhovetski wrote:
> > > Git bisect identified this patch, in the mainline as
> > > 
> > > commit dbee0c6fb4c1269b2dfc8b0b7a29907ea7fed560
> > > Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > > Date:   Fri Sep 7 11:06:57 2012 +0530
> > > 
> > >     ARM: kernel: update cpu_suspend code to use cache LoUIS operations
> > > 
> > > as the culprit of the broken wake up from STR on mackerel, based on an 
> > > sh7372 A8 SoC. .config attached.
> > 
> > My guess is that because Cortex-A8 does not implement the MP extensions,
> > the LoUIS field of the CLIDR reads as zero, and the cache isn't flushed at
> > all (I can see an early exit in v7_flush_dcache_louis).
> > 
> > Lorenzo -- how is this supposed to work for uniprocessor CPUs?
> 
> Bah, forgot to ask you if the following patch helps...

Yes, it does.

Thanks
Guennadi

> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index cd95664..f58248f 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
>  ENTRY(v7_flush_dcache_louis)
>         dmb                                     @ ensure ordering with previous memory accesses
>         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
>         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
>         moveq   pc, lr                          @ return if level == 0
>         mov     r10, #0                         @ r10 (starting level) = 0
> 

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 17:07               ` Guennadi Liakhovetski
@ 2012-12-11 17:47                 ` Will Deacon
  0 siblings, 0 replies; 39+ messages in thread
From: Will Deacon @ 2012-12-11 17:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 11, 2012 at 05:07:35PM +0000, Guennadi Liakhovetski wrote:
> Hi Will
> 
> On Tue, 11 Dec 2012, Will Deacon wrote:
> 
> > On Tue, Dec 11, 2012 at 04:33:13PM +0000, Will Deacon wrote:
> > > On Tue, Dec 11, 2012 at 04:07:56PM +0000, Guennadi Liakhovetski wrote:
> > > > Git bisect identified this patch, in the mainline as
> > > > 
> > > > commit dbee0c6fb4c1269b2dfc8b0b7a29907ea7fed560
> > > > Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > > > Date:   Fri Sep 7 11:06:57 2012 +0530
> > > > 
> > > >     ARM: kernel: update cpu_suspend code to use cache LoUIS operations
> > > > 
> > > > as the culprit of the broken wake up from STR on mackerel, based on an 
> > > > sh7372 A8 SoC. .config attached.
> > > 
> > > My guess is that because Cortex-A8 does not implement the MP extensions,
> > > the LoUIS field of the CLIDR reads as zero, and the cache isn't flushed at
> > > all (I can see an early exit in v7_flush_dcache_louis).
> > > 
> > > Lorenzo -- how is this supposed to work for uniprocessor CPUs?
> > 
> > Bah, forgot to ask you if the following patch helps...
> 
> Yes, it does.

Cracking, can I add your Tested-by please?

Will

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 16:38             ` Will Deacon
  2012-12-11 17:07               ` Guennadi Liakhovetski
@ 2012-12-11 17:55               ` Guennadi Liakhovetski
  2012-12-11 23:27               ` Stephen Boyd
  2 siblings, 0 replies; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-11 17:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 11 Dec 2012, Will Deacon wrote:

> On Tue, Dec 11, 2012 at 04:33:13PM +0000, Will Deacon wrote:
> > On Tue, Dec 11, 2012 at 04:07:56PM +0000, Guennadi Liakhovetski wrote:
> > > Git bisect identified this patch, in the mainline as
> > > 
> > > commit dbee0c6fb4c1269b2dfc8b0b7a29907ea7fed560
> > > Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > > Date:   Fri Sep 7 11:06:57 2012 +0530
> > > 
> > >     ARM: kernel: update cpu_suspend code to use cache LoUIS operations
> > > 
> > > as the culprit of the broken wake up from STR on mackerel, based on an 
> > > sh7372 A8 SoC. .config attached.
> > 
> > My guess is that because Cortex-A8 does not implement the MP extensions,
> > the LoUIS field of the CLIDR reads as zero, and the cache isn't flushed at
> > all (I can see an early exit in v7_flush_dcache_louis).
> > 
> > Lorenzo -- how is this supposed to work for uniprocessor CPUs?
> 
> Bah, forgot to ask you if the following patch helps...
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index cd95664..f58248f 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
>  ENTRY(v7_flush_dcache_louis)
>         dmb                                     @ ensure ordering with previous memory accesses
>         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
>         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
>         moveq   pc, lr                          @ return if level == 0
>         mov     r10, #0                         @ r10 (starting level) = 0

[... later]

> > Yes, it does.
>
> Cracking, can I add your Tested-by please?
 
Sure:
 
Tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
  
Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 16:38             ` Will Deacon
  2012-12-11 17:07               ` Guennadi Liakhovetski
  2012-12-11 17:55               ` Guennadi Liakhovetski
@ 2012-12-11 23:27               ` Stephen Boyd
  2012-12-12 10:31                 ` Will Deacon
  2012-12-12 10:33                 ` Lorenzo Pieralisi
  2 siblings, 2 replies; 39+ messages in thread
From: Stephen Boyd @ 2012-12-11 23:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/11/12 08:38, Will Deacon wrote:
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index cd95664..f58248f 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
>  ENTRY(v7_flush_dcache_louis)
>         dmb                                     @ ensure ordering with previous memory accesses
>         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
>         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2

You need to fix this mov as well, right?

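The question above can be checked with some quick arithmetic. The sketch below keeps the UP mask from the first version of the fix and compares shifting by 20, as the unchanged mov does, with shifting by 26. The helper names are hypothetical. It assumes the same example CLIDR value 0x0A000023, with LoUU = 1.

```c
#include <stdint.h>

/*
 * Model of the UP path in the first version of the fix:
 *   ALT_UP(ands r3, r0, #(7 << 27))  @ keep CLIDR[29:27] (LoUU) in place
 *   mov r3, r3, lsr #20              @ shift amount still tuned for LoUIS
 */
static uint32_t up_level_x2_buggy(uint32_t clidr)
{
    return (clidr & (7u << 27)) >> 20;  /* LoUU * 128, not LoUU * 2 */
}

/* Same mask, shifted by 26 so LoUU ends up in bits [3:1], i.e. LoUU * 2. */
static uint32_t up_level_x2_fixed(uint32_t clidr)
{
    return (clidr & (7u << 27)) >> 26;
}
```

For 0x0A000023 the first variant returns 128, a "level * 2" limit of 64 cache levels. The second returns 2, which corresponds to LoUU = 1.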
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 23:27               ` Stephen Boyd
@ 2012-12-12 10:31                 ` Will Deacon
  2012-12-12 16:43                   ` Guennadi Liakhovetski
  2012-12-12 10:33                 ` Lorenzo Pieralisi
  1 sibling, 1 reply; 39+ messages in thread
From: Will Deacon @ 2012-12-12 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 11, 2012 at 11:27:39PM +0000, Stephen Boyd wrote:
> On 12/11/12 08:38, Will Deacon wrote:
> > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> > index cd95664..f58248f 100644
> > --- a/arch/arm/mm/cache-v7.S
> > +++ b/arch/arm/mm/cache-v7.S
> > @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
> >  ENTRY(v7_flush_dcache_louis)
> >         dmb                                     @ ensure ordering with previous memory accesses
> >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> > -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> > +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> > +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> >         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
> 
> You need to fix this mov as well, right?

Ha, nice catch. So the original patch ended up with a ridiculously high
level number and would've flushed L2, hence we will need to retest with the
fix below...

Will

--->8

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index cd95664..7539ec2 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -44,8 +44,10 @@ ENDPROC(v7_flush_icache_all)
 ENTRY(v7_flush_dcache_louis)
        dmb                                     @ ensure ordering with previous memory accesses
        mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
-       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
-       mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
+       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
+       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
+       ALT_SMP(mov     r3, r3, lsr #20)        @ r3 = LoUIS * 2
+       ALT_UP(mov      r3, r3, lsr #26)        @ r3 = LoUU * 2
        moveq   pc, lr                          @ return if level == 0
        mov     r10, #0                         @ r10 (starting level) = 0
        b       flush_levels                    @ start flushing cache levels

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-11 23:27               ` Stephen Boyd
  2012-12-12 10:31                 ` Will Deacon
@ 2012-12-12 10:33                 ` Lorenzo Pieralisi
  2012-12-12 13:36                   ` Will Deacon
  2012-12-12 16:43                   ` Guennadi Liakhovetski
  1 sibling, 2 replies; 39+ messages in thread
From: Lorenzo Pieralisi @ 2012-12-12 10:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 11, 2012 at 11:27:39PM +0000, Stephen Boyd wrote:
> On 12/11/12 08:38, Will Deacon wrote:
> > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> > index cd95664..f58248f 100644
> > --- a/arch/arm/mm/cache-v7.S
> > +++ b/arch/arm/mm/cache-v7.S
> > @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
> >  ENTRY(v7_flush_dcache_louis)
> >         dmb                                     @ ensure ordering with previous memory accesses
> >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> > -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> > +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> > +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> >         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
> 
> You need to fix this mov as well, right?

And after doing that I think the suspend finisher will still have
to call flush_cache_all() since LoUU == 1 on A8, L2 is not cleaned
and that's probably what we want if it can be retained.

What about this (compile tested) ?

Lorenzo

--->8

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index cd95664..036f80f 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -44,8 +44,9 @@ ENDPROC(v7_flush_icache_all)
 ENTRY(v7_flush_dcache_louis)
 	dmb					@ ensure ordering with previous memory accesses
 	mrc	p15, 1, r0, c0, c0, 1		@ read clidr, r0 = clidr
-	ands	r3, r0, #0xe00000		@ extract LoUIS from clidr
-	mov	r3, r3, lsr #20			@ r3 = LoUIS * 2
+	ALT_SMP(lsr	r3, r0, #20)		@ r3 = clidr[31:20]
+	ALT_UP(lsr	r3, r0, #26)		@ r3 = clidr[31:26]
+	ands	r3, r3, #0xe			@ r3 = LoUIS/LoUU * 2
 	moveq	pc, lr				@ return if level == 0
 	mov	r10, #0				@ r10 (starting level) = 0
 	b	flush_levels			@ start flushing cache levels

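The patch above shifts first and masks afterwards, so a single `ands` with 0xe serves both alternatives. The sketch below compares this approach with the mask-then-shift version. The function names are hypothetical. The `smp` flag is an assumption: it stands in for whichever of the ALT_SMP()/ALT_UP() instructions ends up being executed.

```c
#include <stdint.h>

/* Mask-then-shift: mask the field in place, then shift by the matching amount. */
static uint32_t level_x2_mask_then_shift(uint32_t clidr, int smp)
{
    return smp ? (clidr & (7u << 21)) >> 20   /* LoUIS * 2 */
               : (clidr & (7u << 27)) >> 26;  /* LoUU  * 2 */
}

/* Shift-then-mask: shift the field down to bits [3:1], then one shared mask. */
static uint32_t level_x2_shift_then_mask(uint32_t clidr, int smp)
{
    uint32_t r3 = smp ? clidr >> 20   /* ALT_SMP(lsr r3, r0, #20) */
                      : clidr >> 26;  /* ALT_UP(lsr r3, r0, #26)  */
    return r3 & 0xeu;                 /* ands r3, r3, #0xe */
}
```

Both functions return LoUIS * 2 when `smp` is nonzero and LoUU * 2 otherwise, for any CLIDR value. After the shift, the 0xe mask discards everything outside bits [3:1].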
* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-12 10:33                 ` Lorenzo Pieralisi
@ 2012-12-12 13:36                   ` Will Deacon
  2012-12-13  8:09                     ` Guennadi Liakhovetski
  2012-12-12 16:43                   ` Guennadi Liakhovetski
  1 sibling, 1 reply; 39+ messages in thread
From: Will Deacon @ 2012-12-12 13:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 12, 2012 at 10:33:38AM +0000, Lorenzo Pieralisi wrote:
> On Tue, Dec 11, 2012 at 11:27:39PM +0000, Stephen Boyd wrote:
> > On 12/11/12 08:38, Will Deacon wrote:
> > > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> > > index cd95664..f58248f 100644
> > > --- a/arch/arm/mm/cache-v7.S
> > > +++ b/arch/arm/mm/cache-v7.S
> > > @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
> > >  ENTRY(v7_flush_dcache_louis)
> > >         dmb                                     @ ensure ordering with previous memory accesses
> > >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> > > -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> > > +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> > > +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> > >         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
> > 
> > You need to fix this mov as well, right?
> 
> And after doing that I think the suspend finisher will still have
> to call flush_cache_all() since LoUU == 1 on A8, L2 is not cleaned
> and that's probably what we want if it can be retained.

At some point we probably want to describe the level of flushing required in
the device tree as a property of the CPU node (or something similar). That
would allow us to have *one* function for flushing,
e.g. cpu_suspend_flush_cache which flushes to the appropriate level. Then
we could remove the louis flush from the CPU suspend code and instead make
it the finisher's responsibility to call our flushing function when it's
done, which helps to avoid over/under-flushing the cache.

In the meantime, fixing louis as we've suggested should work.

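Lorenzo's remark quoted above (LoUU == 1 on A8, so L2 is not cleaned) follows from how the set/way loop walks levels. `flush_levels` starts at r10 = 0 and steps by 2 while r10 is below the "level * 2" limit. The hypothetical helpers below count the levels such a walk would visit. They ignore the real loop's skipping of levels without a data cache, and they again assume the example CLIDR value 0x0A000023.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of cache levels a flush_levels-style walk
 * visits for a limit expressed as "level * 2" (r3 in cache-v7.S).
 * The walk covers cache levels 1 .. limit_x2 / 2.
 */
static unsigned levels_walked(uint32_t limit_x2)
{
    unsigned n = 0;
    for (uint32_t r10 = 0; r10 < limit_x2; r10 += 2)
        n++;
    return n;
}

/* CLIDR fields as "level * 2": LoUU is bits [29:27], LoC is bits [26:24]. */
static uint32_t clidr_louu_x2(uint32_t clidr) { return ((clidr >> 27) & 7u) * 2; }
static uint32_t clidr_loc_x2(uint32_t clidr)  { return ((clidr >> 24) & 7u) * 2; }
```

With that value, a LoUU-limited walk visits only level 1. A LoC-limited walk, as in v7_flush_dcache_all, also reaches level 2. This is why, per Lorenzo, a finisher that needs L2 cleaned still has to call flush_cache_all().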
Back to the case in hand.... Lorenzo just pointed out to me that the
finisher in question (sh7372_do_idle_sysc) calls v7_flush_dcache_all, so
the louis stuff should be irrelevant. The problem may actually be that the
finisher disables the L2 cache prior to cleaning/invalidating it, which is
the opposite order to that described by the A8 TRM.

Guennadi -- can you try moving the kernel_flush call before the L2 disable
in sh7372_do_idle_sysc please?

Will

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-12 10:33                 ` Lorenzo Pieralisi
  2012-12-12 13:36                   ` Will Deacon
@ 2012-12-12 16:43                   ` Guennadi Liakhovetski
  1 sibling, 0 replies; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-12 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Lorenzo

On Wed, 12 Dec 2012, Lorenzo Pieralisi wrote:

> On Tue, Dec 11, 2012 at 11:27:39PM +0000, Stephen Boyd wrote:
> > On 12/11/12 08:38, Will Deacon wrote:
> > > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> > > index cd95664..f58248f 100644
> > > --- a/arch/arm/mm/cache-v7.S
> > > +++ b/arch/arm/mm/cache-v7.S
> > > @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
> > >  ENTRY(v7_flush_dcache_louis)
> > >         dmb                                     @ ensure ordering with previous memory accesses
> > >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> > > -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> > > +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> > > +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> > >         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
> > 
> > You need to fix this mov as well, right?
> 
> And after doing that I think the suspend finisher will still have
> to call flush_cache_all() since LoUU == 1 on A8, L2 is not cleaned
> and that's probably what we want if it can be retained.
> 
> What about this (compile tested) ?

Works too.

Thanks
Guennadi

> 
> Lorenzo
> 
> --->8
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index cd95664..036f80f 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -44,8 +44,9 @@ ENDPROC(v7_flush_icache_all)
>  ENTRY(v7_flush_dcache_louis)
>  	dmb					@ ensure ordering with previous memory accesses
>  	mrc	p15, 1, r0, c0, c0, 1		@ read clidr, r0 = clidr
> -	ands	r3, r0, #0xe00000		@ extract LoUIS from clidr
> -	mov	r3, r3, lsr #20			@ r3 = LoUIS * 2
> +	ALT_SMP(lsr	r3, r0, #20)		@ r3 = clidr[31:20]
> +	ALT_UP(lsr	r3, r0, #26)		@ r3 = clidr[31:26]
> +	ands	r3, r3, #0xe			@ r3 = LoUIS/LoUU * 2
>  	moveq	pc, lr				@ return if level == 0
>  	mov	r10, #0				@ r10 (starting level) = 0
>  	b	flush_levels			@ start flushing cache levels
> 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-12 10:31                 ` Will Deacon
@ 2012-12-12 16:43                   ` Guennadi Liakhovetski
  0 siblings, 0 replies; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-12 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will

On Wed, 12 Dec 2012, Will Deacon wrote:

> On Tue, Dec 11, 2012 at 11:27:39PM +0000, Stephen Boyd wrote:
> > On 12/11/12 08:38, Will Deacon wrote:
> > > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> > > index cd95664..f58248f 100644
> > > --- a/arch/arm/mm/cache-v7.S
> > > +++ b/arch/arm/mm/cache-v7.S
> > > @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
> > >  ENTRY(v7_flush_dcache_louis)
> > >         dmb                                     @ ensure ordering with previous memory accesses
> > >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> > > -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> > > +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> > > +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> > >         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
> > 
> > You need to fix this mov as well, right?
> 
> Ha, nice catch. So the original patch ended up with a ridiculously high
> level number and would've flushed L2, hence we will need to retest with the
> fix below...

Had to apply manually, but it worked too.

Thanks
Guennadi

> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> index cd95664..7539ec2 100644
> --- a/arch/arm/mm/cache-v7.S
> +++ b/arch/arm/mm/cache-v7.S
> @@ -44,8 +44,10 @@ ENDPROC(v7_flush_icache_all)
>  ENTRY(v7_flush_dcache_louis)
>         dmb                                     @ ensure ordering with previous memory accesses
>         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> -       mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
> +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> +       ALT_SMP(mov     r3, r3, lsr #20)        @ r3 = LoUIS * 2
> +       ALT_UP(mov      r3, r3, lsr #26)        @ r3 = LoUU * 2
>         moveq   pc, lr                          @ return if level == 0
>         mov     r10, #0                         @ r10 (starting level) = 0
>         b       flush_levels                    @ start flushing cache levels
> 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-12 13:36                   ` Will Deacon
@ 2012-12-13  8:09                     ` Guennadi Liakhovetski
  2012-12-13 10:51                       ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-13  8:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 12 Dec 2012, Will Deacon wrote:

> On Wed, Dec 12, 2012 at 10:33:38AM +0000, Lorenzo Pieralisi wrote:
> > On Tue, Dec 11, 2012 at 11:27:39PM +0000, Stephen Boyd wrote:
> > > On 12/11/12 08:38, Will Deacon wrote:
> > > > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
> > > > index cd95664..f58248f 100644
> > > > --- a/arch/arm/mm/cache-v7.S
> > > > +++ b/arch/arm/mm/cache-v7.S
> > > > @@ -44,7 +44,8 @@ ENDPROC(v7_flush_icache_all)
> > > >  ENTRY(v7_flush_dcache_louis)
> > > >         dmb                                     @ ensure ordering with previous memory accesses
> > > >         mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
> > > > -       ands    r3, r0, #0xe00000               @ extract LoUIS from clidr
> > > > +       ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
> > > > +       ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
> > > >         mov     r3, r3, lsr #20                 @ r3 = LoUIS * 2
> > > 
> > > You need to fix this mov as well, right?
> > 
> > And after doing that I think the suspend finisher will still have
> > to call flush_cache_all() since LoUU == 1 on A8, L2 is not cleaned
> > and that's probably what we want if it can be retained.
> 
> At some point we probably want to describe the level of flushing required in
> the device tree as a property of the CPU node (or something similar). That
> would allow us to have *one* function for flushing,
> e.g. cpu_suspend_flush_cache which flushes to the appropriate level. Then
> we could remove the louis flush from the CPU suspend code and instead make
> it the finisher's responsibility to call our flushing function when it's
> done, which helps to avoid over/under-flushing the cache.
> 
> In the meantime, fixing louis as we've suggested should work.
> 
> Back to the case in hand.... Lorenzo just pointed out to me that the
> finisher in question (sh7372_do_idle_sysc) calls v7_flush_dcache_all, so
> the louis stuff should be irrelevant. The problem may actually be that the
> finisher disables the L2 cache prior to cleaning/invalidating it, which is
> the opposite order to that described by the A8 TRM.
> 
> Guennadi -- can you try moving the kernel_flush call before the L2 disable
> in sh7372_do_idle_sysc please?

Yes, this works too.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-13  8:09                     ` Guennadi Liakhovetski
@ 2012-12-13 10:51                       ` Will Deacon
  2012-12-13 14:32                         ` Guennadi Liakhovetski
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2012-12-13 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 13, 2012 at 08:09:33AM +0000, Guennadi Liakhovetski wrote:
> On Wed, 12 Dec 2012, Will Deacon wrote:
> > Back to the case in hand.... Lorenzo just pointed out to me that the
> > finisher in question (sh7372_do_idle_sysc) calls v7_flush_dcache_all, so
> > the louis stuff should be irrelevant. The problem may actually be that the
> > finisher disables the L2 cache prior to cleaning/invalidating it, which is
> > the opposite order to that described by the A8 TRM.
> > 
> > Guennadi -- can you try moving the kernel_flush call before the L2 disable
> > in sh7372_do_idle_sysc please?
> 
> Yes, this works too.

That's good to know. Please can you send a patch for that? The sequence
currently being used by the finisher *is* buggy, and should be fixed
independently of the louis stuff.

Cheers,

Will

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-13 10:51                       ` Will Deacon
@ 2012-12-13 14:32                         ` Guennadi Liakhovetski
  2012-12-13 14:39                           ` Santosh Shilimkar
  2012-12-13 14:52                           ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Will Deacon
  0 siblings, 2 replies; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-13 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 13 Dec 2012, Will Deacon wrote:

> On Thu, Dec 13, 2012 at 08:09:33AM +0000, Guennadi Liakhovetski wrote:
> > On Wed, 12 Dec 2012, Will Deacon wrote:
> > > Back to the case in hand.... Lorenzo just pointed out to me that the
> > > finisher in question (sh7372_do_idle_sysc) calls v7_flush_dcache_all, so
> > > the louis stuff should be irrelevant. The problem may actually be that the
> > > finisher disables the L2 cache prior to cleaning/invalidating it, which is
> > > the opposite order to that described by the A8 TRM.
> > > 
> > > Guennadi -- can you try moving the kernel_flush call before the L2 disable
> > > in sh7372_do_idle_sysc please?
> > 
> > Yes, this works too.
> 
> That's good to know. Please can you send a patch for that? The sequence
> currently being used by the finisher *is* buggy, and should be fixed
> independently of the louis stuff.

Well, the fix is yours, so, it should be "From: you." I can certainly send 
it just copying your description above, but I'd also need your Sob. 
Something like the below (feel free to improve the subject line and the 
description):

From: Will Deacon <will.deacon@arm.com>
Subject: [PATCH] ARM: sh7372: fix cache clean / invalidate order

According to the Cortex A8 TRM the L2 cache should be first cleaned and 
then disabled. Fix the swapped order on sh7372.

Signed-off-by: <you>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
(or even just)
Tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>

diff --git a/arch/arm/mach-shmobile/sleep-sh7372.S b/arch/arm/mach-shmobile/sleep-sh7372.S
index 1d56467..df15d8a 100644
--- a/arch/arm/mach-shmobile/sleep-sh7372.S
+++ b/arch/arm/mach-shmobile/sleep-sh7372.S
@@ -59,16 +59,16 @@ sh7372_do_idle_sysc:
 	mcr	p15, 0, r0, c1, c0, 0
 	isb
 
-	/* disable L2 cache in the aux control register */
-	mrc     p15, 0, r10, c1, c0, 1
-	bic     r10, r10, #2
-	mcr     p15, 0, r10, c1, c0, 1
-
 	/*
 	 * Invalidate data cache again.
 	 */
 	ldr	r1, kernel_flush
 	blx	r1
+
+	/* disable L2 cache in the aux control register */
+	mrc     p15, 0, r10, c1, c0, 1
+	bic     r10, r10, #2
+	mcr     p15, 0, r10, c1, c0, 1
 	/*
 	 * The kernel doesn't interwork: v7_flush_dcache_all in particluar will
 	 * always return in Thumb state when CONFIG_THUMB2_KERNEL is enabled.

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-13 14:32                         ` Guennadi Liakhovetski
@ 2012-12-13 14:39                           ` Santosh Shilimkar
  2012-12-28 11:32                             ` [PATCH v2] ARM: sh7372: fix cache clean / invalidate order Guennadi Liakhovetski
  2012-12-13 14:52                           ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Will Deacon
  1 sibling, 1 reply; 39+ messages in thread
From: Santosh Shilimkar @ 2012-12-13 14:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 13 December 2012 03:32 PM, Guennadi Liakhovetski wrote:
> On Thu, 13 Dec 2012, Will Deacon wrote:
>
>> On Thu, Dec 13, 2012 at 08:09:33AM +0000, Guennadi Liakhovetski wrote:
>>> On Wed, 12 Dec 2012, Will Deacon wrote:
>>>> Back to the case in hand.... Lorenzo just pointed out to me that the
>>>> finisher in question (sh7372_do_idle_sysc) calls v7_flush_dcache_all, so
>>>> the louis stuff should be irrelevant. The problem may actually be that the
>>>> finisher disables the L2 cache prior to cleaning/invalidating it, which is
>>>> the opposite order to that described by the A8 TRM.
>>>>
>>>> Guennadi -- can you try moving the kernel_flush call before the L2 disable
>>>> in sh7372_do_idle_sysc please?
>>>
>>> Yes, this works too.
>>
>> That's good to know. Please can you send a patch for that? The sequence
>> currently being used by the finisher *is* buggy, and should be fixed
>> independently of the louis stuff.
>
> Well, the fix is yours, so, it should be "From: you." I can certainly send
> it just copying your description above, but I'd also need your Sob.
> Something like the below (feel free to improve the subject line and the
> description):
>
> From: Will Deacon <will.deacon@arm.com>
> Subject: [PATCH] ARM: sh7372: fix cache clean / invalidate order
>
> According to the Cortex A8 TRM the L2 cache should be first cleaned and
> then disabled. Fix the swapped order on sh7372.
>
> Signed-off-by: <you>
> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
> (or even just)
> Tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
>
> diff --git a/arch/arm/mach-shmobile/sleep-sh7372.S b/arch/arm/mach-shmobile/sleep-sh7372.S
> index 1d56467..df15d8a 100644
> --- a/arch/arm/mach-shmobile/sleep-sh7372.S
> +++ b/arch/arm/mach-shmobile/sleep-sh7372.S
> @@ -59,16 +59,16 @@ sh7372_do_idle_sysc:
>   	mcr	p15, 0, r0, c1, c0, 0
>   	isb
>
> -	/* disable L2 cache in the aux control register */
> -	mrc     p15, 0, r10, c1, c0, 1
> -	bic     r10, r10, #2
> -	mcr     p15, 0, r10, c1, c0, 1
> -
>   	/*
>   	 * Invalidate data cache again.
>   	 */
kernel_flush does "Clean and Invalidate", so the comment above should say so.
>   	ldr	r1, kernel_flush
>   	blx	r1
> +
> +	/* disable L2 cache in the aux control register */
> +	mrc     p15, 0, r10, c1, c0, 1
> +	bic     r10, r10, #2
> +	mcr     p15, 0, r10, c1, c0, 1
An isb will make it safe.

Otherwise patch looks good to me.
Feel free to add my Reviewed-by tag if you need one.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

* [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations
  2012-12-13 14:32                         ` Guennadi Liakhovetski
  2012-12-13 14:39                           ` Santosh Shilimkar
@ 2012-12-13 14:52                           ` Will Deacon
  1 sibling, 0 replies; 39+ messages in thread
From: Will Deacon @ 2012-12-13 14:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 13, 2012 at 02:32:46PM +0000, Guennadi Liakhovetski wrote:
> On Thu, 13 Dec 2012, Will Deacon wrote:
> 
> > On Thu, Dec 13, 2012 at 08:09:33AM +0000, Guennadi Liakhovetski wrote:
> > > On Wed, 12 Dec 2012, Will Deacon wrote:
> > > > Back to the case in hand.... Lorenzo just pointed out to me that the
> > > > finisher in question (sh7372_do_idle_sysc) calls v7_flush_dcache_all, so
> > > > the louis stuff should be irrelevant. The problem may actually be that the
> > > > finisher disables the L2 cache prior to cleaning/invalidating it, which is
> > > > the opposite order to that described by the A8 TRM.
> > > > 
> > > > Guennadi -- can you try moving the kernel_flush call before the L2 disable
> > > > in sh7372_do_idle_sysc please?
> > > 
> > > Yes, this works too.
> > 
> > That's good to know. Please can you send a patch for that? The sequence
> > currently being used by the finisher *is* buggy, and should be fixed
> > independently of the louis stuff.
> 
> Well, the fix is yours, so, it should be "From: you." I can certainly send 
> it just copying your description above, but I'd also need your Sob. 
> Something like the below (feel free to improve the subject line and the 
> description):

No, I didn't send any code for this so you should be the author. I can
review/possibly ack it if you like (please send a v2 addressing Santosh's
comments).

Will

* [PATCH v2] ARM: sh7372: fix cache clean / invalidate order
  2012-12-13 14:39                           ` Santosh Shilimkar
@ 2012-12-28 11:32                             ` Guennadi Liakhovetski
  2012-12-28 21:50                               ` Simon Horman
  0 siblings, 1 reply; 39+ messages in thread
From: Guennadi Liakhovetski @ 2012-12-28 11:32 UTC (permalink / raw)
  To: linux-arm-kernel

According to the Cortex A8 TRM the L2 cache should be first cleaned and 
then disabled. Fix the swapped order on sh7372.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---

v2: addressed improvement suggestions by Santosh, thanks

diff --git a/arch/arm/mach-shmobile/sleep-sh7372.S b/arch/arm/mach-shmobile/sleep-sh7372.S
index 1d56467..a9df53b 100644
--- a/arch/arm/mach-shmobile/sleep-sh7372.S
+++ b/arch/arm/mach-shmobile/sleep-sh7372.S
@@ -59,17 +59,19 @@ sh7372_do_idle_sysc:
 	mcr	p15, 0, r0, c1, c0, 0
 	isb
 
+	/*
+	 * Clean and invalidate data cache again.
+	 */
+	ldr	r1, kernel_flush
+	blx	r1
+
 	/* disable L2 cache in the aux control register */
 	mrc     p15, 0, r10, c1, c0, 1
 	bic     r10, r10, #2
 	mcr     p15, 0, r10, c1, c0, 1
+	isb
 
 	/*
-	 * Invalidate data cache again.
-	 */
-	ldr	r1, kernel_flush
-	blx	r1
-	/*
 	 * The kernel doesn't interwork: v7_flush_dcache_all in particluar will
 	 * always return in Thumb state when CONFIG_THUMB2_KERNEL is enabled.
 	 * This sequence switches back to ARM.  Note that .align may insert a

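The ordering rule in the commit message (clean the caches first, then disable the L2) can be illustrated with a toy write-back cache model in C. This is a hypothetical sketch: struct toy_cache, the toy_* helpers and read_after_disable() are made up for illustration, and the model assumes a disabled cache is simply bypassed. It does not model the Cortex-A8 L2 controller or the exact failure seen on sh7372. It only shows the general hazard: if a cache is disabled while it still holds dirty lines, any access in the window before the clean sees stale memory.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of a single write-back cache line in front of one word of
 * memory. Hypothetical illustration only; not the Cortex-A8 L2 controller.
 */
struct toy_cache {
	bool enabled;
	bool valid;
	bool dirty;
	int data;
};

static int memory_word;

static void toy_write(struct toy_cache *c, int value)
{
	if (c->enabled) {
		/* Write-back policy: update the line only and mark it dirty. */
		c->valid = true;
		c->dirty = true;
		c->data = value;
	} else {
		memory_word = value;
	}
}

static int toy_read(const struct toy_cache *c)
{
	/* Assumption: a disabled cache is bypassed, reads hit memory. */
	if (c->enabled && c->valid)
		return c->data;
	return memory_word;
}

static void toy_clean_invalidate(struct toy_cache *c)
{
	/* Clean writes dirty data back; invalidate then drops the line. */
	if (c->valid && c->dirty)
		memory_word = c->data;
	c->valid = false;
	c->dirty = false;
}

static void toy_disable(struct toy_cache *c)
{
	c->enabled = false;
}

/*
 * Write a value, tear the cache down in one of two orders, and return
 * what a read sees immediately after the cache has been disabled.
 */
static int read_after_disable(bool clean_first)
{
	struct toy_cache c = { .enabled = true };
	int seen;

	memory_word = 0;
	toy_write(&c, 42);

	if (clean_first) {
		toy_clean_invalidate(&c);	/* patched order: clean, then disable */
		toy_disable(&c);
		seen = toy_read(&c);
	} else {
		toy_disable(&c);		/* old order: disable, then clean */
		seen = toy_read(&c);		/* window: dirty data still cached */
		toy_clean_invalidate(&c);
	}

	/* Either way the data reaches memory once the clean has run. */
	assert(memory_word == 42);
	return seen;
}
```

In this model read_after_disable(true) returns 42, while read_after_disable(false) returns the stale 0, even though memory holds 42 in both cases once the clean has completed.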
* [PATCH v2] ARM: sh7372: fix cache clean / invalidate order
  2012-12-28 11:32                             ` [PATCH v2] ARM: sh7372: fix cache clean / invalidate order Guennadi Liakhovetski
@ 2012-12-28 21:50                               ` Simon Horman
  0 siblings, 0 replies; 39+ messages in thread
From: Simon Horman @ 2012-12-28 21:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Dec 28, 2012 at 12:32:54PM +0100, Guennadi Liakhovetski wrote:
> According to the Cortex A8 TRM the L2 cache should be first cleaned and 
> then disabled. Fix the swapped order on sh7372.
> 
> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

Thanks, applied to the soc branch of the renesas tree.

Thread overview: 39+ messages
2012-09-18 16:35 [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
2012-09-18 16:35 ` [RFC PATCH v2 1/5] ARM: mm: implement LoUIS API for cache maintenance ops Lorenzo Pieralisi
2012-09-18 18:12   ` Nicolas Pitre
2012-09-19 12:30     ` Lorenzo Pieralisi
2012-09-18 16:35 ` [RFC PATCH v2 2/5] ARM: mm: rename jump labels in v7_flush_dcache_all function Lorenzo Pieralisi
2012-09-18 18:13   ` Nicolas Pitre
2012-09-19 13:51   ` Dave Martin
2012-09-20 10:32     ` Lorenzo Pieralisi
2012-09-20 11:01       ` Dave Martin
2012-09-18 16:35 ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Lorenzo Pieralisi
2012-09-18 18:18   ` Nicolas Pitre
2012-09-19 13:46   ` Dave Martin
2012-09-20 10:25     ` Lorenzo Pieralisi
2012-09-20 11:04       ` Dave Martin
2012-12-11 16:07         ` Guennadi Liakhovetski
2012-12-11 16:33           ` Will Deacon
2012-12-11 16:38             ` Will Deacon
2012-12-11 17:07               ` Guennadi Liakhovetski
2012-12-11 17:47                 ` Will Deacon
2012-12-11 17:55               ` Guennadi Liakhovetski
2012-12-11 23:27               ` Stephen Boyd
2012-12-12 10:31                 ` Will Deacon
2012-12-12 16:43                   ` Guennadi Liakhovetski
2012-12-12 10:33                 ` Lorenzo Pieralisi
2012-12-12 13:36                   ` Will Deacon
2012-12-13  8:09                     ` Guennadi Liakhovetski
2012-12-13 10:51                       ` Will Deacon
2012-12-13 14:32                         ` Guennadi Liakhovetski
2012-12-13 14:39                           ` Santosh Shilimkar
2012-12-28 11:32                             ` [PATCH v2] ARM: sh7372: fix cache clean / invalidate order Guennadi Liakhovetski
2012-12-28 21:50                               ` Simon Horman
2012-12-13 14:52                           ` [RFC PATCH v2 3/5] ARM: kernel: update cpu_suspend code to use cache LoUIS operations Will Deacon
2012-12-12 16:43                   ` Guennadi Liakhovetski
2012-09-18 16:35 ` [RFC PATCH v2 4/5] ARM: kernel: update __cpu_disable to use cache LoUIS maintenance API Lorenzo Pieralisi
2012-09-18 18:19   ` Nicolas Pitre
2012-09-18 16:35 ` [RFC PATCH v2 5/5] ARM: mm: update __v7_setup() to the new LoUIS cache " Lorenzo Pieralisi
2012-09-18 18:20   ` Nicolas Pitre
2012-09-20 11:27 ` [RFC PATCH v2 0/5] ARM: augment cache flushing API Lorenzo Pieralisi
2012-09-21  8:07   ` Shawn Guo