[PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms
@ 2014-05-20 15:35 Thomas Petazzoni
  2014-05-20 15:35 ` [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+ Thomas Petazzoni
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-20 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

Several of the latest Marvell Armada EBU platforms have a special
feature that is fairly uncommon on ARM platforms: hardware I/O
coherency. This feature allows to remove all cache maintenance
operations that are normally needed when doing DMA transfers (only an
I/O sync barrier is needed for DMA transfers from device).

This hardware I/O coherency mechanism needs a set of ARM core
requirements to operate properly:

 1 On Armada 370 (a single core processor)

   a The cache policy of pages must be set to "write allocate".

 2 On Armada XP (which has dual core and quad core variants)

   a The cache policy of pages must be set to "write allocate".
   b The SMP bit in the Auxiliary Functional Modes Control Register 0
     must be set (remember the CPU core is PJ4B)
   c The pages must be set as shareable.

 3 On Armada 375/38x (which have single core and dual core variants)

   a The cache policy of pages must be set to "write allocate".
   b The SMP and TLB broadcast bits must be set in the Auxiliary
     Control Register (the core is a Cortex-A9)
   c The pages must be set as shareable.
   d The SCU must be enabled

All of these requirements are met when the kernel is configured with
CONFIG_SMP *and* when the processor is actually a multiple core
processors (otherwise, due to the current CONFIG_SMP_ON_UP logic,
is_smp() returns false, and most of the requirements above are not
met).

For example, as of today, Armada 370 is broken because even if we
build the kernel CONFIG_SMP, the pages do not have the "write
allocate" attribute, because Armada 370 is single core, therefore the
CONFIG_SMP_ON_UP logic decides that we're not on an SMP system, and
therefore is_smp() returns false. Therefore, the fact that hardware
I/O coherency is enabled on Armada 370 today is incorrect, and there
is no way to fix that without making changes to the ARM core code.

Similarly, the Armada 380 (single core variant of the Armada 385) will
have the same problem: it's a single core processor, but that requires
several characteristics of an SMP configuration to properly use
hardware I/O coherency.

Also:

 - Not being able to use hardware I/O coherency on Armada 370 and
   Armada 380 is not really an option, as it is a major feature of
   those SoCs.

 - Having to enable CONFIG_SMP is also not very nice, as it comes with
   other performance penalties that could be avoided by using a
   !CONFIG_SMP configuration on those single core systems. And again,
   enabling CONFIG_SMP does *not* work for single core processors due
   to CONFIG_SMP_ON_UP.

Therefore, this RFC patch series proposes one solution to allow these
requirements to be met in the situations we are interested.

Even though the patch series is at RFC stage, I'm interested in
collecting ACKs even for certain patches only (like PATCH 1 and PATCH
2, which are probably less complicated than PATCH 3). This way, the
problem could be solved progressively, and the size of the patch
series reduced.

Basically, the patch series goes like this:

 * PATCH 1 makes write allocate the default cache policy for ARMv6+
   system. This was suggested by Catalin Marinas and Rob Herring. This
   patch provides the requirements 1.a, 2.a and 3.a detailed previously.

 * PATCH 2 allows the SCU to be enabled even in !CONFIG_SMP
   configurations. This patch provides the requirement 3.d detailed
   previously.

 * PATCH 3 makes CONFIG_SMP_ON_UP no longer dependent on CONFIG_SMP,
   which allows to enable SMP-related features (SMP bit, TLB broadcast
   bit, shareable pages) even when the platform uses a single core but
   advertises MP features in its MPIDR register. Provided
   CONFIG_SMP_ON_UP is enabled, this allows !CONFIG_SMP configurations
   to use hardware I/O coherency. This patch provides the requirements
   2.b, 2.c, 3.b and 3.c detailed previously.

 * PATCH 4 and 5 are simple additions to the mvebu coherency code.

Changes since v1:

 * Completely move away from the machine_desc flags approach, and
   instead try to find a global solution:
    - Make write allocate the default cache policy on all ARMv6+
      systems.
    - Allow CONFIG_SMP_ON_UP to be used on !CONFIG_SMP configurations,
      in order to enable MP related features in !CONFIG_SMP
      situations.

Thanks,

Thomas

Thomas Petazzoni (5):
  ARM: use write allocate by default on ARMv6+
  ARM: kernel: allow the SCU to be enabled even on !SMP
  ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations
  ARM: mvebu: enable coherency only when possible on Armada XP
  ARM: mvebu: add debugging message indicating the status of I/O
    coherency

 arch/arm/Kconfig                 |  2 +-
 arch/arm/include/asm/assembler.h |  2 +-
 arch/arm/include/asm/smp_plat.h  |  8 ++++----
 arch/arm/include/asm/smp_scu.h   |  2 +-
 arch/arm/kernel/smp_scu.c        |  2 --
 arch/arm/mach-mvebu/coherency.c  | 18 +++++++++++++++---
 arch/arm/mm/mmu.c                | 10 +++++-----
 arch/arm/mm/proc-v6.S            |  2 +-
 arch/arm/mm/proc-v7-2level.S     |  6 +++---
 arch/arm/mm/proc-v7-3level.S     |  6 +++---
 arch/arm/mm/proc-v7.S            |  4 ++--
 11 files changed, 36 insertions(+), 26 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-20 15:35 [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
@ 2014-05-20 15:35 ` Thomas Petazzoni
  2014-05-26 16:26   ` Russell King - ARM Linux
  2014-05-20 15:35 ` [PATCH RFCv2 2/5] ARM: kernel: allow the SCU to be enabled even on !SMP Thomas Petazzoni
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-20 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Currently, the default cache policy for ARMv6+ platform is writeback
in UP mode and write allocate in SMP mode.

Some platforms that provide hardware I/O coherency capabilities (such
as Marvell Armada 370/XP/375/38x) require the cache policy to be write
allocate even in UP mode. This is especially true on Armada 370, which
is a single core processor, so even with CONFIG_SMP=y, is_smp()
returns false and the cache policy remains defined to writeback.

As suggested by Catalin Marinas and Rob Herring, there is in fact no
good reason to keep using writeback by default on ARMv6+, and we could
switch to write allocate by default for all ARMv6+ platforms instead
of special-casing the Marvell platforms.

This patch implements this switch to use write allocate by default on
ARMv6+ platforms.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mm/mmu.c            | 10 +++++-----
 arch/arm/mm/proc-v6.S        |  2 +-
 arch/arm/mm/proc-v7-2level.S |  6 +++---
 arch/arm/mm/proc-v7-3level.S |  6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b68c6b2..5e7ab39 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -57,7 +57,7 @@ pmd_t *top_pmd;
 #define CPOLICY_WRITEBACK	3
 #define CPOLICY_WRITEALLOC	4
 
-static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
+static unsigned int cachepolicy __initdata = CPOLICY_WRITEALLOC;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
@@ -147,8 +147,8 @@ static int __init early_cachepolicy(char *p)
 	 * page tables.
 	 */
 	if (cpu_architecture() >= CPU_ARCH_ARMv6) {
-		printk(KERN_WARNING "Only cachepolicy=writeback supported on ARMv6 and later\n");
-		cachepolicy = CPOLICY_WRITEBACK;
+		printk(KERN_WARNING "Only cachepolicy={writeback,writealloc} supported on ARMv6 and later\n");
+		cachepolicy = CPOLICY_WRITEALLOC;
 	}
 	flush_cache_all();
 	set_cr(cr_alignment);
@@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
 		if (cachepolicy > CPOLICY_WRITETHROUGH)
 			cachepolicy = CPOLICY_WRITETHROUGH;
 #endif
+		if (cachepolicy > CPOLICY_WRITEBACK)
+			cachepolicy = CPOLICY_WRITEBACK;
 	}
 	if (cpu_arch < CPU_ARCH_ARMv5) {
 		if (cachepolicy >= CPOLICY_WRITEALLOC)
 			cachepolicy = CPOLICY_WRITEBACK;
 		ecc_mask = 0;
 	}
-	if (is_smp())
-		cachepolicy = CPOLICY_WRITEALLOC;
 
 	/*
 	 * Strip out features not present on earlier architectures.
diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
index 32b3558..950d4c2 100644
--- a/arch/arm/mm/proc-v6.S
+++ b/arch/arm/mm/proc-v6.S
@@ -31,7 +31,7 @@
 #define TTB_RGN_WB	(3 << 3)
 
 #define TTB_FLAGS_UP	TTB_RGN_WBWA
-#define PMD_FLAGS_UP	PMD_SECT_WB
+#define PMD_FLAGS_UP	PMD_SECT_WBWA
 #define TTB_FLAGS_SMP	TTB_RGN_WBWA|TTB_S
 #define PMD_FLAGS_SMP	PMD_SECT_WBWA|PMD_SECT_S
 
diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
index 1f52915..1032e00 100644
--- a/arch/arm/mm/proc-v7-2level.S
+++ b/arch/arm/mm/proc-v7-2level.S
@@ -19,9 +19,9 @@
 #define TTB_IRGN_WT	((1 << 0) | (0 << 6))
 #define TTB_IRGN_WB	((1 << 0) | (1 << 6))
 
-/* PTWs cacheable, inner WB not shareable, outer WB not shareable */
-#define TTB_FLAGS_UP	TTB_IRGN_WB|TTB_RGN_OC_WB
-#define PMD_FLAGS_UP	PMD_SECT_WB
+/* PTWs cacheable, inner WBWA not shareable, outer WBWA not shareable */
+#define TTB_FLAGS_UP	TTB_IRGN_WBWA|TTB_RGN_OC_WBWA
+#define PMD_FLAGS_UP	PMD_SECT_WBWA
 
 /* PTWs cacheable, inner WBWA shareable, outer WBWA not shareable */
 #define TTB_FLAGS_SMP	TTB_IRGN_WBWA|TTB_S|TTB_NOS|TTB_RGN_OC_WBWA
diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 01a719e..e3cb8f7 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -31,9 +31,9 @@
 #define TTB_S		(3 << 12)
 #define TTB_EAE		(1 << 31)
 
-/* PTWs cacheable, inner WB not shareable, outer WB not shareable */
-#define TTB_FLAGS_UP	(TTB_IRGN_WB|TTB_RGN_OC_WB)
-#define PMD_FLAGS_UP	(PMD_SECT_WB)
+/* PTWs cacheable, inner WBWA not shareable, outer WBWA not shareable */
+#define TTB_FLAGS_UP	(TTB_IRGN_WBWA|TTB_RGN_OC_WBWA)
+#define PMD_FLAGS_UP	(PMD_SECT_WBWA)
 
 /* PTWs cacheable, inner WBWA shareable, outer WBWA not shareable */
 #define TTB_FLAGS_SMP	(TTB_IRGN_WBWA|TTB_S|TTB_RGN_OC_WBWA)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 2/5] ARM: kernel: allow the SCU to be enabled even on !SMP
  2014-05-20 15:35 [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
  2014-05-20 15:35 ` [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+ Thomas Petazzoni
@ 2014-05-20 15:35 ` Thomas Petazzoni
  2014-05-20 15:35 ` [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations Thomas Petazzoni
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-20 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Certain non-SMP platforms rely on the SCU being enabled to provide
hardware I/O coherency between DMA masters and the CPU. Therefore, the
scu_enable() hook should not be reduced to a no-op when CONFIG_SMP is
disabled.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/include/asm/smp_scu.h | 2 +-
 arch/arm/kernel/smp_scu.c      | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/smp_scu.h b/arch/arm/include/asm/smp_scu.h
index 0393fba..e78f72d 100644
--- a/arch/arm/include/asm/smp_scu.h
+++ b/arch/arm/include/asm/smp_scu.h
@@ -37,7 +37,7 @@ static inline int scu_power_mode(void __iomem *scu_base, unsigned int mode)
 }
 #endif
 
-#if defined(CONFIG_SMP) && defined(CONFIG_HAVE_ARM_SCU)
+#if defined(CONFIG_HAVE_ARM_SCU)
 void scu_enable(void __iomem *scu_base);
 #else
 static inline void scu_enable(void __iomem *scu_base) {}
diff --git a/arch/arm/kernel/smp_scu.c b/arch/arm/kernel/smp_scu.c
index 1aafa0d..09d21b0 100644
--- a/arch/arm/kernel/smp_scu.c
+++ b/arch/arm/kernel/smp_scu.c
@@ -22,7 +22,6 @@
 #define SCU_INVALIDATE		0x0c
 #define SCU_FPGA_REVISION	0x10
 
-#ifdef CONFIG_SMP
 /*
  * Get the number of CPU cores from the SCU configuration
  */
@@ -62,7 +61,6 @@ void scu_enable(void __iomem *scu_base)
 	 */
 	flush_cache_all();
 }
-#endif
 
 /*
  * Set the executing CPUs power mode as defined.  This will be in
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations
  2014-05-20 15:35 [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
  2014-05-20 15:35 ` [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+ Thomas Petazzoni
  2014-05-20 15:35 ` [PATCH RFCv2 2/5] ARM: kernel: allow the SCU to be enabled even on !SMP Thomas Petazzoni
@ 2014-05-20 15:35 ` Thomas Petazzoni
  2014-05-26 16:35   ` Russell King - ARM Linux
  2014-05-20 15:35 ` [PATCH RFCv2 4/5] ARM: mvebu: enable coherency only when possible on Armada XP Thomas Petazzoni
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-20 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Currently CONFIG_SMP_ON_UP is used when CONFIG_SMP is enabled to allow
a SMP kernel to boot on UP platforms that do not support certain
aspects of SMP platforms.

However, some of the SMP characteristics are needed on platforms that
have SMP capabilities but are single core, and use their SMP
capabilities to provide hardware I/O coherency. This is for example
the case on Marvell Armada platforms.

Instead of special casing these platforms to enable the SMP and TLB
broadcast bit and use shareable pages, this patch proposes to simply
allow CONFIG_SMP_ON_UP to be enabled even on non-SMP
configurations. When CONFIG_SMP_ON_UP is enabled in a !CONFIG_SMP
configuration, it will check that the processor is SMP capable, and if
yes use all the SMP characteristics (SMP and TLB broadcast bit,
shareable pages, etc.). It will not support multiprocessing of course,
but it will have sufficient capabilities to allow hardware I/O
coherency to work.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
Of course, the name of CONFIG_SMP_ON_UP is maybe no longer
appropriate. Suggestions are welcome to change this: this patch is
really the minimal set of changes to get things to work.

We could also decide to completely get rid of CONFIG_SMP_ON_UP and
make it always on. Suggestions welcome.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/Kconfig                 | 2 +-
 arch/arm/include/asm/assembler.h | 2 +-
 arch/arm/include/asm/smp_plat.h  | 8 ++++----
 arch/arm/mm/proc-v7.S            | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ab438cb..d3da54b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1501,7 +1501,7 @@ config SMP
 
 config SMP_ON_UP
 	bool "Allow booting SMP kernel on uniprocessor systems (EXPERIMENTAL)"
-	depends on SMP && !XIP_KERNEL && MMU
+	depends on !XIP_KERNEL && MMU
 	default y
 	help
 	  SMP kernels contain instructions which fail on non-SMP processors.
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index b974184..6e9fb68 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -223,7 +223,7 @@
 	.long	9999b,9001f;			\
 	.popsection
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_SMP_ON_UP)
 #define ALT_SMP(instr...)					\
 9998:	instr
 /*
diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
index a252c0b..496a71d 100644
--- a/arch/arm/include/asm/smp_plat.h
+++ b/arch/arm/include/asm/smp_plat.h
@@ -15,13 +15,13 @@
  */
 static inline bool is_smp(void)
 {
-#ifndef CONFIG_SMP
-	return false;
-#elif defined(CONFIG_SMP_ON_UP)
+#if defined(CONFIG_SMP_ON_UP)
 	extern unsigned int smp_on_up;
 	return !!smp_on_up;
-#else
+#elif defined(CONFIG_SMP)
 	return true;
+#else
+	return false;
 #endif
 }
 
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 195731d..c7777a5 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -196,7 +196,7 @@ __v7_ca12mp_setup:
 __v7_ca15mp_setup:
 	mov	r10, #0
 1:
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_SMP_ON_UP)
 	ALT_SMP(mrc	p15, 0, r0, c1, c0, 1)
 	ALT_UP(mov	r0, #(1 << 6))		@ fake it for UP
 	tst	r0, #(1 << 6)			@ SMP/nAMP mode enabled?
@@ -248,7 +248,7 @@ __v7_pj4b_setup:
 
 	/* Auxiliary Functional Modes Control Register 0 */
 	mrc	p15, 1,	r0, c15, c2, 0
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_SMP_ON_UP)
 	orr	r0, r0, #PJ4B_SMP_CFB
 #endif
 	orr	r0, r0, #PJ4B_L1_PAR_CHK
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 4/5] ARM: mvebu: enable coherency only when possible on Armada XP
  2014-05-20 15:35 [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (2 preceding siblings ...)
  2014-05-20 15:35 ` [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations Thomas Petazzoni
@ 2014-05-20 15:35 ` Thomas Petazzoni
  2014-05-20 15:35 ` [PATCH RFCv2 5/5] ARM: mvebu: add debugging message indicating the status of I/O coherency Thomas Petazzoni
  2014-05-26 13:32 ` [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
  5 siblings, 0 replies; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-20 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Armada XP, the hardware I/O coherency can only be enabled if the
SMP capabilities are enabled (shareable pages, SMP bit, etc.). This
commit ensures that we don't enable hardware I/O coherency if such
requirements are not met.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/coherency.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index d5a975b..2a204d0 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -322,9 +322,16 @@ static int coherency_type(void)
 	if (np) {
 		int type = (int) match->data;
 
-		/* Armada 370/XP coherency works in both UP and SMP */
-		if (type == COHERENCY_FABRIC_TYPE_ARMADA_370_XP)
-			return type;
+		/*
+		 * Armada 370 coherency has not special requirements,
+		 * Armada XP coherency requires SMP capabilities.
+		 */
+		if (type == COHERENCY_FABRIC_TYPE_ARMADA_370_XP) {
+			if (of_machine_is_compatible("marvell,armadaxp") && is_smp())
+				return type;
+			else if (of_machine_is_compatible("marvell,armada370"))
+				return type;
+		}
 
 		/* Armada 375 coherency works only on SMP */
 		else if (type == COHERENCY_FABRIC_TYPE_ARMADA_375 && is_smp())
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 5/5] ARM: mvebu: add debugging message indicating the status of I/O coherency
  2014-05-20 15:35 [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (3 preceding siblings ...)
  2014-05-20 15:35 ` [PATCH RFCv2 4/5] ARM: mvebu: enable coherency only when possible on Armada XP Thomas Petazzoni
@ 2014-05-20 15:35 ` Thomas Petazzoni
  2014-05-26 13:32 ` [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
  5 siblings, 0 replies; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-20 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Since the requirements for I/O coherency to work may not be trivial,
it's useful to have a debugging message that indicates whether the
hardware I/O coherency is enabled or not.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/coherency.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 2a204d0..6fad623 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -363,6 +363,11 @@ int __init coherency_init(void)
 		 type == COHERENCY_FABRIC_TYPE_ARMADA_380)
 		armada_375_380_coherency_init(np);
 
+	if (type != COHERENCY_FABRIC_TYPE_NONE)
+		pr_info("hardware I/O coherency enabled\n");
+	else
+		pr_info("hardware I/O coherency disabled\n");
+
 	return 0;
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-20 15:35 [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (4 preceding siblings ...)
  2014-05-20 15:35 ` [PATCH RFCv2 5/5] ARM: mvebu: add debugging message indicating the status of I/O coherency Thomas Petazzoni
@ 2014-05-26 13:32 ` Thomas Petazzoni
  5 siblings, 0 replies; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-26 13:32 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin, Will, Russell,

Do you have some comments about this patch series?

It would be interesting to first get your opinion on PATCH 1/5 so that
we can hopefully get this sorted out for 3.16.

For the rest of the series, especially PATCH 3/5, your feedback would
be appreciated, so that I can see if we're going in the right direction
here.

Thanks,

Thomas

On Tue, 20 May 2014 17:35:00 +0200, Thomas Petazzoni wrote:
> Hello,
> 
> Several of the latest Marvell Armada EBU platforms have a special
> feature that is fairly uncommon on ARM platforms: hardware I/O
> coherency. This feature allows to remove all cache maintenance
> operations that are normally needed when doing DMA transfers (only an
> I/O sync barrier is needed for DMA transfers from device).
> 
> This hardware I/O coherency mechanism needs a set of ARM core
> requirements to operate properly:
> 
>  1 On Armada 370 (a single core processor)
> 
>    a The cache policy of pages must be set to "write allocate".
> 
>  2 On Armada XP (which has dual core and quad core variants)
> 
>    a The cache policy of pages must be set to "write allocate".
>    b The SMP bit in the Auxiliary Functional Modes Control Register 0
>      must be set (remember the CPU core is PJ4B)
>    c The pages must be set as shareable.
> 
>  3 On Armada 375/38x (which have single core and dual core variants)
> 
>    a The cache policy of pages must be set to "write allocate".
>    b The SMP and TLB broadcast bits must be set in the Auxiliary
>      Control Register (the core is a Cortex-A9)
>    c The pages must be set as shareable.
>    d The SCU must be enabled
> 
> All of these requirements are met when the kernel is configured with
> CONFIG_SMP *and* when the processor is actually a multiple core
> processors (otherwise, due to the current CONFIG_SMP_ON_UP logic,
> is_smp() returns false, and most of the requirements above are not
> met).
> 
> For example, as of today, Armada 370 is broken because even if we
> build the kernel CONFIG_SMP, the pages do not have the "write
> allocate" attribute, because Armada 370 is single core, therefore the
> CONFIG_SMP_ON_UP logic decides that we're not on an SMP system, and
> therefore is_smp() returns false. Therefore, the fact that hardware
> I/O coherency is enabled on Armada 370 today is incorrect, and there
> is no way to fix that without making changes to the ARM core code.
> 
> Similarly, the Armada 380 (single core variant of the Armada 385) will
> have the same problem: it's a single core processor, but that requires
> several characteristics of an SMP configuration to properly use
> hardware I/O coherency.
> 
> Also:
> 
>  - Not being able to use hardware I/O coherency on Armada 370 and
>    Armada 380 is not really an option, as it is a major feature of
>    those SoCs.
> 
>  - Having to enable CONFIG_SMP is also not very nice, as it comes with
>    other performance penalties that could be avoided by using a
>    !CONFIG_SMP configuration on those single core systems. And again,
>    enabling CONFIG_SMP does *not* work for single core processors due
>    to CONFIG_SMP_ON_UP.
> 
> Therefore, this RFC patch series proposes one solution to allow these
> requirements to be met in the situations we are interested.
> 
> Even though the patch series is at RFC stage, I'm interested in
> collecting ACKs even for certain patches only (like PATCH 1 and PATCH
> 2, which are probably less complicated than PATCH 3). This way, the
> problem could be solved progressively, and the size of the patch
> series reduced.
> 
> Basically, the patch series goes like this:
> 
>  * PATCH 1 makes write allocate the default cache policy for ARMv6+
>    system. This was suggested by Catalin Marinas and Rob Herring. This
>    patch provides the requirements 1.a, 2.a and 3.a detailed previously.
> 
>  * PATCH 2 allows the SCU to be enabled even in !CONFIG_SMP
>    configurations. This patch provides the requirement 3.d detailed
>    previously.
> 
>  * PATCH 3 makes CONFIG_SMP_ON_UP no longer dependent on CONFIG_SMP,
>    which allows to enable SMP-related features (SMP bit, TLB broadcast
>    bit, shareable pages) even when the platform uses a single core but
>    advertises MP features in its MPIDR register. Provided
>    CONFIG_SMP_ON_UP is enabled, this allows !CONFIG_SMP configurations
>    to use hardware I/O coherency. This patch provides the requirements
>    2.b, 2.c, 3.b and 3.c detailed previously.
> 
>  * PATCH 4 and 5 are simple additions to the mvebu coherency code.
> 
> Changes since v1:
> 
>  * Completely move away from the machine_desc flags approach, and
>    instead try to find a global solution:
>     - Make write allocate the default cache policy on all ARMv6+
>       systems.
>     - Allow CONFIG_SMP_ON_UP to be used on !CONFIG_SMP configurations,
>       in order to enable MP related features in !CONFIG_SMP
>       situations.
> 
> Thanks,
> 
> Thomas
> 
> Thomas Petazzoni (5):
>   ARM: use write allocate by default on ARMv6+
>   ARM: kernel: allow the SCU to be enabled even on !SMP
>   ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations
>   ARM: mvebu: enable coherency only when possible on Armada XP
>   ARM: mvebu: add debugging message indicating the status of I/O
>     coherency
> 
>  arch/arm/Kconfig                 |  2 +-
>  arch/arm/include/asm/assembler.h |  2 +-
>  arch/arm/include/asm/smp_plat.h  |  8 ++++----
>  arch/arm/include/asm/smp_scu.h   |  2 +-
>  arch/arm/kernel/smp_scu.c        |  2 --
>  arch/arm/mach-mvebu/coherency.c  | 18 +++++++++++++++---
>  arch/arm/mm/mmu.c                | 10 +++++-----
>  arch/arm/mm/proc-v6.S            |  2 +-
>  arch/arm/mm/proc-v7-2level.S     |  6 +++---
>  arch/arm/mm/proc-v7-3level.S     |  6 +++---
>  arch/arm/mm/proc-v7.S            |  4 ++--
>  11 files changed, 36 insertions(+), 26 deletions(-)
> 



-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-20 15:35 ` [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+ Thomas Petazzoni
@ 2014-05-26 16:26   ` Russell King - ARM Linux
  2014-05-27  9:15     ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-05-26 16:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 20, 2014 at 05:35:01PM +0200, Thomas Petazzoni wrote:
> Currently, the default cache policy for ARMv6+ platform is writeback
> in UP mode and write allocate in SMP mode
> 
> Some platforms that provide hardware I/O coherency capabilities (such
> as Marvell Armada 370/XP/375/38x) require the cache policy to be write
> allocate even in UP mode. This is especially true on Armada 370, which
> is a single core processor, so even with CONFIG_SMP=y, is_smp()
> returns false and the cache policy remains defined to writeback.
> 
> As suggested by Catalin Marinas and Rob Herring, there is in fact no
> good reason to keep using writeback by default on ARMv6+, and we could
> switch to write allocate by default for all ARMv6+ platforms instead
> of special-casing the Marvell platforms.

Firstly, whether write allocate is beneficial or detrimental depends on
the workload being executed, and I'm not about to change this based on
such loose reasoning.

Secondly, we're losing information.  SMP uses write-allocate not due to
some kind of default, not because of some policy reason, but because SMP
is *required* architecturally to use write-allocate and this must never
allowed to be overridden.  You break this by allowing SMP to specify
writeback read-allocate.

Thirdly, you remove the possiblity of using writeback write-allocate on
ARMv5.  There are ARMv5 CPUs which have caches capable of that.

So, I think this is rather broken.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations
  2014-05-20 15:35 ` [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations Thomas Petazzoni
@ 2014-05-26 16:35   ` Russell King - ARM Linux
  2014-05-27 11:59     ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-05-26 16:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 20, 2014 at 05:35:03PM +0200, Thomas Petazzoni wrote:
> Currently CONFIG_SMP_ON_UP is used when CONFIG_SMP is enabled to allow
> a SMP kernel to boot on UP platforms that do not support certain
> aspects of SMP platforms.
> 
> However, some of the SMP characteristics are needed on platforms that
> have SMP capabilities but are single core, and use their SMP
> capabilities to provide hardware I/O coherency. This is for example
> the case on Marvell Armada platforms.
> 
> Instead of special casing these platforms to enable the SMP and TLB
> broadcast bit and use shareable pages, this patch proposes to simply
> allow CONFIG_SMP_ON_UP to be enabled even on non-SMP
> configurations. When CONFIG_SMP_ON_UP is enabled in a !CONFIG_SMP
> configuration, it will check that the processor is SMP capable, and if
> yes use all the SMP characteristics (SMP and TLB broadcast bit,
> shareable pages, etc.). It will not support multiprocessing of course,
> but it will have sufficient capabilities to allow hardware I/O
> coherency to work.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ---
> Of course, the name of CONFIG_SMP_ON_UP is maybe no longer
> appropriate. Suggestions are welcome to change this: this patch is
> really the minimal set of changes to get things to work.
> 
> We could also decide to completely get rid of CONFIG_SMP_ON_UP and
> make it always on. Suggestions welcome.

This feels very wrong - while you may need a SCU for coherency, which
implies that you have a SMP capable CPU, this feels very much like
overloading a currently well-defined facility with a well-defined
purpose (which gets used for a lot of things) with a new purpose, which
is going to be painful in the future if we ever need to separate the
two things.  So no, I really don't like this.

If we need to change things here, make the change properly even if it
means that it takes more time, and don't do quick sticky-plasters like
this - the quick sticky-plaster method ultimately makes stuff less
maintainable.

Secondly, doesn't this make your first patch redundant - if you need the
SMP instructions, then is_smp() returns true, which then forces WBWA
cache mode.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-26 16:26   ` Russell King - ARM Linux
@ 2014-05-27  9:15     ` Thomas Petazzoni
  2014-05-27  9:47       ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-27  9:15 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

Thanks for your review!

On Mon, 26 May 2014 17:26:43 +0100, Russell King - ARM Linux wrote:
> On Tue, May 20, 2014 at 05:35:01PM +0200, Thomas Petazzoni wrote:
> > Currently, the default cache policy for ARMv6+ platform is writeback
> > in UP mode and write allocate in SMP mode
> > 
> > Some platforms that provide hardware I/O coherency capabilities
> > (such as Marvell Armada 370/XP/375/38x) require the cache policy to
> > be write allocate even in UP mode. This is especially true on
> > Armada 370, which is a single core processor, so even with
> > CONFIG_SMP=y, is_smp() returns false and the cache policy remains
> > defined to writeback.
> > 
> > As suggested by Catalin Marinas and Rob Herring, there is in fact no
> > good reason to keep using writeback by default on ARMv6+, and we
> > could switch to write allocate by default for all ARMv6+ platforms
> > instead of special-casing the Marvell platforms.
> 
> Firstly, whether write allocate is beneficial or detrimental depends
> on the workload being executed, and I'm not about to change this
> based on such loose reasoning.

Well, in my case, what's driving the usage of write allocate instead of
write back is not a performance concern. Supporting hardware I/O
coherency on Marvell Armada platforms *requires* having the  write
allocate cache policy. So in my "PATCH RFCv1" I added flags to
machine_desc so that individual machines can tell whether they
absolutely need write allocate or not.

The feedback from Catalin and Rob was that maybe we could simply make
write allocate the default for ARMv6+, which would avoid the need to
make a special case for Marvell Armada platforms.

Maybe Catalin and Rob can detail why they think switching to write
allocate as the default for ARMv6+ makes sense?

> Secondly, we're losing information.  SMP uses write-allocate not due
> to some kind of default, not because of some policy reason, but
> because SMP is *required* architecturally to use write-allocate and
> this must never allowed to be overridden.

Absolutely. Just like it is *required* for hardware I/O coherency on
Marvell Armada platforms, even those that are non-SMP like Armada 370
or Armada 380.

> You break this by allowing SMP to specify writeback read-allocate.

I am certainly not very familiar with the code in mm/mmu.c, but I'm not
sure to see why I allow SMP to specify writeback read-allocate. What
the code does is:

 * Make CPOLICY_WRITEALLOC the default value for "cachepolicy"

 * In early_cachepolicy(), if a specific cachepolicy= argument is
   passed to the kernel, we use it, except for >= ARMv6, where
   CPOLICY_WRITEALLOC is forced.

 * In build_mem_type_table(), we override the cache policy to buffered,
   writethrough or writeback only in < ARMv6 and < ARMv5 situations.

I've grepped again for all places where "cachepolicy" is modified in
mm/mmu.c, and I fail to see where cachepolicy could be set back to
writeback read-allocate in ARMv6+ platforms.

> Thirdly, you remove the possiblity of using writeback write-allocate
> on ARMv5.  There are ARMv5 CPUs which have caches capable of that.

Indeed, with my patch, CPOLICY_WRITEALLOC is no longer allowed on
ARMv5. That's because I wanted to preserve the default of being
CPOLICY_WRITEBACK on ARMv5.

So maybe, I should keep the cachepolicy default value to
CPOLICY_WRITEBACK, and then add a:

	if (cpu_arch >= CPU_ARCH_ARMv6)
		cachepolicy = CPOLICY_WRITEALLOC;

in build_mem_type_table().

Would this work for you?

Thanks for your feedback,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-27  9:15     ` Thomas Petazzoni
@ 2014-05-27  9:47       ` Russell King - ARM Linux
  2014-05-27 11:52         ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-05-27  9:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 27, 2014 at 11:15:42AM +0200, Thomas Petazzoni wrote:
> Russell,
> 
> Thanks for your review!
> 
> On Mon, 26 May 2014 17:26:43 +0100, Russell King - ARM Linux wrote:
> > On Tue, May 20, 2014 at 05:35:01PM +0200, Thomas Petazzoni wrote:
> > > Currently, the default cache policy for ARMv6+ platform is writeback
> > > in UP mode and write allocate in SMP mode
> > > 
> > > Some platforms that provide hardware I/O coherency capabilities
> > > (such as Marvell Armada 370/XP/375/38x) require the cache policy to
> > > be write allocate even in UP mode. This is especially true on
> > > Armada 370, which is a single core processor, so even with
> > > CONFIG_SMP=y, is_smp() returns false and the cache policy remains
> > > defined to writeback.
> > > 
> > > As suggested by Catalin Marinas and Rob Herring, there is in fact no
> > > good reason to keep using writeback by default on ARMv6+, and we
> > > could switch to write allocate by default for all ARMv6+ platforms
> > > instead of special-casing the Marvell platforms.
> > 
> > Firstly, whether write allocate is beneficial or detrimental depends
> > on the workload being executed, and I'm not about to change this
> > based on such loose reasoning.
> 
> Well, in my case, what's driving the usage of write allocate instead of
> write back is not a performance concern. Supporting hardware I/O
> coherency on Marvell Armada platforms *requires* having the  write
> allocate cache policy. So in my "PATCH RFCv1" I added flags to
> machine_desc so that individual machines can tell whether they
> absolutely need write allocate or not.
> 
> The feedback from Catalin and Rob was that maybe we could simply make
> write allocate the default for ARMv6+, which would avoid the need to
> make a special case for Marvell Armada platforms.
> 
> Maybe Catalin and Rob can detail why they think switching to write
> allocate as the default for ARMv6+ makes sense?
> 
> > Secondly, we're losing information.  SMP uses write-allocate not due
> > to some kind of default, not because of some policy reason, but
> > because SMP is *required* architecturally to use write-allocate and
> > this must never allowed to be overridden.
> 
> Absolutely. Just like it is *required* for hardware I/O coherency on
> Marvell Armada platforms, even those that are non-SMP like Armada 370
> or Armada 380.

What this boils down to is this statement:

	If we require hardware coherency between any other master and
	a CPU, we require the caches to be in write allocate mode.

whether "any other master" is another CPU or a device.  So, SMP is just
a specific case of the general case, and we should treat the selection
of this as per the generic case.

> > You break this by allowing SMP to specify writeback read-allocate.
> 
> I am certainly not very familiar with the code in mm/mmu.c, but I'm not
> sure to see why I allow SMP to specify writeback read-allocate. What
> the code does is:

@@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
                if (cachepolicy > CPOLICY_WRITETHROUGH)
                        cachepolicy = CPOLICY_WRITETHROUGH;
 #endif
+               if (cachepolicy > CPOLICY_WRITEBACK)
+                       cachepolicy = CPOLICY_WRITEBACK;
        }

and the rest of this hunk.

>  * In early_cachepolicy(), if a specific cachepolicy= argument is
>    passed to the kernel, we use it, except for >= ARMv6, where
>    CPOLICY_WRITEALLOC is forced.

What if you want to run ARMv6 UP (non-coherent) in read-allocate mode
(which is the way it is today?)

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-27  9:47       ` Russell King - ARM Linux
@ 2014-05-27 11:52         ` Thomas Petazzoni
  2014-05-27 12:18           ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-27 11:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Russell,

On Tue, 27 May 2014 10:47:49 +0100, Russell King - ARM Linux wrote:

> > Absolutely. Just like it is *required* for hardware I/O coherency on
> > Marvell Armada platforms, even those that are non-SMP like Armada
> > 370 or Armada 380.
> 
> What this boils down to is this statement:
> 
> 	If we require hardware coherency between any other master and
> 	a CPU, we require the caches to be in write allocate mode.
> 
> whether "any other master" is another CPU or a device.  So, SMP is
> just a specific case of the general case, and we should treat the
> selection of this as per the generic case.

Right. However, remember that I have to support hardware I/O coherency
on processors for which is_smp() returns false, even when CONFIG_SMP=y
and CONFIG_SMP_ON_UP=y. The Armada 370 is in this case.

> > > You break this by allowing SMP to specify writeback read-allocate.
> > 
> > I am certainly not very familiar with the code in mm/mmu.c, but I'm
> > not sure to see why I allow SMP to specify writeback read-allocate.
> > What the code does is:
> 
> @@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
>                 if (cachepolicy > CPOLICY_WRITETHROUGH)
>                         cachepolicy = CPOLICY_WRITETHROUGH;
>  #endif
> +               if (cachepolicy > CPOLICY_WRITEBACK)
> +                       cachepolicy = CPOLICY_WRITEBACK;
>         }
> 
> and the rest of this hunk.

If specific hunk is inside a:

	if (cpu_arch < CPU_ARCH_ARMv6) {

and as far as I'm aware, there are no ARMv5 or earlier platforms that
are SMP-capable. Are there any?

> >  * In early_cachepolicy(), if a specific cachepolicy= argument is
> >    passed to the kernel, we use it, except for >= ARMv6, where
> >    CPOLICY_WRITEALLOC is forced.
> 
> What if you want to run ARMv6 UP (non-coherent) in read-allocate mode
> (which is the way it is today?)

That's indeed no longer possible with my patch, but we could allow
that with a default on write-allocate, but a possibility of using
cachepolicy=writeback to get back the previous default cache policy.
Would that work for you?

However, how to verify that cachepolicy=writeback doesn't contradict
the need for write-allocate policy for SMP or hardware I/O coherency?

Again, switching to use write-allocate by default was Catalin and Rob
suggestion. My original suggestion was to make it a per-platform
choice, through flags in machine_desc (see my PATCH RFCv1 at
http://thread.gmane.org/gmane.linux.ports.arm.kernel/325200/).

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations
  2014-05-26 16:35   ` Russell King - ARM Linux
@ 2014-05-27 11:59     ` Thomas Petazzoni
  2014-05-27 12:22       ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-27 11:59 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Mon, 26 May 2014 17:35:56 +0100, Russell King - ARM Linux wrote:

> > Instead of special casing these platforms to enable the SMP and TLB
> > broadcast bit and use shareable pages, this patch proposes to simply
> > allow CONFIG_SMP_ON_UP to be enabled even on non-SMP
> > configurations. When CONFIG_SMP_ON_UP is enabled in a !CONFIG_SMP
> > configuration, it will check that the processor is SMP capable, and
> > if yes use all the SMP characteristics (SMP and TLB broadcast bit,
> > shareable pages, etc.). It will not support multiprocessing of
> > course, but it will have sufficient capabilities to allow hardware
> > I/O coherency to work.
> > 
> > Signed-off-by: Thomas Petazzoni
> > <thomas.petazzoni@free-electrons.com> ---
> > Of course, the name of CONFIG_SMP_ON_UP is maybe no longer
> > appropriate. Suggestions are welcome to change this: this patch is
> > really the minimal set of changes to get things to work.
> > 
> > We could also decide to completely get rid of CONFIG_SMP_ON_UP and
> > make it always on. Suggestions welcome.
> 
> This feels very wrong - while you may need a SCU for coherency, which
> implies that you have a SMP capable CPU,

Actually, this patch is not so much about the SCU, but about having the
SMP bit set in ACTLR and the page table entries having the shareable
attribute. This is needed for Armada XP, Armada 375 and Armada 38x to
provide hardware I/O coherency even when CONFIG_SMP is disabled.

> this feels very much like
> overloading a currently well-defined facility with a well-defined
> purpose (which gets used for a lot of things) with a new purpose,
> which is going to be painful in the future if we ever need to
> separate the two things.  So no, I really don't like this.
> 
> If we need to change things here, make the change properly even if it
> means that it takes more time, and don't do quick sticky-plasters like
> this - the quick sticky-plaster method ultimately makes stuff less
> maintainable.

Sure, but you could give some hints or starting points as to what you
consider the proper way of making such a change?

> Secondly, doesn't this make your first patch redundant - if you need
> the SMP instructions, then is_smp() returns true, which then forces
> WBWA cache mode.

No, because the Armada 370 (single core processor) needs the
write-allocate cache policy to provide hardware I/O coherency, but its
MPIDR does not indicate it's a SMP-capable processor. So even with
CONFIG_SMP_ON_UP=y, is_smp() returns false on Armada 370, and the kernel
continues to use the WB cache mode.

Again, I'm all open to your suggestions on how to make the requirements
of hardware I/O coherency available on !CONFIG_SMP configurations. I've
highlighted very clearly in the cover letter of this patch series what
are these requirements.

Thanks for your feedback!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-27 11:52         ` Thomas Petazzoni
@ 2014-05-27 12:18           ` Russell King - ARM Linux
  2014-05-27 12:37             ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-05-27 12:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 27, 2014 at 01:52:16PM +0200, Thomas Petazzoni wrote:
> Hello Russell,
> 
> On Tue, 27 May 2014 10:47:49 +0100, Russell King - ARM Linux wrote:
> 
> > > Absolutely. Just like it is *required* for hardware I/O coherency on
> > > Marvell Armada platforms, even those that are non-SMP like Armada
> > > 370 or Armada 380.
> > 
> > What this boils down to is this statement:
> > 
> > 	If we require hardware coherency between any other master and
> > 	a CPU, we require the caches to be in write allocate mode.
> > 
> > whether "any other master" is another CPU or a device.  So, SMP is
> > just a specific case of the general case, and we should treat the
> > selection of this as per the generic case.
> 
> Right. However, remember that I have to support hardware I/O coherency
> on processors for which is_smp() returns false, even when CONFIG_SMP=y
> and CONFIG_SMP_ON_UP=y. The Armada 370 is in this case.

If is_smp() returns false, that is because CONFIG_SMP was not defined.
In patch 3, you turn this into something which follows smp_on_up when
SMP_ON_UP is enabled.  smp_on_up is true when we have the (currently)
SMP case instructions in place, rather than the UP case.

Since you need the SMP case instructions for hardware coherency, the
effect of patch 3 is that smp_on_up will be true for hardware coherency
as well.  Hence, is_smp() *will* return true for hardware coherency.

So your above statement is wrong with patch 3 applied, and this nicely
illustrates why this "simple" solution is a problem - we end up with a
load of things which _appear_ to be only for SMP which *unexpectedly*
end up being enabled for hardware coherency.

> > > > You break this by allowing SMP to specify writeback read-allocate.
> > > 
> > > I am certainly not very familiar with the code in mm/mmu.c, but I'm
> > > not sure to see why I allow SMP to specify writeback read-allocate.
> > > What the code does is:
> > 
> > @@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
> >                 if (cachepolicy > CPOLICY_WRITETHROUGH)
> >                         cachepolicy = CPOLICY_WRITETHROUGH;
> >  #endif
> > +               if (cachepolicy > CPOLICY_WRITEBACK)
> > +                       cachepolicy = CPOLICY_WRITEBACK;
> >         }
> > 
> > and the rest of this hunk.
> 
> If specific hunk is inside a:
> 
> 	if (cpu_arch < CPU_ARCH_ARMv6) {
> 
> and as far as I'm aware, there are no ARMv5 or earlier platforms that
> are SMP-capable. Are there any?

First, I said in one of my previous messages, there *are* ARMv5 which
*do* support write-allocate, so this line *imposes* a *new* restriction
that we no longer support write-allocate on ARMv5.  So the above quote
is a problem there.

Second, I said "and the rest of this hunk" which is where you remove the
is_smp() test.

> > >  * In early_cachepolicy(), if a specific cachepolicy= argument is
> > >    passed to the kernel, we use it, except for >= ARMv6, where
> > >    CPOLICY_WRITEALLOC is forced.
> > 
> > What if you want to run ARMv6 UP (non-coherent) in read-allocate mode
> > (which is the way it is today?)
> 
> That's indeed no longer possible with my patch, but we could allow
> that with a default on write-allocate, but a possibility of using
> cachepolicy=writeback to get back the previous default cache policy.
> Would that work for you?

There's the problem that the initial page table setup (done by the
assembly code) must match the page table setup done later in C as far
as these attributes go, so it's not that easy.

I think if we make the is_smp() thing be a is_hw_coherency() thing, and
turn the SMP_ON_UP thing to be ARM_HW_COHERENCY (or make ARM_HW_COHERENCY
a subset of it) then we can address this case in a more maintainable way.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations
  2014-05-27 11:59     ` Thomas Petazzoni
@ 2014-05-27 12:22       ` Russell King - ARM Linux
  2014-05-27 12:47         ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-05-27 12:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 27, 2014 at 01:59:18PM +0200, Thomas Petazzoni wrote:
> No, because the Armada 370 (single core processor) needs the
> write-allocate cache policy to provide hardware I/O coherency, but its
> MPIDR does not indicate it's a SMP-capable processor. So even with
> CONFIG_SMP_ON_UP=y, is_smp() returns false on Armada 370, and the kernel
> continues to use the WB cache mode.

So what's the point of enabling SMP_ON_UP?  If is_smp() returns false
with SMP=y and SMP_ON_UP=y on Armada 370, then you get none of the
SMP instructions either.  So I now start to question what the point of
_this_ patch is, which enables SMP_ON_UP on !SMP capable kernels.

A SMP kernel will select the non-SMP instructions for a UP CPU, and
is_smp() will return false.  A UP kernel will compile in the non-SMP
instructions and is_smp() will be false.  A UP kernel with SMP_ON_UP=y
will select the non-SMP instructions for your CPU and is_smp() will
return false.  So what's the point?

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-27 12:18           ` Russell King - ARM Linux
@ 2014-05-27 12:37             ` Thomas Petazzoni
  2014-05-28 11:24               ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-27 12:37 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Tue, 27 May 2014 13:18:06 +0100, Russell King - ARM Linux wrote:

> > > whether "any other master" is another CPU or a device.  So, SMP is
> > > just a specific case of the general case, and we should treat the
> > > selection of this as per the generic case.
> > 
> > Right. However, remember that I have to support hardware I/O coherency
> > on processors for which is_smp() returns false, even when CONFIG_SMP=y
> > and CONFIG_SMP_ON_UP=y. The Armada 370 is in this case.
> 
> If is_smp() returns false, that is because CONFIG_SMP was not defined.

No: on CONFIG_SMP=y and CONFIG_SMP_ON_UP=y, the value returned by
is_smp() depends on whether the code in arch/arm/kernel/head.S believes
the processor is a multi-core processor or not. It looks at the MPIDR,
if it's a Cortex-A9 it also looks at the SCU to see how much cores it
advertises. On Armada 370, this test concludes that the processor is
*not* SMP, and therefore even with CONFIG_SMP=y and CONFIG_SMP_ON_UP=y,
is_smp() returns false.

The current code of is_smp() in mainline is:

static inline bool is_smp(void)
{
#ifndef CONFIG_SMP
	return false;
#elif defined(CONFIG_SMP_ON_UP)
	extern unsigned int smp_on_up;
	return !!smp_on_up;
#else
	return true;
#endif
}

So if you have CONFIG_SMP and CONFIG_SMP_ON_UP, the value returned by
is_smp() is the value of smp_on_up, which is decided by the code in
arch/arm/kernel/head.S as decided above.

Also, remember that the problem is a bit more complicated than is_smp()
returning true or false:

 * For Armada 370 (which is single core and seen as UP according to the
   test in arch/arm/kernel/head.S), I want write-allocate policy but
   *NOT* the shareable pages.

 * For Armada 380 (which is single core), I want write-allocate policy
   *AND* the shareable pages.

> In patch 3, you turn this into something which follows smp_on_up when
> SMP_ON_UP is enabled.

Right.

> smp_on_up is true when we have the (currently)
> SMP case instructions in place, rather than the UP case.

If by SMP case instructions you mean ALT_SMP() vs. ALT_UP(), then yes.

> Since you need the SMP case instructions for hardware coherency, the
> effect of patch 3 is that smp_on_up will be true for hardware coherency
> as well.  Hence, is_smp() *will* return true for hardware coherency.

Not for Armada 370, see explanation above.

> So your above statement is wrong with patch 3 applied, and this nicely
> illustrates why this "simple" solution is a problem - we end up with a
> load of things which _appear_ to be only for SMP which *unexpectedly*
> end up being enabled for hardware coherency.

Indeed: the solution of making is_smp() return true for !CONFIG_SMP
configurations is also one of the potential direction that emerged from
the discussion around PATCH RFCv1 of this series.

> > > > > You break this by allowing SMP to specify writeback read-allocate.
> > > > 
> > > > I am certainly not very familiar with the code in mm/mmu.c, but I'm
> > > > not sure to see why I allow SMP to specify writeback read-allocate.
> > > > What the code does is:
> > > 
> > > @@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
> > >                 if (cachepolicy > CPOLICY_WRITETHROUGH)
> > >                         cachepolicy = CPOLICY_WRITETHROUGH;
> > >  #endif
> > > +               if (cachepolicy > CPOLICY_WRITEBACK)
> > > +                       cachepolicy = CPOLICY_WRITEBACK;
> > >         }
> > > 
> > > and the rest of this hunk.
> > 
> > If specific hunk is inside a:
> > 
> > 	if (cpu_arch < CPU_ARCH_ARMv6) {
> > 
> > and as far as I'm aware, there are no ARMv5 or earlier platforms that
> > are SMP-capable. Are there any?
> 
> First, I said in one of my previous messages, there *are* ARMv5 which
> *do* support write-allocate, so this line *imposes* a *new* restriction
> that we no longer support write-allocate on ARMv5.  So the above quote
> is a problem there.

Right, understood.

> Second, I said "and the rest of this hunk" which is where you remove the
> is_smp() test.

Ok.

> > > >  * In early_cachepolicy(), if a specific cachepolicy= argument is
> > > >    passed to the kernel, we use it, except for >= ARMv6, where
> > > >    CPOLICY_WRITEALLOC is forced.
> > > 
> > > What if you want to run ARMv6 UP (non-coherent) in read-allocate mode
> > > (which is the way it is today?)
> > 
> > That's indeed no longer possible with my patch, but we could allow
> > that with a default on write-allocate, but a possibility of using
> > cachepolicy=writeback to get back the previous default cache policy.
> > Would that work for you?
> 
> There's the problem that the initial page table setup (done by the
> assembly code) must match the page table setup done later in C as far
> as these attributes go, so it's not that easy.

Indeed, that's the main problem here. Once we are in C code, it's a lot
easier to do platform-specific stuff.

> I think if we make the is_smp() thing be a is_hw_coherency() thing, and
> turn the SMP_ON_UP thing to be ARM_HW_COHERENCY (or make ARM_HW_COHERENCY
> a subset of it) then we can address this case in a more maintainable way.

How do we address the problem that Armada 370 has different
requirements than the other I/O coherency SoC ? In RFC PATCHv2, it was
solved by having:

 * Armada 370 needs only write-allocate and does not work with
   shareable pages. Since write-allocate was becoming the default for
   ARMv6+, this requirement was met. And since Armada 370 is recognized
   as non-SMP by the SMP_ON_UP, is_smp() continues to return false, and
   shareable pages are not used.

 * Armada XP/375/38x need both write-allocate and shareable pages.
   Write-allocate is coming from the fact that i was becoming the
   default for ARMv6+. The shareable pages were coming from the fact
   that is_smp() returns true when SMP_ON_UP is enabled.

As you can see, and as I highlighted above, it's not a simple as a
is_hw_coherent() thing: Armada 370 is I/O coherent, but does not want
many of the properties of SMP-capable systems such as shareable pages.

Suggestions?

Thanks,

Thomas Petazzoni
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations
  2014-05-27 12:22       ` Russell King - ARM Linux
@ 2014-05-27 12:47         ` Thomas Petazzoni
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-27 12:47 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Tue, 27 May 2014 13:22:54 +0100, Russell King - ARM Linux wrote:
> On Tue, May 27, 2014 at 01:59:18PM +0200, Thomas Petazzoni wrote:
> > No, because the Armada 370 (single core processor) needs the
> > write-allocate cache policy to provide hardware I/O coherency, but its
> > MPIDR does not indicate it's a SMP-capable processor. So even with
> > CONFIG_SMP_ON_UP=y, is_smp() returns false on Armada 370, and the kernel
> > continues to use the WB cache mode.
> 
> So what's the point of enabling SMP_ON_UP?  If is_smp() returns false
> with SMP=y and SMP_ON_UP=y on Armada 370, then you get none of the
> SMP instructions either.  So I now start to question what the point of
> _this_ patch is, which enables SMP_ON_UP on !SMP capable kernels.
> 
> A SMP kernel will select the non-SMP instructions for a UP CPU, and
> is_smp() will return false.  A UP kernel will compile in the non-SMP
> instructions and is_smp() will be false.  A UP kernel with SMP_ON_UP=y
> will select the non-SMP instructions for your CPU and is_smp() will
> return false.  So what's the point?

The point is this particular patch that allows CONFIG_SMP_ON_UP
on !CONFIG_SMP is to have hardware I/O coherency working in !CONFIG_SMP
configurations for Armada XP, Armada 375 and Armada 38x. The main
reason is that Armada 380 (not 385) is a single core processor, so
people want to run !CONFIG_SMP configurations on it.

However, even when CONFIG_SMP is *disabled*, we want to be able to
provide hardware I/O coherency on Armada 380 (and potentially also
Armada XP and 375, even though those processors have multiple cores, so
the usefulness of running !CONFIG_SMP kernels on these is quite
limited). Note that Armada 380, despite being single core, is
recognized as being SMP-capable, so is_smp() will return true when
!CONFIG_SMP + CONFIG_SMP_ON_UP=y.

In order to provide hardware I/O coherency on these Armada XP, 375
and 38x processors, we need:

 * Write-allocate policy, as is required on Armada 370.
 * SMP bit set in ACTLR
 * Shareable pages

See again the cover letter for this patch series, it really
exhaustively lists all the requirements we need for hardware I/O
coherency to work on the various processors.

Thanks again for your feedback, definitely very useful!

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-27 12:37             ` Thomas Petazzoni
@ 2014-05-28 11:24               ` Russell King - ARM Linux
  2014-05-30  8:08                 ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-05-28 11:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 27, 2014 at 02:37:45PM +0200, Thomas Petazzoni wrote:
> How do we address the problem that Armada 370 has different
> requirements than the other I/O coherency SoC ? In RFC PATCHv2, it was
> solved by having:
> 
>  * Armada 370 needs only write-allocate and does not work with
>    shareable pages. Since write-allocate was becoming the default for
>    ARMv6+, this requirement was met. And since Armada 370 is recognized
>    as non-SMP by the SMP_ON_UP, is_smp() continues to return false, and
>    shareable pages are not used.
> 
>  * Armada XP/375/38x need both write-allocate and shareable pages.
>    Write-allocate is coming from the fact that i was becoming the
>    default for ARMv6+. The shareable pages were coming from the fact
>    that is_smp() returns true when SMP_ON_UP is enabled.

But the way you go about this is totally silly - you effectively end
up enabling all the SMP stuff even though you don't need it (which means
that on a SMP kernel, you end up with a bunch of extra stuff on those
platforms which aren't SMP, just because you want maybe one or two
is_smp() sites to return true.

How about this patch for a start - which incidentally fixes two minor
bugs where specifying cachepolicy=<anything> on ARMv6 provokes a warning,
and sets the policy to writeback read-allocate, only to have it overriden
later.  The second bug is that we "hoped" that is_smp() reflects the asm
code's page table setup - this patch makes it more explicit by reading
the PMD flags, finding the appropriate entry in the table, and setting
the cache policy to that.  I've left the is_smp() check in because we
really do want to detect if something goes awry there.

However, the effect of this patch is that the C code now follows how
the assembly code sets up the page tables, which means that this is now
controllable via the PMD MMU flags in the assembly procinfo structure.
This means that for the Armada devices which need write-alloc for
coherency, you can specify that in the proc-*.S files - yes, it means
that you need a separate entry.

We should probably do a similar thing for the shared flag, but that's
something which can come after this patch.

8<==
From: Russell King <rmk+kernel@arm.linux.org.uk>
Subject: [PATCH] ARM: ARMv6: ensure C page table setup code follows assembly
 code

Fix a long standing minor bug where, for ARMv6, we don't enforce the C
code setting the same cache policy as the assembly code.  This was
introduced partially by commit 11179d8ca28d ([ARM] 4497/1: Only allow
safe cache configurations on ARMv6 and later) and also by adding SMP
support.

This patch sets the default cache policy based on the flags used by the
assembly code, and then ensures that when a cache policy command line
argument is used, we verify that on ARMv6, it matches the initial setup.

This has the side effect that the C code will now follow the settings
that the proc-*.S files use, effectively allowing them to control the
policy.  This is desirable for coherency support, which, like SMP, also
requires write-allocate cache mode.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/kernel/setup.c |  5 +++-
 arch/arm/mm/mmu.c       | 63 ++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index df21f9f98945..aa516bc4ca30 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -72,6 +72,7 @@ static int __init fpe_setup(char *line)
 __setup("fpe=", fpe_setup);
 #endif
 
+extern void init_default_cache_policy(unsigned long);
 extern void paging_init(const struct machine_desc *desc);
 extern void early_paging_init(const struct machine_desc *,
 			      struct proc_info_list *);
@@ -603,7 +604,9 @@ static void __init setup_processor(void)
 #ifndef CONFIG_ARM_THUMB
 	elf_hwcap &= ~(HWCAP_THUMB | HWCAP_IDIVT);
 #endif
-
+#ifdef CONFIG_CPU_CP15
+	init_default_cache_policy(list->__cpu_mm_mmu_flags);
+#endif
 	erratum_a15_798181_init();
 
 	feat_v6_fixup();
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index a476051c0567..704ff018e67b 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -118,6 +118,29 @@ static struct cachepolicy cache_policies[] __initdata = {
 };
 
 #ifdef CONFIG_CPU_CP15
+/*
+ * Initialise the cache_policy variable with the initial state specified
+ * via the "pmd" value.  This is used to ensure that on ARMv6 and later,
+ * the C code sets the page tables up with the same policy as the head
+ * assembly code, which avoids an illegal state where the TLBs can get
+ * confused.  See comments in early_cachepolicy() for more information.
+ */
+void __init init_default_cache_policy(unsigned long pmd)
+{
+	int i;
+
+	pmd &= PMD_SECT_TEX(1) | PMD_SECT_BUFFERABLE | PMD_SECT_CACHEABLE;
+
+	for (i = 0; i < ARRAY_SIZE(cache_policies); i++)
+		if (cache_policies[i].pmd == pmd) {
+			cachepolicy = i;
+			break;
+		}
+
+	if (i == ARRAY_SIZE(cache_policies))
+		pr_err("ERROR: could not find cache policy\n");
+}
+
 unsigned long __init __clear_cr(unsigned long mask)
 {
 	cr_alignment = cr_alignment & ~mask;
@@ -125,27 +148,26 @@ unsigned long __init __clear_cr(unsigned long mask)
 }
 
 /*
- * These are useful for identifying cache coherency
- * problems by allowing the cache or the cache and
- * writebuffer to be turned off.  (Note: the write
- * buffer should not be on and the cache off).
+ * These are useful for identifying cache coherency problems by allowing
+ * the cache or the cache and writebuffer to be turned off.  (Note: the
+ * write buffer should not be on and the cache off).
  */
 static int __init early_cachepolicy(char *p)
 {
-	unsigned long cr = get_cr();
-	int i;
+	int i, selected = -1;
 
 	for (i = 0; i < ARRAY_SIZE(cache_policies); i++) {
 		int len = strlen(cache_policies[i].policy);
 
 		if (memcmp(p, cache_policies[i].policy, len) == 0) {
-			cachepolicy = i;
-			cr = __clear_cr(cache_policies[i].cr_mask);
+			selected = i;
 			break;
 		}
 	}
-	if (i == ARRAY_SIZE(cache_policies))
-		printk(KERN_ERR "ERROR: unknown or unsupported cache policy\n");
+
+	if (selected == -1)
+		pr_err("ERROR: unknown or unsupported cache policy\n");
+
 	/*
 	 * This restriction is partly to do with the way we boot; it is
 	 * unpredictable to have memory mapped using two different sets of
@@ -153,12 +175,18 @@ static int __init early_cachepolicy(char *p)
 	 * change these attributes once the initial assembly has setup the
 	 * page tables.
 	 */
-	if (cpu_architecture() >= CPU_ARCH_ARMv6) {
-		printk(KERN_WARNING "Only cachepolicy=writeback supported on ARMv6 and later\n");
-		cachepolicy = CPOLICY_WRITEBACK;
+	if (cpu_architecture() >= CPU_ARCH_ARMv6 && selected != cachepolicy) {
+		pr_warn("Only cachepolicy=%s supported on ARMv6 and later\n",
+			cache_policies[cachepolicy].policy);
+		return 0;
+	}
+
+	if (selected != cachepolicy) {
+		unsigned long cr = __clear_cr(cache_policies[selected].cr_mask);
+		cachepolicy = selected;
+		flush_cache_all();
+		set_cr(cr);
 	}
-	flush_cache_all();
-	set_cr(cr);
 	return 0;
 }
 early_param("cachepolicy", early_cachepolicy);
@@ -392,8 +420,11 @@ static void __init build_mem_type_table(void)
 			cachepolicy = CPOLICY_WRITEBACK;
 		ecc_mask = 0;
 	}
-	if (is_smp())
+
+	if (is_smp() && cachepolicy != CPOLICY_WRITEALLOC) {
+		pr_warn("Forcing write-allocate cache policy for SMP\n");
 		cachepolicy = CPOLICY_WRITEALLOC;
+	}
 
 	/*
 	 * Strip out features not present on earlier architectures.
-- 
1.8.3.1



-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-28 11:24               ` Russell King - ARM Linux
@ 2014-05-30  8:08                 ` Thomas Petazzoni
  2014-05-30  9:20                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-05-30  8:08 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

Thanks a lot for working on a new proposal for this topic.

On Wed, 28 May 2014 12:24:50 +0100, Russell King - ARM Linux wrote:

> But the way you go about this is totally silly - you effectively end
> up enabling all the SMP stuff even though you don't need it (which means
> that on a SMP kernel, you end up with a bunch of extra stuff on those
> platforms which aren't SMP, just because you want maybe one or two
> is_smp() sites to return true.

Again, have you looked at the PATCH RFCv1 for this patch series? My
first version was doing *exactly* what you suggest: only make the few
call sites of is_smp() I was interested in to return true for the
specific platforms that needed it.

PATCH RFCv2 was generated after a discussion with Catalin and Rob, who
suggested that we might want to simply always use the SMP capabilities
on SMP-capable processors.

> How about this patch for a start - which incidentally fixes two minor
> bugs where specifying cachepolicy=<anything> on ARMv6 provokes a warning,
> and sets the policy to writeback read-allocate, only to have it overriden
> later.  The second bug is that we "hoped" that is_smp() reflects the asm
> code's page table setup - this patch makes it more explicit by reading
> the PMD flags, finding the appropriate entry in the table, and setting
> the cache policy to that.  I've left the is_smp() check in because we
> really do want to detect if something goes awry there.
> 
> However, the effect of this patch is that the C code now follows how
> the assembly code sets up the page tables, which means that this is now
> controllable via the PMD MMU flags in the assembly procinfo structure.
> This means that for the Armada devices which need write-alloc for
> coherency, you can specify that in the proc-*.S files - yes, it means
> that you need a separate entry.

While that would work for Armada 370 and Armada XP, it does not work
for Armada 375 and Armada 38x, as far as I can see. Armada 375 and
Armada 38x are SMP-capable, but we want to support hardware I/O
coherency in a CONFIG_SMP=n configuration (mainly because Armada 380 is
single-core). This means that in proc-v7.S, we would have to define
that the page tables should use the write-allocate cache policy, and
the shareable attribute, and this for *all* Cortex-A9, because from a
MIDR point of view, there is no difference between the Cortex-A9 in
Armada 375/38x and other Cortex-A9. And there are Cortex-A9, like the
Aegis thing from TI that are *not* SMP-capable, so enabling shareable
attributes will break.

So unless I'm missing something, I don't see how your approach solves
the problem for the Cortex-A9 platforms. What would be your suggestion
to solve this issue?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-30  8:08                 ` Thomas Petazzoni
@ 2014-05-30  9:20                   ` Russell King - ARM Linux
  2014-06-13 10:11                     ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-05-30  9:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 30, 2014 at 10:08:19AM +0200, Thomas Petazzoni wrote:
> Russell,
> 
> Thanks a lot for working on a new proposal for this topic.
> 
> On Wed, 28 May 2014 12:24:50 +0100, Russell King - ARM Linux wrote:
> 
> > But the way you go about this is totally silly - you effectively end
> > up enabling all the SMP stuff even though you don't need it (which means
> > that on a SMP kernel, you end up with a bunch of extra stuff on those
> > platforms which aren't SMP, just because you want maybe one or two
> > is_smp() sites to return true.
> 
> Again, have you looked at the PATCH RFCv1 for this patch series? My
> first version was doing *exactly* what you suggest: only make the few
> call sites of is_smp() I was interested in to return true for the
> specific platforms that needed it.

No, and I don't intend to, because it's probably well buried.

> PATCH RFCv2 was generated after a discussion with Catalin and Rob, who
> suggested that we might want to simply always use the SMP capabilities
> on SMP-capable processors.

Frankly, they are both wrong if they suggested the approach in your v2
patch for all the reasons I've raised as objections against v2.

> > How about this patch for a start - which incidentally fixes two minor
> > bugs where specifying cachepolicy=<anything> on ARMv6 provokes a warning,
> > and sets the policy to writeback read-allocate, only to have it overriden
> > later.  The second bug is that we "hoped" that is_smp() reflects the asm
> > code's page table setup - this patch makes it more explicit by reading
> > the PMD flags, finding the appropriate entry in the table, and setting
> > the cache policy to that.  I've left the is_smp() check in because we
> > really do want to detect if something goes awry there.
> > 
> > However, the effect of this patch is that the C code now follows how
> > the assembly code sets up the page tables, which means that this is now
> > controllable via the PMD MMU flags in the assembly procinfo structure.
> > This means that for the Armada devices which need write-alloc for
> > coherency, you can specify that in the proc-*.S files - yes, it means
> > that you need a separate entry.
> 
> While that would work for Armada 370 and Armada XP, it does not work
> for Armada 375 and Armada 38x, as far as I can see. Armada 375 and
> Armada 38x are SMP-capable,

I'm *not* trying to solve everything here, I'm only solving the problem
of the C setup code matching what the assembly setup code is doing -
which is a long standing minor bug for ARMv6+.

> but we want to support hardware I/O
> coherency in a CONFIG_SMP=n configuration (mainly because Armada 380 is
> single-core). This means that in proc-v7.S, we would have to define
> that the page tables should use the write-allocate cache policy, and
> the shareable attribute, and this for *all* Cortex-A9, because from a
> MIDR point of view, there is no difference between the Cortex-A9 in
> Armada 375/38x and other Cortex-A9. And there are Cortex-A9, like the
> Aegis thing from TI that are *not* SMP-capable, so enabling shareable
> attributes will break.

If that's the case, we can't then support IO coherency at all - because
to have the asm code setup the page tables without the S bit set, and
then have the C code overwrite them with the S bit set is also a violation
of the architecture, as Catalin said when he patched the kernel adding
this restriction:

        /*
         * This restriction is partly to do with the way we boot; it is
         * unpredictable to have memory mapped using two different sets of
         * memory attributes (shared, type, and cache attribs).  We can not
         * change these attributes once the initial assembly has setup the
         * page tables.
         */

Whatever the assembly code does in respect to these attributes, the C
code has to follow.  So, my approach is entirely correct and right.

What this means is if we can't distinguish Armada's IO coherency at
assembly time, then we can't setup the page tables without the sharable
bit set in the asm code, and then set them up with the sharable bit set
in the C code.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-05-30  9:20                   ` Russell King - ARM Linux
@ 2014-06-13 10:11                     ` Thomas Petazzoni
  2014-06-13 10:20                       ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-06-13 10:11 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

Thanks again for your feedback, and sorry for the delay in replying, I
was busy with other things.

On Fri, 30 May 2014 10:20:44 +0100, Russell King - ARM Linux wrote:

> > Again, have you looked at the PATCH RFCv1 for this patch series? My
> > first version was doing *exactly* what you suggest: only make the few
> > call sites of is_smp() I was interested in to return true for the
> > specific platforms that needed it.
> 
> No, and I don't intend to, because it's probably well buried.

Well, it's not that buried since it only dates back from last month and
is available at
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/256258.html.
I think looking at the prior versions of a patch series is kind of
important to understand the various solutions that have been
discussed/explored, especially when comments made on later versions
reflect back on things that where already proposed in earlier versions.

> > While that would work for Armada 370 and Armada XP, it does not work
> > for Armada 375 and Armada 38x, as far as I can see. Armada 375 and
> > Armada 38x are SMP-capable,
> 
> I'm *not* trying to solve everything here, I'm only solving the problem
> of the C setup code matching what the assembly setup code is doing -
> which is a long standing minor bug for ARMv6+.

Right, understood. Since we have a special proc_info entry for Armada
370/XP (as they are PJ4B and not Cortex-A based), this means we can
ensure the cache policy is write-allocate for both UP and SMP for those
processors.

> > but we want to support hardware I/O
> > coherency in a CONFIG_SMP=n configuration (mainly because Armada 380 is
> > single-core). This means that in proc-v7.S, we would have to define
> > that the page tables should use the write-allocate cache policy, and
> > the shareable attribute, and this for *all* Cortex-A9, because from a
> > MIDR point of view, there is no difference between the Cortex-A9 in
> > Armada 375/38x and other Cortex-A9. And there are Cortex-A9, like the
> > Aegis thing from TI that are *not* SMP-capable, so enabling shareable
> > attributes will break.
> 
> If that's the case, we can't then support IO coherency at all - because
> to have the asm code setup the page tables without the S bit set, and
> then have the C code overwrite them with the S bit set is also a violation
> of the architecture, as Catalin said when he patched the kernel adding
> this restriction:
> 
>         /*
>          * This restriction is partly to do with the way we boot; it is
>          * unpredictable to have memory mapped using two different sets of
>          * memory attributes (shared, type, and cache attribs).  We can not
>          * change these attributes once the initial assembly has setup the
>          * page tables.
>          */
> 
> Whatever the assembly code does in respect to these attributes, the C
> code has to follow.  So, my approach is entirely correct and right.
> 
> What this means is if we can't distinguish Armada's IO coherency at
> assembly time, then we can't setup the page tables without the sharable
> bit set in the asm code, and then set them up with the sharable bit set
> in the C code.

I spoke briefly about this with Albin Tonnerre from ARM, and his
feedback is that it should not cause any problem to set the SMP bit and
the shareable attribute even on non-SMP Cortex-A9 processors such as
the Aegis. If that's correct, then we could do it for all Cortex-A9,
without having to know whether what we have is a SoC from Marvell with
I/O coherency or any other Cortex-A9 based SoC.

Would this be an acceptable solution for you?

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-13 10:11                     ` Thomas Petazzoni
@ 2014-06-13 10:20                       ` Russell King - ARM Linux
  2014-06-13 11:34                         ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-06-13 10:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 13, 2014 at 12:11:21PM +0200, Thomas Petazzoni wrote:
> On Fri, 30 May 2014 10:20:44 +0100, Russell King - ARM Linux wrote:
> > > Again, have you looked at the PATCH RFCv1 for this patch series? My
> > > first version was doing *exactly* what you suggest: only make the few
> > > call sites of is_smp() I was interested in to return true for the
> > > specific platforms that needed it.
> > 
> > No, and I don't intend to, because it's probably well buried.
> 
> Well, it's not that buried since it only dates back from last month and
> is available at
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/256258.html.
> I think looking at the prior versions of a patch series is kind of
> important to understand the various solutions that have been
> discussed/explored, especially when comments made on later versions
> reflect back on things that where already proposed in earlier versions.

For me, "last month" is as good as buried.  With already 2300 messages
this month alone, I'm not going to be looking back through last months
8297 messages.  If it wasn't sent during the last week, it's buried.

> Right, understood. Since we have a special proc_info entry for Armada
> 370/XP (as they are PJ4B and not Cortex-A based), this means we can
> ensure the cache policy is write-allocate for both UP and SMP for those
> processors.

Yep.

> I spoke briefly about this with Albin Tonnerre from ARM, and his
> feedback is that it should not cause any problem to set the SMP bit and
> the shareable attribute even on non-SMP Cortex-A9 processors such as
> the Aegis. If that's correct, then we could do it for all Cortex-A9,
> without having to know whether what we have is a SoC from Marvell with
> I/O coherency or any other Cortex-A9 based SoC.

Setting the sharable attribute is not really a good idea - implementations
are permitted to treat a writeback sharable mapping as uncacheable, which
would be a severe performance degredation to single core CPUs.

The patches I sent during this thread are now merged into mainline.  The
setting of the shared bit, and the memory cache policy are now both
derived from the proc_info structures, specifically the __cpu_mm_mmu_flags
member.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-13 10:20                       ` Russell King - ARM Linux
@ 2014-06-13 11:34                         ` Thomas Petazzoni
  2014-06-13 13:51                           ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-06-13 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Fri, 13 Jun 2014 11:20:19 +0100, Russell King - ARM Linux wrote:

> > I spoke briefly about this with Albin Tonnerre from ARM, and his
> > feedback is that it should not cause any problem to set the SMP bit and
> > the shareable attribute even on non-SMP Cortex-A9 processors such as
> > the Aegis. If that's correct, then we could do it for all Cortex-A9,
> > without having to know whether what we have is a SoC from Marvell with
> > I/O coherency or any other Cortex-A9 based SoC.
> 
> Setting the sharable attribute is not really a good idea - implementations
> are permitted to treat a writeback sharable mapping as uncacheable, which
> would be a severe performance degredation to single core CPUs.

Then maybe we could set the shareable attribute on SMP-capable
processors even if CONFIG_SMP is disabled? This way, those single core
CPUs that are really not SMP capable continue to run without the
shareable attribute, while all SMP-capable processors run with the
shareable attribute, regardless of whether the kernel is configured
CONFIG_SMP or not.

> The patches I sent during this thread are now merged into mainline.  The
> setting of the shared bit, and the memory cache policy are now both
> derived from the proc_info structures, specifically the __cpu_mm_mmu_flags
> member.

Right. So for Armada 370/XP, I'll update the proc_info structure. We
still have to find the right solution/compromise for the Armada 375/38x
though.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-13 11:34                         ` Thomas Petazzoni
@ 2014-06-13 13:51                           ` Thomas Petazzoni
  2014-06-17  8:48                             ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-06-13 13:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Russell,

On Fri, 13 Jun 2014 13:34:37 +0200, Thomas Petazzoni wrote:

> > The patches I sent during this thread are now merged into mainline.  The
> > setting of the shared bit, and the memory cache policy are now both
> > derived from the proc_info structures, specifically the __cpu_mm_mmu_flags
> > member.
> 
> Right. So for Armada 370/XP, I'll update the proc_info structure. We
> still have to find the right solution/compromise for the Armada 375/38x
> though.

So, I had a look at that, and there are still I believe a few issues to
solve:

 * Your change that takes the PMD flags set by the assembly code as the
   reference can easily be used by specifying mm_mmuflags =
   PMD_SECT_WBWA for Armada 370 and PMD_SECT_WBWA|PMD_SECT_S for Armada
   XP to get write-allocate on Armada 370 and write-allocate +
   shareable on Armada XP.

   However, this adjusts only the PMD flags, and does not adjust
   the TTB flags: they are still governed only by the ALT_UP/ALT_SMP
   logic:

#define TTB_FLAGS_UP	TTB_IRGN_WB|TTB_RGN_OC_WB
#define TTB_FLAGS_SMP	TTB_IRGN_WBWA|TTB_S|TTB_NOS|TTB_RGN_OC_WBWA
[...]
	ALT_SMP(orr	r0, r0, #TTB_FLAGS_SMP)
	ALT_UP(orr	r0, r0, #TTB_FLAGS_UP)

   Do you have a suggestion on how to handle this?

 * Regarding setting the SMP bit and TLB broadcast bit for Cortex-A9,
   I've moved the __v7_ca9mp_setup code outside of the common Cortex-A
   code so that it does set both bits as soon as it's a SMP-capable
   Cortex-A9. Something like (untested):

/*
 * Cortex-A9 is handled as a special case, as we want the SMP bit to
 * be set regardless of whether CONFIG_SMP is enabled or not: as soon
 * as the processor is SMP capable, we want to enable the SMP bit so
 * that Cortex-A9 processors that support hardware I/O coherency will
 * operate properly.
 */
__v7_ca9mp_setup:
	mcr	p15, 0, r0, c0, c0, 5		@ read MPIDR
	and	r0, r0, #0xc0000000		@ multiprocessing extensions and
	teq	r0, #0x80000000			@ not part of a uniprocessor system?
	bne	__v7_setup

	mrc	p15, 0, r0, c1, c0, 1
	tst	r0, #(1 << 6)
	orreq	r0, r0, #((1 << 6) | (1 << 10))
	mcreq	p15, 0, r0, c1, c0, 1
	b	__v7_setup

   Would this be acceptable?

 * Regarding setting the write allocate and shareable attribute in the
   PMD bits for SMP-capable Cortex-A9, I was hoping to be able to do
   that from the __v7_ca9mp_setup function above, but in fact this
   per-processor 'initfunc' is called *after* the page tables are
   created. So I'm not sure where to put the logic that detects we have
   an SMP-capable Cortex-A9 and tweak the PMD flags accordingly. Any
   suggestion?

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-13 13:51                           ` Thomas Petazzoni
@ 2014-06-17  8:48                             ` Thomas Petazzoni
  2014-06-17 11:26                               ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-06-17  8:48 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

Do you have some input/feedback about the below proposals/questions?

Thanks a lot!

Thomas

On Fri, 13 Jun 2014 15:51:39 +0200, Thomas Petazzoni wrote:
> Hello Russell,
> 
> On Fri, 13 Jun 2014 13:34:37 +0200, Thomas Petazzoni wrote:
> 
> > > The patches I sent during this thread are now merged into mainline.  The
> > > setting of the shared bit, and the memory cache policy are now both
> > > derived from the proc_info structures, specifically the __cpu_mm_mmu_flags
> > > member.
> > 
> > Right. So for Armada 370/XP, I'll update the proc_info structure. We
> > still have to find the right solution/compromise for the Armada 375/38x
> > though.
> 
> So, I had a look at that, and there are still I believe a few issues to
> solve:
> 
>  * Your change that takes the PMD flags set by the assembly code as the
>    reference can easily be used by specifying mm_mmuflags =
>    PMD_SECT_WBWA for Armada 370 and PMD_SECT_WBWA|PMD_SECT_S for Armada
>    XP to get write-allocate on Armada 370 and write-allocate +
>    shareable on Armada XP.
> 
>    However, this adjusts only the PMD flags, and does not adjust
>    the TTB flags: they are still governed only by the ALT_UP/ALT_SMP
>    logic:
> 
> #define TTB_FLAGS_UP	TTB_IRGN_WB|TTB_RGN_OC_WB
> #define TTB_FLAGS_SMP	TTB_IRGN_WBWA|TTB_S|TTB_NOS|TTB_RGN_OC_WBWA
> [...]
> 	ALT_SMP(orr	r0, r0, #TTB_FLAGS_SMP)
> 	ALT_UP(orr	r0, r0, #TTB_FLAGS_UP)
> 
>    Do you have a suggestion on how to handle this?
> 
>  * Regarding setting the SMP bit and TLB broadcast bit for Cortex-A9,
>    I've moved the __v7_ca9mp_setup code outside of the common Cortex-A
>    code so that it does set both bits as soon as it's a SMP-capable
>    Cortex-A9. Something like (untested):
> 
> /*
>  * Cortex-A9 is handled as a special case, as we want the SMP bit to
>  * be set regardless of whether CONFIG_SMP is enabled or not: as soon
>  * as the processor is SMP capable, we want to enable the SMP bit so
>  * that Cortex-A9 processors that support hardware I/O coherency will
>  * operate properly.
>  */
> __v7_ca9mp_setup:
> 	mcr	p15, 0, r0, c0, c0, 5		@ read MPIDR
> 	and	r0, r0, #0xc0000000		@ multiprocessing extensions and
> 	teq	r0, #0x80000000			@ not part of a uniprocessor system?
> 	bne	__v7_setup
> 
> 	mrc	p15, 0, r0, c1, c0, 1
> 	tst	r0, #(1 << 6)
> 	orreq	r0, r0, #((1 << 6) | (1 << 10))
> 	mcreq	p15, 0, r0, c1, c0, 1
> 	b	__v7_setup
> 
>    Would this be acceptable?
> 
>  * Regarding setting the write allocate and shareable attribute in the
>    PMD bits for SMP-capable Cortex-A9, I was hoping to be able to do
>    that from the __v7_ca9mp_setup function above, but in fact this
>    per-processor 'initfunc' is called *after* the page tables are
>    created. So I'm not sure where to put the logic that detects we have
>    an SMP-capable Cortex-A9 and tweak the PMD flags accordingly. Any
>    suggestion?
> 
> Thanks!
> 
> Thomas



-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-17  8:48                             ` Thomas Petazzoni
@ 2014-06-17 11:26                               ` Russell King - ARM Linux
  2014-06-28 15:39                                 ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-06-17 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 17, 2014 at 10:48:54AM +0200, Thomas Petazzoni wrote:
> Russell,
> 
> Do you have some input/feedback about the below proposals/questions?

Not at the moment, I'm busy sorting through the 150-odd months-old
patches which remain in my tree post-merge window, working through
regressions, and trying to sort them out so I can (re-)post them and
hopefully get rid of them.

Of course, if it didn't take many months to get patches into mainline,
then I wouldn't be doing this right now, and I could spend more time
thinking more about these sorts of issues, rather than (eg) thinking
about how to generate DT binding documentation for drivers which have
been merged without that required documentation, which I then need to
modify, and then promptly get asked to write that required documentation
for.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-17 11:26                               ` Russell King - ARM Linux
@ 2014-06-28 15:39                                 ` Thomas Petazzoni
  2014-06-28 16:34                                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Petazzoni @ 2014-06-28 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Russell,

On Tue, 17 Jun 2014 12:26:24 +0100, Russell King - ARM Linux wrote:
> On Tue, Jun 17, 2014 at 10:48:54AM +0200, Thomas Petazzoni wrote:
> > Russell,
> > 
> > Do you have some input/feedback about the below proposals/questions?
> 
> Not at the moment, I'm busy sorting through the 150-odd months-old
> patches which remain in my tree post-merge window, working through
> regressions, and trying to sort them out so I can (re-)post them and
> hopefully get rid of them.
> 
> Of course, if it didn't take many months to get patches into mainline,
> then I wouldn't be doing this right now, and I could spend more time
> thinking more about these sorts of issues, rather than (eg) thinking
> about how to generate DT binding documentation for drivers which have
> been merged without that required documentation, which I then need to
> modify, and then promptly get asked to write that required documentation
> for.

I surely understand that it sometimes takes a lot of time to get some
feedback from maintainers, I also had similar troubles with the
maintainers of certain subsystems.

However, I'd really like to see the hardware I/O coherency issue make
some progress: it's currently completely broken on Armada 370, causing
random corruptions, and the only solution is to be able to properly
enable write-allocate. Your patch series makes that partially possible,
but there are some remaining problems, which I summarized at
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/263524.html.

Thanks a lot for your help and suggestions,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-28 15:39                                 ` Thomas Petazzoni
@ 2014-06-28 16:34                                   ` Russell King - ARM Linux
  2014-06-28 16:52                                     ` Thomas Petazzoni
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2014-06-28 16:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Jun 28, 2014 at 05:39:55PM +0200, Thomas Petazzoni wrote:
> Hello Russell,
> 
> On Tue, 17 Jun 2014 12:26:24 +0100, Russell King - ARM Linux wrote:
> > On Tue, Jun 17, 2014 at 10:48:54AM +0200, Thomas Petazzoni wrote:
> > > Russell,
> > > 
> > > Do you have some input/feedback about the below proposals/questions?
> > 
> > Not at the moment, I'm busy sorting through the 150-odd months-old
> > patches which remain in my tree post-merge window, working through
> > regressions, and trying to sort them out so I can (re-)post them and
> > hopefully get rid of them.
> > 
> > Of course, if it didn't take many months to get patches into mainline,
> > then I wouldn't be doing this right now, and I could spend more time
> > thinking more about these sorts of issues, rather than (eg) thinking
> > about how to generate DT binding documentation for drivers which have
> > been merged without that required documentation, which I then need to
> > modify, and then promptly get asked to write that required documentation
> > for.
> 
> I surely understand that it sometimes takes a lot of time to get some
> feedback from maintainers, I also had similar troubles with the
> maintainers of certain subsystems.

Yes, it takes /months/.  The typical way things work is you send a
patch set of any size.  You get a number of minor comments on it.  You
address those comments.  You get another set of minor comments on a
different selection of patches.  You address those.  So the loop
continues, wasting lots of time updating the patches and re-posting
them.

That means that the maintainers who are trying to push those patches
are consumed with this process trying to deal with these other people
rather than working on the area that they're supposed to be maintaining.

It's a vicious circle where eventually no one is doing any useful work -
one which I see kernel development heading headlong towards.

I still have a significant quantity of patches (slightly more than
when I sent my reply on the 17th of June) with only a relatively small
amount of movement towards getting some of them accepted by maintainers.

I'm not talking about patches which I've only just created - I'm talking
about patches which are more than /six/ months old.

The more time this takes, the less time I have to look at core ARM stuff.

Then you have problems like the fscked regulator code, which seems to be
the only code in the kernel which doesn't allow GPIO number 0 to be
specified, not even via DT.  It just ignores it.  It seems that I'm
having to work to fix that code, rather than being able to report it
as a bug and have the bug fixed.

There is very little to no sharing of these issues, so consequently I'm
the one who ends up with _loads_ of crap on my plate to try and solve.

> However, I'd really like to see the hardware I/O coherency issue make
> some progress: it's currently completely broken on Armada 370, causing
> random corruptions, and the only solution is to be able to properly
> enable write-allocate. Your patch series makes that partially possible,
> but there are some remaining problems, which I summarized at
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/263524.html.

Yes, I'm aware that there are still some remaining problems, but I don't
have a solution for it yet.

What we really need for your case is someway to detect your SoCs in the
early assembly code - and right now I don't have any idea how to do that.
The reason we need to do that is because we must have the write-allocate
cache mode and the shared bits set appropriately from the initial page
table setup in the early assembly code, because we can't just overwrite
the existing page tables with differing cache attributes without polluting
the TLB with those differing cache attributes (which can lead to
unpredictable behaviour.)

The Armada 37x/38x problem is a very difficult one to solve... really
I wish that those spins of the SoC had been chucked in the trash, because
working around this properly is going to take a lot of time and effort.
That would've been /far/ better for us.

I'll also note that it would've been a _lot_ easier to work around had
we not had this DT stuff, and kept the machine IDs being passed to the
kernel.  Such is life though, while DT makes some stuff easier, it
makes other stuff absolute hell.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+
  2014-06-28 16:34                                   ` Russell King - ARM Linux
@ 2014-06-28 16:52                                     ` Thomas Petazzoni
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Petazzoni @ 2014-06-28 16:52 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Sat, 28 Jun 2014 17:34:13 +0100, Russell King - ARM Linux wrote:

> > I surely understand that it sometimes takes a lot of time to get some
> > feedback from maintainers, I also had similar troubles with the
> > maintainers of certain subsystems.
> 
> Yes, it takes /months/.  The typical way things work is you send a
> patch set of any size.  You get a number of minor comments on it.  You
> address those comments.  You get another set of minor comments on a
> different selection of patches.  You address those.  So the loop
> continues, wasting lots of time updating the patches and re-posting
> them.

Right, but that's the process we all go through. It indeed takes time,
but I believe it's the price to pay for a project that big with so many
people involved. On the other hand, I agree that some maintainers that
no longer have enough time to maintain their subsystem/area should
probably try harder to pass the role to some other developers.

> That means that the maintainers who are trying to push those patches
> are consumed with this process trying to deal with these other people
> rather than working on the area that they're supposed to be maintaining.

True, but as I said earlier, the process you're going through is the
one every kernel developer is going through, and I'm not sure why the
process should be different for a kernel maintainer or a random
kernel developer.

> I still have a significant quantity of patches (slightly more than
> when I sent my reply on the 17th of June) with only a relatively small
> amount of movement towards getting some of them accepted by maintainers.
> 
> I'm not talking about patches which I've only just created - I'm talking
> about patches which are more than /six/ months old.

I'd say that after a few months without any useful feedback or answer,
patches should get merged by the maintainer. Inaction is terrible, as
it de-motivates developers. But I'm sure not everybody would agree with
such a proposal :)

> > However, I'd really like to see the hardware I/O coherency issue make
> > some progress: it's currently completely broken on Armada 370, causing
> > random corruptions, and the only solution is to be able to properly
> > enable write-allocate. Your patch series makes that partially possible,
> > but there are some remaining problems, which I summarized at
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/263524.html.
> 
> Yes, I'm aware that there are still some remaining problems, but I don't
> have a solution for it yet.
> 
> What we really need for your case is someway to detect your SoCs in the
> early assembly code - and right now I don't have any idea how to do that.

Well, I'd like to *first* solve the Armada 370 issue, and the Armada
370 is using a specific ARM core, so it can definitely be detected
early on in the assembly code.

So for the Armada 370 most of the problems are solved thanks to your
previous patch, the only remaining problem is the TTB_FLAGS problem I
highlighted in
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/263524.html.

It's entirely solvable in the early assembly code, I just need your
input on what you think is the right direction to solve this: currently
TTB_FLAGS are defined globally, with one value for UP and one value for
SMP. I would need to make those TTB_FLAGS part of the proc_info
structure, so that Armada 370 and Armada XP (which use special ARM
cores and therefore have separate proc_info structures).

Would you agree with such a change?

Of course this is only needed if the TTB_FLAGS need to match the cache
policy used in the PMD_FLAGS, but I believe Catalin once said that it
was the case. Maybe Catalin, Will or Albin from ARM could
confirm/infirm?

> The reason we need to do that is because we must have the write-allocate
> cache mode and the shared bits set appropriately from the initial page
> table setup in the early assembly code, because we can't just overwrite
> the existing page tables with differing cache attributes without polluting
> the TLB with those differing cache attributes (which can lead to
> unpredictable behaviour.)
> 
> The Armada 37x/38x problem is a very difficult one to solve... really
> I wish that those spins of the SoC had been chucked in the trash, because
> working around this properly is going to take a lot of time and effort.
> That would've been /far/ better for us.

What about simply making write-allocate + shared attribute the default
on all SMP-capable Cortex-A9 ?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-06-28 16:52 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-20 15:35 [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
2014-05-20 15:35 ` [PATCH RFCv2 1/5] ARM: use write allocate by default on ARMv6+ Thomas Petazzoni
2014-05-26 16:26   ` Russell King - ARM Linux
2014-05-27  9:15     ` Thomas Petazzoni
2014-05-27  9:47       ` Russell King - ARM Linux
2014-05-27 11:52         ` Thomas Petazzoni
2014-05-27 12:18           ` Russell King - ARM Linux
2014-05-27 12:37             ` Thomas Petazzoni
2014-05-28 11:24               ` Russell King - ARM Linux
2014-05-30  8:08                 ` Thomas Petazzoni
2014-05-30  9:20                   ` Russell King - ARM Linux
2014-06-13 10:11                     ` Thomas Petazzoni
2014-06-13 10:20                       ` Russell King - ARM Linux
2014-06-13 11:34                         ` Thomas Petazzoni
2014-06-13 13:51                           ` Thomas Petazzoni
2014-06-17  8:48                             ` Thomas Petazzoni
2014-06-17 11:26                               ` Russell King - ARM Linux
2014-06-28 15:39                                 ` Thomas Petazzoni
2014-06-28 16:34                                   ` Russell King - ARM Linux
2014-06-28 16:52                                     ` Thomas Petazzoni
2014-05-20 15:35 ` [PATCH RFCv2 2/5] ARM: kernel: allow the SCU to be enabled even on !SMP Thomas Petazzoni
2014-05-20 15:35 ` [PATCH RFCv2 3/5] ARM: allow CONFIG_SMP_ON_UP on non-SMP configurations Thomas Petazzoni
2014-05-26 16:35   ` Russell King - ARM Linux
2014-05-27 11:59     ` Thomas Petazzoni
2014-05-27 12:22       ` Russell King - ARM Linux
2014-05-27 12:47         ` Thomas Petazzoni
2014-05-20 15:35 ` [PATCH RFCv2 4/5] ARM: mvebu: enable coherency only when possible on Armada XP Thomas Petazzoni
2014-05-20 15:35 ` [PATCH RFCv2 5/5] ARM: mvebu: add debugging message indicating the status of I/O coherency Thomas Petazzoni
2014-05-26 13:32 ` [PATCH RFCv2 0/5] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).