linux-arm-kernel.lists.infradead.org archive mirror
* [RFC 0/1] ARM: mm: cache shareability tweak
@ 2016-04-12  8:14 Tero Kristo
  2016-04-12  8:14 ` [RFC 1/1] ARM: mm: add support for specifying shareability level via cmdline Tero Kristo
  2016-04-12 13:25 ` [RFC 0/1] ARM: mm: cache shareability tweak Mark Rutland
  0 siblings, 2 replies; 5+ messages in thread
From: Tero Kristo @ 2016-04-12  8:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

This RFC patch attempts to implement support for specifying the cache
shareability setting via the kernel cmdline. This is required at least
for TI keystone2 generation SoCs, where DMA masters snoop the cache
maintenance messages to maintain coherency. Currently we are carrying
an internal hack that modifies the macros via #ifdefs; this is
obviously bad, as the same kernel image can then only work with
keystone2 (or at least might cause problems with other SoCs).
It would be very much preferred to replace this hardcoded
implementation with a runtime solution.
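
For reference, the intended usage would be to append something like the
following to the kernel command line (parameter name as proposed in
patch 1/1; with LPAE page tables the value lands in the SH[1:0] field,
so 2 selects Outer Shareable and 3 keeps the current Inner Shareable
default):

    pmd_sect_s=2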

Some obvious holes in this implementation:

1) during execution of arch/arm/kernel/head.S, the tweaked MMU shareability
   settings are not in place. However, I am not too sure how much that
   matters, as I am not sure what is mapped at this point. Kernel image
   mapping should not matter at least, as we typically should not be doing
   any DMA transfers from the kernel image. I would like some comments on
   this: if handling during head.S should also be fixed, how can this be
   done? Some hack under compressed/keystone-head.S?

2) the cmdline parameter could be something more descriptive

3) The single RFC patch should probably be split up a bit

- Tero

* [RFC 1/1] ARM: mm: add support for specifying shareability level via cmdline
  2016-04-12  8:14 [RFC 0/1] ARM: mm: cache shareability tweak Tero Kristo
@ 2016-04-12  8:14 ` Tero Kristo
  2016-04-12 13:25 ` [RFC 0/1] ARM: mm: cache shareability tweak Mark Rutland
  1 sibling, 0 replies; 5+ messages in thread
From: Tero Kristo @ 2016-04-12  8:14 UTC (permalink / raw)
  To: linux-arm-kernel

pmd_sect_s=x can be used to specify the cache shareability level for the
CPU. The value is used to set up the shareability attribute for specific
MMU mappings. Currently the kernel only supports the inner shareable
attribute.

This feature is required at least for keystone DMA cache coherency
support, where pages meant to be used by DMA must also be marked as
outer shareable, so that the DMA masters can snoop the maintenance
messages.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 arch/arm/include/asm/fixmap.h  |    2 +-
 arch/arm/include/asm/pgtable.h |    3 +++
 arch/arm/kernel/setup.c        |   39 +++++++++++++++++++++++++++++++---
 arch/arm/mm/dump.c             |    3 +++
 arch/arm/mm/mmu.c              |   46 +++++++++++++++++++++++++---------------
 5 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index 5c17d2d..310981a 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -45,7 +45,7 @@ static const enum fixed_addresses __end_of_fixed_addresses =
 #define FIXMAP_PAGE_RO		(FIXMAP_PAGE_NORMAL | L_PTE_RDONLY)
 
 /* Used by set_fixmap_(io|nocache), both meant for mapping a device */
-#define FIXMAP_PAGE_IO		(FIXMAP_PAGE_COMMON | L_PTE_MT_DEV_SHARED | L_PTE_SHARED)
+#define FIXMAP_PAGE_IO		(FIXMAP_PAGE_COMMON | L_PTE_MT_DEV_SHARED | l_pte_shared)
 #define FIXMAP_PAGE_NOCACHE	FIXMAP_PAGE_IO
 
 #define __early_set_fixmap	__set_fixmap
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 348caab..e8a28aa 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -86,6 +86,9 @@ extern pgprot_t		pgprot_hyp_device;
 extern pgprot_t		pgprot_s2;
 extern pgprot_t		pgprot_s2_device;
 
+extern pteval_t		pmd_sect_s;
+extern pteval_t		l_pte_shared;
+
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
 #define PAGE_NONE		_MOD_PROT(pgprot_user, L_PTE_XN | L_PTE_RDONLY | L_PTE_NONE)
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 139791e..3836c9a 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -33,6 +33,7 @@
 #include <linux/compiler.h>
 #include <linux/sort.h>
 #include <linux/psci.h>
+#include <linux/moduleparam.h>
 
 #include <asm/unified.h>
 #include <asm/cp15.h>
@@ -668,6 +669,8 @@ static void __init smp_build_mpidr_hash(void)
 }
 #endif
 
+static unsigned long cpu_mm_mmu_flags;
+
 static void __init setup_processor(void)
 {
 	struct proc_info_list *list;
@@ -704,6 +707,8 @@ static void __init setup_processor(void)
 		cpu_name, read_cpuid_id(), read_cpuid_id() & 15,
 		proc_arch[cpu_architecture()], get_cr());
 
+	cpu_mm_mmu_flags = list->__cpu_mm_mmu_flags;
+
 	snprintf(init_utsname()->machine, __NEW_UTS_LEN + 1, "%s%c",
 		 list->arch_name, ENDIANNESS);
 	snprintf(elf_platform, ELF_PLATFORM_SIZE, "%s%c",
@@ -716,9 +721,6 @@ static void __init setup_processor(void)
 #ifndef CONFIG_ARM_THUMB
 	elf_hwcap &= ~(HWCAP_THUMB | HWCAP_IDIVT);
 #endif
-#ifdef CONFIG_MMU
-	init_default_cache_policy(list->__cpu_mm_mmu_flags);
-#endif
 	erratum_a15_798181_init();
 
 	elf_hwcap_fixup();
@@ -1002,6 +1004,36 @@ void __init hyp_mode_check(void)
 #endif
 }
 
+pteval_t pmd_sect_s = PMD_SECT_S;
+pteval_t l_pte_shared = L_PTE_SHARED;
+
+static int __init parse_mmu_setup(char *str)
+{
+	u32 new_val;
+
+	if (!get_option(&str, &new_val))
+		return -EINVAL;
+
+	if (new_val < 2 || new_val > 3) {
+		pr_err("%s: bad value: %d\n", __func__, new_val);
+		return -EINVAL;
+	}
+
+	new_val = _AT(pteval_t, new_val) << 8;
+
+	pr_info("%s: setting PMD_SECT_S to %08x, old value %08x\n", __func__,
+		new_val, (u32)pmd_sect_s);
+
+	pmd_sect_s = new_val;
+	l_pte_shared = new_val;
+
+	cpu_mm_mmu_flags &= ~(u32)PMD_SECT_S;
+	cpu_mm_mmu_flags |= pmd_sect_s;
+
+	return 0;
+}
+early_param("pmd_sect_s", parse_mmu_setup);
+
 void __init setup_arch(char **cmdline_p)
 {
 	const struct machine_desc *mdesc;
@@ -1032,6 +1064,7 @@ void __init setup_arch(char **cmdline_p)
 	parse_early_param();
 
 #ifdef CONFIG_MMU
+	init_default_cache_policy(cpu_mm_mmu_flags);
 	early_paging_init(mdesc);
 #endif
 	setup_dma_zone(mdesc);
diff --git a/arch/arm/mm/dump.c b/arch/arm/mm/dump.c
index 9fe8e24..319412e 100644
--- a/arch/arm/mm/dump.c
+++ b/arch/arm/mm/dump.c
@@ -346,6 +346,9 @@ static int ptdump_init(void)
 	struct dentry *pe;
 	unsigned i, j;
 
+	if (l_pte_shared != L_PTE_SHARED)
+		pte_bits[3].val = l_pte_shared;
+
 	for (i = 0; i < ARRAY_SIZE(pg_level); i++)
 		if (pg_level[i].bits)
 			for (j = 0; j < pg_level[i].num; j++)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 62f4d01..4140acb 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -454,9 +454,9 @@ static void __init build_mem_type_table(void)
 			pr_warn("Forcing write-allocate cache policy for SMP\n");
 			cachepolicy = CPOLICY_WRITEALLOC;
 		}
-		if (!(initial_pmd_value & PMD_SECT_S)) {
+		if ((initial_pmd_value & PMD_SECT_S) != pmd_sect_s) {
 			pr_warn("Forcing shared mappings for SMP\n");
-			initial_pmd_value |= PMD_SECT_S;
+			initial_pmd_value |= pmd_sect_s;
 		}
 	}
 
@@ -591,27 +591,39 @@ static void __init build_mem_type_table(void)
 		mem_types[MT_CACHECLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
 #endif
 
+		mem_types[MT_DEVICE].prot_sect &= ~PMD_SECT_S;
+		mem_types[MT_DEVICE].prot_sect |= pmd_sect_s;
+
+		mem_types[MT_DEVICE].prot_pte &= ~L_PTE_SHARED;
+		mem_types[MT_DEVICE].prot_pte |= l_pte_shared;
+
+		mem_types[MT_DEVICE].prot_pte_s2 &= ~L_PTE_SHARED;
+		mem_types[MT_DEVICE].prot_pte_s2 |= l_pte_shared;
+
+		mem_types[MT_MEMORY_RW_SO].prot_sect &= ~PMD_SECT_S;
+		mem_types[MT_MEMORY_RW_SO].prot_sect |= pmd_sect_s;
+
 		/*
 		 * If the initial page tables were created with the S bit
 		 * set, then we need to do the same here for the same
 		 * reasons given in early_cachepolicy().
 		 */
 		if (initial_pmd_value & PMD_SECT_S) {
-			user_pgprot |= L_PTE_SHARED;
-			kern_pgprot |= L_PTE_SHARED;
-			vecs_pgprot |= L_PTE_SHARED;
-			s2_pgprot |= L_PTE_SHARED;
-			mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
-			mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
-			mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
-			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
-			mem_types[MT_MEMORY_RWX].prot_sect |= PMD_SECT_S;
-			mem_types[MT_MEMORY_RWX].prot_pte |= L_PTE_SHARED;
-			mem_types[MT_MEMORY_RW].prot_sect |= PMD_SECT_S;
-			mem_types[MT_MEMORY_RW].prot_pte |= L_PTE_SHARED;
-			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
-			mem_types[MT_MEMORY_RWX_NONCACHED].prot_sect |= PMD_SECT_S;
-			mem_types[MT_MEMORY_RWX_NONCACHED].prot_pte |= L_PTE_SHARED;
+			user_pgprot |= l_pte_shared;
+			kern_pgprot |= l_pte_shared;
+			vecs_pgprot |= l_pte_shared;
+			s2_pgprot |= l_pte_shared;
+			mem_types[MT_DEVICE_WC].prot_sect |= pmd_sect_s;
+			mem_types[MT_DEVICE_WC].prot_pte |= l_pte_shared;
+			mem_types[MT_DEVICE_CACHED].prot_sect |= pmd_sect_s;
+			mem_types[MT_DEVICE_CACHED].prot_pte |= l_pte_shared;
+			mem_types[MT_MEMORY_RWX].prot_sect |= pmd_sect_s;
+			mem_types[MT_MEMORY_RWX].prot_pte |= l_pte_shared;
+			mem_types[MT_MEMORY_RW].prot_sect |= pmd_sect_s;
+			mem_types[MT_MEMORY_RW].prot_pte |= l_pte_shared;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= l_pte_shared;
+			mem_types[MT_MEMORY_RWX_NONCACHED].prot_sect |= pmd_sect_s;
+			mem_types[MT_MEMORY_RWX_NONCACHED].prot_pte |= l_pte_shared;
 		}
 	}
 
-- 
1.7.9.5

* [RFC 0/1] ARM: mm: cache shareability tweak
  2016-04-12  8:14 [RFC 0/1] ARM: mm: cache shareability tweak Tero Kristo
  2016-04-12  8:14 ` [RFC 1/1] ARM: mm: add support for specifying shareability level via cmdline Tero Kristo
@ 2016-04-12 13:25 ` Mark Rutland
  2016-04-12 14:06   ` Tero Kristo
  1 sibling, 1 reply; 5+ messages in thread
From: Mark Rutland @ 2016-04-12 13:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 12, 2016 at 11:14:39AM +0300, Tero Kristo wrote:
> Hi,
> 
> This RFC patch attempts to implement support for specifying the cache
> shareability setting via the kernel cmdline. This is required at least
> for TI keystone2 generation SoCs, where DMA masters snoop the cache
> maintenance messages to maintain coherency. Currently we are carrying
> an internal hack that modifies the macros via #ifdefs; this is
> obviously bad, as the same kernel image can then only work with
> keystone2 (or at least might cause problems with other SoCs).

The de-facto semantics (which we should codify) for dma-coherent with
ARMv7 is that a device makes accesses which are coherent with Normal,
Inner Shareable, Inner Write-Back, Outer Write-Back.

In arch/arm/boot/dts/keystone.dtsi I see that /soc/usb@2680000 has a
dma-coherent flag. Is that device coherent today with upstream? Or is
that misleading currently?

If the device isn't coherent with that, then dma-coherent isn't strictly
true (and should go), and we need additional properties to correctly
describe this case.
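
For context, here is a minimal sketch of how the dma-coherent flag is
typically consumed on 32-bit ARM (not part of the RFC patch;
of_dma_is_coherent(), set_dma_ops() and the arm_dma_ops /
arm_coherent_dma_ops structures are existing kernel helpers, while the
example_setup_dma_ops() wrapper name is made up for illustration):

#include <linux/device.h>
#include <linux/of_address.h>
#include <asm/dma-mapping.h>

/* Sketch only: pick DMA ops for a device based on its DT node. */
static void example_setup_dma_ops(struct device *dev)
{
	if (of_dma_is_coherent(dev->of_node))
		/* device snoops CPU caches; no manual cache maintenance */
		set_dma_ops(dev, &arm_coherent_dma_ops);
	else
		/* non-coherent: these ops clean/invalidate around transfers */
		set_dma_ops(dev, &arm_dma_ops);
}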

> It would be very much preferred to replace this hardcoded
> implementation with a runtime solution.
> 
> Some obvious holes in this implementation:
> 
> 1) during execution of arch/arm/kernel/head.S, the tweaked MMU shareability
>    settings are not in place. However, I am not too sure how much that
>    matters, as I am not sure what is mapped at this point. Kernel image
>    mapping should not matter at least, as we typically should not be doing
>    any DMA transfers from the kernel image.

Strictly speaking, changing the shareability can result in a loss of
coherency, even if all accesses are made by the same CPU. See
"Mismatched memory attributes" in section A3.5.7 of the ARMv7-AR
Reference Manual (ARM DDI 0406C.c).

It's not just DMA that matters. I believe we may have page tables as
part of the kernel image, for instance, and those need to be accessed
with consistent attributes by the MMU when doing page table walks.

You can avoid issues so long as you have appropriate cache maintenance,
but that's both expensive (all memory previously mapped must be
Clean+Invalidated by VA) and painful (as you can't reliably use any of
said memory until after the maintenance).
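
To make that concrete, below is a rough sketch of such maintenance for a
kernel-VA range whose attributes are about to change, using standard
32-bit ARM helpers (illustration only, not code from this patch; the
example_change_shareability() name is made up, and barriers and error
handling are omitted):

#include <linux/types.h>
#include <asm/cacheflush.h>
#include <asm/outercache.h>
#include <asm/tlbflush.h>

static void example_change_shareability(void *va, phys_addr_t pa, size_t size)
{
	/* clean+invalidate by VA so no dirty/stale lines remain (inner caches) */
	__cpuc_flush_dcache_area(va, size);
	/* likewise for any outer (e.g. L2) cache, by physical address */
	outer_flush_range(pa, pa + size);

	/* ... rewrite the page table entries with the new shareability here ... */

	/* drop TLB entries that still hold the old attributes */
	flush_tlb_kernel_range((unsigned long)va, (unsigned long)va + size);
}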

>    I would like some comments on this: if handling during head.S
>    should also be fixed, how can this be done? Some hack under
>    compressed/keystone-head.S?

If you need to do this, you need consistent attributes from the outset,
or you need to disable the MMU, perform cache maintenance, and re-enter
the kernel.

> 2) the cmdline parameter could be something more descriptive
> 
> 3) The single RFC patch should probably be split up a bit

4) It isn't possible to use dma-coherent to describe this without
   weakening the semantics so as to be meaningless in general. So if we
   go for this approach we need a mechanism to accurately describe the
   coherency guarantees of masters in the system beyond a boolean.

Thanks,
Mark.

* [RFC 0/1] ARM: mm: cache shareability tweak
  2016-04-12 13:25 ` [RFC 0/1] ARM: mm: cache shareability tweak Mark Rutland
@ 2016-04-12 14:06   ` Tero Kristo
  2016-04-12 15:00     ` Mark Rutland
  0 siblings, 1 reply; 5+ messages in thread
From: Tero Kristo @ 2016-04-12 14:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/12/2016 04:25 PM, Mark Rutland wrote:
> On Tue, Apr 12, 2016 at 11:14:39AM +0300, Tero Kristo wrote:
>> Hi,
>>
>> This RFC patch attempts to implement support for specifying the cache
>> shareability setting via the kernel cmdline. This is required at least
>> for TI keystone2 generation SoCs, where DMA masters snoop the cache
>> maintenance messages to maintain coherency. Currently we are carrying
>> an internal hack that modifies the macros via #ifdefs; this is
>> obviously bad, as the same kernel image can then only work with
>> keystone2 (or at least might cause problems with other SoCs).
>
> The de-facto semantics (which we should codify) for dma-coherent with
> ARMv7 is that a device makes accesses which are coherent with Normal,
> Inner Shareable, Inner Write-Back, Outer Write-Back.
>
> In arch/arm/boot/dts/keystone.dtsi I see that /soc/usb@2680000 has a
> dma-coherent flag. Is that device coherent today with upstream? Or is
> that misleading currently?

Good question. Murali, can you comment on this? What peripherals
actually require DMA coherency on K2?

>
> If the device isn't coherent with that, then dma-coherent isn't strictly
> true (and should go), and we need additional properties to correctly
> describe this case.
>
>> It would be very much preferred to replace this hardcoded
>> implementation with a runtime solution.
>>
>> Some obvious holes in this implementation:
>>
>> 1) during execution of arch/arm/kernel/head.S, the tweaked MMU shareability
>>     settings are not in place. However, I am not too sure how much that
>>     matters, as I am not sure what is mapped at this point. Kernel image
>>     mapping should not matter at least, as we typically should not be doing
>>     any DMA transfers from the kernel image.
>
> Strictly speaking, changing the shareability can result in a loss of
> coherency, even if all accesses are made by the same CPU. See
> "Mismatched memory attributes" in section A3.5.7 of the ARMv7-AR
> Reference Manual (ARM DDI 0406C.c).

Basically we are not attempting to change shareability on the fly, but
instead configure a different shareability value that is always used.

>
> It's not just DMA that matters. I believe we may have page tables as
> part of the kernel image, for instance, and those need to be accessed
> with consistent attributes by the MMU when doing page table walks.
>
> You can avoid issues so long as you have appropriate cache maintenance,
> but that's both expensive (all memory previously mapped must be
> Clean+Invalidated by VA) and painful (as you can't reliably use any of
> said memory until after the maintenance).

The hack we have internally just maps all the DMA pages as outer
shareable. I think including the original hack might help in
understanding the issue, so I have added it inline at the end as a
reference. We just attempt to change the shareability value from 3 (the
current) to 2.

>
>>     I would like some comments on this: if handling during head.S
>>     should also be fixed, how can this be done? Some hack under
>>     compressed/keystone-head.S?
>
> If you need to do this, you need consistent attributes from the outset,
> or you need to disable the MMU, perform cache maintenance, and re-enter
> the kernel.
>
>> 2) the cmdline parameter could be something more descriptive
>>
>> 3) The single RFC patch should probably be split up a bit
>
> 4) It isn't possible to use dma-coherent to describe this without
>     weakening the semantics so as to be meaningless in general. So if we
>     go for this approach we need a mechanism to accurately describe the
>     coherency guarantees of masters in the system beyond a boolean.
>
> Thanks,
> Mark.
>


diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index f8f1cff..62adf21 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,7 +44,11 @@
  #define PMD_SECT_CACHEABLE	(_AT(pmdval_t, 1) << 3)
  #define PMD_SECT_USER		(_AT(pmdval_t, 1) << 6)		/* AP[1] */
  #define PMD_SECT_AP2		(_AT(pmdval_t, 1) << 7)		/* read only */
+#ifdef CONFIG_KEYSTONE2_DMA_COHERENT
+#define PMD_SECT_S		(_AT(pmdval_t, 2) << 8)
+#else
  #define PMD_SECT_S		(_AT(pmdval_t, 3) << 8)
+#endif
  #define PMD_SECT_AF		(_AT(pmdval_t, 1) << 10)
  #define PMD_SECT_nG		(_AT(pmdval_t, 1) << 11)
  #define PMD_SECT_PXN		(_AT(pmdval_t, 1) << 53)
@@ -73,7 +77,12 @@
  #define PTE_BUFFERABLE		(_AT(pteval_t, 1) << 2)		/* AttrIndx[0] */
  #define PTE_CACHEABLE		(_AT(pteval_t, 1) << 3)		/* AttrIndx[1] */
  #define PTE_AP2			(_AT(pteval_t, 1) << 7)		/* AP[2] */
+#ifdef CONFIG_KEYSTONE2_DMA_COHERENT
+/* SH[1:0], outer shareable */
+#define PTE_EXT_SHARED		(_AT(pteval_t, 2) << 8)
+#else
  #define PTE_EXT_SHARED		(_AT(pteval_t, 3) << 8)		/* SH[1:0], inner shareable */
+#endif
  #define PTE_EXT_AF		(_AT(pteval_t, 1) << 10)	/* Access Flag */
  #define PTE_EXT_NG		(_AT(pteval_t, 1) << 11)	/* nG */
  #define PTE_EXT_PXN		(_AT(pteval_t, 1) << 53)	/* PXN */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index a745a2a..b4090b1 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -78,7 +78,12 @@
  #define L_PTE_VALID		(_AT(pteval_t, 1) << 0)		/* Valid */
  #define L_PTE_PRESENT		(_AT(pteval_t, 3) << 0)		/* Present */
  #define L_PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
+#ifdef CONFIG_KEYSTONE2_DMA_COHERENT
+/* SH[1:0], outer shareable */
+#define L_PTE_SHARED		(_AT(pteval_t, 2) << 8)
+#else
  #define L_PTE_SHARED		(_AT(pteval_t, 3) << 8)		/* SH[1:0], inner shareable */
+#endif
  #define L_PTE_YOUNG		(_AT(pteval_t, 1) << 10)	/* AF */
  #define L_PTE_XN		(_AT(pteval_t, 1) << 54)	/* XN */
  #define L_PTE_DIRTY		(_AT(pteval_t, 1) << 55)
diff --git a/arch/arm/mach-keystone/Kconfig b/arch/arm/mach-keystone/Kconfig
index ea955f6db..558385e 100644
--- a/arch/arm/mach-keystone/Kconfig
+++ b/arch/arm/mach-keystone/Kconfig
@@ -11,6 +11,10 @@ config ARCH_KEYSTONE
  	select ZONE_DMA if ARM_LPAE
  	select MIGHT_HAVE_PCI
  	select PCI_DOMAINS if PCI
+	select KEYSTONE2_DMA_COHERENT
  	help
  	  Support for boards based on the Texas Instruments Keystone family of
  	  SoCs.
+
+config KEYSTONE2_DMA_COHERENT
+	bool

* [RFC 0/1] ARM: mm: cache shareability tweak
  2016-04-12 14:06   ` Tero Kristo
@ 2016-04-12 15:00     ` Mark Rutland
  0 siblings, 0 replies; 5+ messages in thread
From: Mark Rutland @ 2016-04-12 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 12, 2016 at 05:06:49PM +0300, Tero Kristo wrote:
> On 04/12/2016 04:25 PM, Mark Rutland wrote:
> >On Tue, Apr 12, 2016 at 11:14:39AM +0300, Tero Kristo wrote:
> >>Some obvious holes in this implementation:
> >>
> >>1) during execution of arch/arm/kernel/head.S, the tweaked MMU shareability
> >>    settings are not in place. However, I am not too sure how much that
> >>    matters, as I am not sure what is mapped at this point. Kernel image
> >>    mapping should not matter at least, as we typically should not be doing
> >>    any DMA transfers from the kernel image.
> >
> >Strictly speaking, changing the shareability can result in a loss of
> >coherency, even if all accesses are made by the same CPU. See
> >"Mismatched memory attributes" in section A3.5.7 of the ARMv7-AR
> >Reference Manual (ARM DDI 0406C.c).
> 
> Basically we are not attempting to change shareability on the fly,
> but instead configure a different shareability value that is always
> used.

I understood that this was a one-time transition; the problem still
applies, per the architecture. Your implementation _may_ provide
stronger guarantees than architecturally required.

Cache lines allocated as a result of accesses with one set of
attributes are not necessarily coherent with accesses with differing
attributes (even if only differing in terms of shareability). Until the
cache maintenance mandated by the ARM ARM is performed, you may
encounter a number of issues.

Thus changing attributes once during the boot process is potentially
problematic.

> >It's not just DMA that matters. I believe we may have page tables as
> >part of the kernel image, for instance, and those need to be accessed
> >with consistent attributes by the MMU when doing page table walks.
> >
> >You can avoid issues so long as you have appropriate cache maintenance,
> >but that's both expensive (all memory previously mapped must be
> >Clean+Invalidated by VA) and painful (as you can't reliably use any of
> >said memory until after the maintenance).
> 
> The hack we have internally just maps all the DMA pages as outer
> shareable.

I'm not sure precisely what you mean by "DMA pages" here. Surely this
applies to any mappings created using the usual page table accessors?

> I think including the original hack might help in understanding the
> issue, so I have added it inline at the end as a reference. We just
> attempt to change the shareability value from 3 (the current) to 2.

I understand that you only wish to change the shareability.

What I am describing are the caveats that come with doing so, which are
not necessarily intuitive, but are laid out in the ARM ARM.

Thanks,
Mark.
