* [PATCH v2 00/12] Make use of v7 barrier variants in Linux
@ 2013-06-20 14:21 Will Deacon
2013-06-20 14:21 ` [PATCH v2 01/12] ARM: mm: remove redundant dsb() prior to range TLB invalidation Will Deacon
` (11 more replies)
0 siblings, 12 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
Hello again,
This is the second version of the patches I originally posted here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/173952.html
Changes since v1 include:
- Fixed broken TLB invalidation with SMP_ON_UP (reported by Jonny)
- Relaxed some additional barriers (l2x0 and flush_cache_vmap)
- Based on 3.10-rc6
These are targeting 3.12, so no need to panic just yet!
All feedback welcome,
Will
Will Deacon (12):
ARM: mm: remove redundant dsb() prior to range TLB invalidation
ARM: tlb: don't perform inner-shareable invalidation for local TLB ops
ARM: tlb: don't bother with barriers for branch predictor maintenance
ARM: tlb: don't perform inner-shareable invalidation for local BP ops
ARM: barrier: allow options to be passed to memory barrier
instructions
ARM: tlb: reduce scope of barrier domains for TLB invalidation
ARM: mm: use inner-shareable barriers for TLB and user cache
operations
ARM: spinlock: use inner-shareable dsb variant prior to sev
instruction
ARM: kvm: use inner-shareable barriers after TLB flushing
ARM: mcpm: use -st dsb option prior to sev instructions
ARM: l2x0: use -st dsb option for ordering writel_relaxed with unlock
ARM: cacheflush: use -ishst dsb variant for ensuring flush completion
arch/arm/common/mcpm_head.S | 2 +-
arch/arm/common/vlock.S | 4 +-
arch/arm/include/asm/assembler.h | 4 +-
arch/arm/include/asm/barrier.h | 32 +++----
arch/arm/include/asm/cacheflush.h | 2 +-
arch/arm/include/asm/spinlock.h | 2 +-
arch/arm/include/asm/switch_to.h | 10 +++
arch/arm/include/asm/tlbflush.h | 181 +++++++++++++++++++++++++++++++-------
arch/arm/kernel/smp_tlb.c | 10 +--
arch/arm/kvm/init.S | 2 +-
arch/arm/kvm/interrupts.S | 4 +-
arch/arm/mm/cache-l2x0.c | 2 +-
arch/arm/mm/cache-v7.S | 4 +-
arch/arm/mm/context.c | 6 +-
arch/arm/mm/dma-mapping.c | 1 -
arch/arm/mm/proc-v7.S | 2 +-
arch/arm/mm/tlb-v7.S | 8 +-
17 files changed, 199 insertions(+), 77 deletions(-)
--
1.8.2.2
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 01/12] ARM: mm: remove redundant dsb() prior to range TLB invalidation
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 02/12] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops Will Deacon
` (10 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
The kernel TLB range invalidation functions already contain dsb
instructions before and after the maintenance, so there is no need to
introduce additional barriers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/mm/dma-mapping.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ef3e0f3..5baabf7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -455,7 +455,6 @@ static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
unsigned end = start + size;
apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
- dsb();
flush_tlb_kernel_range(start, end);
}
--
1.8.2.2
* [PATCH v2 02/12] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
2013-06-20 14:21 ` [PATCH v2 01/12] ARM: mm: remove redundant dsb() prior to range TLB invalidation Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 03/12] ARM: tlb: don't bother with barriers for branch predictor maintenance Will Deacon
` (9 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
Inner-shareable TLB invalidation is typically more expensive than local
(non-shareable) invalidation, so performing the broadcasting for
local_flush_tlb_* operations is a waste of cycles and needlessly
clobbers entries in the TLBs of other CPUs.
This patch introduces __flush_tlb_* versions for many of the TLB
invalidation functions, which only respect inner-shareable variants of
the invalidation instructions when presented with the TLB_V7_UIS_FULL
flag. The local version is also inlined to prevent SMP_ON_UP kernels
from missing flushes, where the __flush variant would be called with
the UP flags.
This gains us around 0.5% in hackbench scores for a dual-core A15, but I
would expect this to improve as more cores (and clusters) are added to
the equation.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 138 ++++++++++++++++++++++++++++++++++------
arch/arm/kernel/smp_tlb.c | 8 +--
arch/arm/mm/context.c | 6 +-
3 files changed, 123 insertions(+), 29 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index a3625d1..91b2922 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -319,6 +319,16 @@ extern struct cpu_tlb_fns cpu_tlb;
#define tlb_op(f, regs, arg) __tlb_op(f, "p15, 0, %0, " regs, arg)
#define tlb_l2_op(f, regs, arg) __tlb_op(f, "p15, 1, %0, " regs, arg)
+static inline void __local_flush_tlb_all(void)
+{
+ const int zero = 0;
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ tlb_op(TLB_V4_U_FULL | TLB_V6_U_FULL, "c8, c7, 0", zero);
+ tlb_op(TLB_V4_D_FULL | TLB_V6_D_FULL, "c8, c6, 0", zero);
+ tlb_op(TLB_V4_I_FULL | TLB_V6_I_FULL, "c8, c5, 0", zero);
+}
+
static inline void local_flush_tlb_all(void)
{
const int zero = 0;
@@ -327,10 +337,8 @@ static inline void local_flush_tlb_all(void)
if (tlb_flag(TLB_WB))
dsb();
- tlb_op(TLB_V4_U_FULL | TLB_V6_U_FULL, "c8, c7, 0", zero);
- tlb_op(TLB_V4_D_FULL | TLB_V6_D_FULL, "c8, c6, 0", zero);
- tlb_op(TLB_V4_I_FULL | TLB_V6_I_FULL, "c8, c5, 0", zero);
- tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero);
+ __local_flush_tlb_all();
+ tlb_op(TLB_V7_UIS_FULL, "c8, c7, 0", zero);
if (tlb_flag(TLB_BARRIER)) {
dsb();
@@ -338,31 +346,69 @@ static inline void local_flush_tlb_all(void)
}
}
-static inline void local_flush_tlb_mm(struct mm_struct *mm)
+static inline void __flush_tlb_all(void)
{
const int zero = 0;
- const int asid = ASID(mm);
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
dsb();
+ __local_flush_tlb_all();
+ tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero);
+
+ if (tlb_flag(TLB_BARRIER)) {
+ dsb();
+ isb();
+ }
+}
+
+static inline void __local_flush_tlb_mm(struct mm_struct *mm)
+{
+ const int zero = 0;
+ const int asid = ASID(mm);
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
if (possible_tlb_flags & (TLB_V4_U_FULL|TLB_V4_D_FULL|TLB_V4_I_FULL)) {
- if (cpumask_test_cpu(get_cpu(), mm_cpumask(mm))) {
+ if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
tlb_op(TLB_V4_U_FULL, "c8, c7, 0", zero);
tlb_op(TLB_V4_D_FULL, "c8, c6, 0", zero);
tlb_op(TLB_V4_I_FULL, "c8, c5, 0", zero);
}
- put_cpu();
}
tlb_op(TLB_V6_U_ASID, "c8, c7, 2", asid);
tlb_op(TLB_V6_D_ASID, "c8, c6, 2", asid);
tlb_op(TLB_V6_I_ASID, "c8, c5, 2", asid);
+}
+
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+ const int asid = ASID(mm);
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
+ __local_flush_tlb_mm(mm);
+ tlb_op(TLB_V7_UIS_ASID, "c8, c7, 2", asid);
+
+ if (tlb_flag(TLB_BARRIER))
+ dsb();
+}
+
+static inline void __flush_tlb_mm(struct mm_struct *mm)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
+ __local_flush_tlb_mm(mm);
#ifdef CONFIG_ARM_ERRATA_720789
- tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", zero);
+ tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", 0);
#else
- tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", asid);
+ tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", ASID(mm));
#endif
if (tlb_flag(TLB_BARRIER))
@@ -370,16 +416,13 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
}
static inline void
-local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
+__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
{
const int zero = 0;
const unsigned int __tlb_flag = __cpu_tlb_flags;
uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
- if (tlb_flag(TLB_WB))
- dsb();
-
if (possible_tlb_flags & (TLB_V4_U_PAGE|TLB_V4_D_PAGE|TLB_V4_I_PAGE|TLB_V4_I_FULL) &&
cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) {
tlb_op(TLB_V4_U_PAGE, "c8, c7, 1", uaddr);
@@ -392,6 +435,36 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", uaddr);
tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", uaddr);
tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr);
+}
+
+static inline void
+local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
+ __local_flush_tlb_page(vma, uaddr);
+ tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", uaddr);
+
+ if (tlb_flag(TLB_BARRIER))
+ dsb();
+}
+
+static inline void
+__flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
+ __local_flush_tlb_page(vma, uaddr);
#ifdef CONFIG_ARM_ERRATA_720789
tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 3", uaddr & PAGE_MASK);
#else
@@ -402,16 +475,11 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
dsb();
}
-static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
+static inline void __local_flush_tlb_kernel_page(unsigned long kaddr)
{
const int zero = 0;
const unsigned int __tlb_flag = __cpu_tlb_flags;
- kaddr &= PAGE_MASK;
-
- if (tlb_flag(TLB_WB))
- dsb();
-
tlb_op(TLB_V4_U_PAGE, "c8, c7, 1", kaddr);
tlb_op(TLB_V4_D_PAGE, "c8, c6, 1", kaddr);
tlb_op(TLB_V4_I_PAGE, "c8, c5, 1", kaddr);
@@ -421,6 +489,36 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", kaddr);
tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", kaddr);
tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr);
+}
+
+static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ kaddr &= PAGE_MASK;
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
+ __local_flush_tlb_kernel_page(kaddr);
+ tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", kaddr);
+
+ if (tlb_flag(TLB_BARRIER)) {
+ dsb();
+ isb();
+ }
+}
+
+static inline void __flush_tlb_kernel_page(unsigned long kaddr)
+{
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ kaddr &= PAGE_MASK;
+
+ if (tlb_flag(TLB_WB))
+ dsb();
+
+ __local_flush_tlb_kernel_page(kaddr);
tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
if (tlb_flag(TLB_BARRIER)) {
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 9a52a07..cc299b5 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -135,7 +135,7 @@ void flush_tlb_all(void)
if (tlb_ops_need_broadcast())
on_each_cpu(ipi_flush_tlb_all, NULL, 1);
else
- local_flush_tlb_all();
+ __flush_tlb_all();
broadcast_tlb_a15_erratum();
}
@@ -144,7 +144,7 @@ void flush_tlb_mm(struct mm_struct *mm)
if (tlb_ops_need_broadcast())
on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
else
- local_flush_tlb_mm(mm);
+ __flush_tlb_mm(mm);
broadcast_tlb_mm_a15_erratum(mm);
}
@@ -157,7 +157,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
&ta, 1);
} else
- local_flush_tlb_page(vma, uaddr);
+ __flush_tlb_page(vma, uaddr);
broadcast_tlb_mm_a15_erratum(vma->vm_mm);
}
@@ -168,7 +168,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
ta.ta_start = kaddr;
on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
} else
- local_flush_tlb_kernel_page(kaddr);
+ __flush_tlb_kernel_page(kaddr);
broadcast_tlb_a15_erratum();
}
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 2ac3737..62c1ec5 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -134,10 +134,7 @@ static void flush_context(unsigned int cpu)
}
/* Queue a TLB invalidate and flush the I-cache if necessary. */
- if (!tlb_ops_need_broadcast())
- cpumask_set_cpu(cpu, &tlb_flush_pending);
- else
- cpumask_setall(&tlb_flush_pending);
+ cpumask_setall(&tlb_flush_pending);
if (icache_is_vivt_asid_tagged())
__flush_icache_all();
@@ -215,7 +212,6 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending)) {
local_flush_bp_all();
local_flush_tlb_all();
- dummy_flush_tlb_a15_erratum();
}
atomic64_set(&per_cpu(active_asids, cpu), asid);
--
1.8.2.2
* [PATCH v2 03/12] ARM: tlb: don't bother with barriers for branch predictor maintenance
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
2013-06-20 14:21 ` [PATCH v2 01/12] ARM: mm: remove redundant dsb() prior to range TLB invalidation Will Deacon
2013-06-20 14:21 ` [PATCH v2 02/12] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 04/12] ARM: tlb: don't perform inner-shareable invalidation for local BP ops Will Deacon
` (8 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
Branch predictor maintenance is only required when we are either
changing the kernel's view of memory (switching tables completely) or
dealing with ASID rollover.
Both of these use-cases require subsequent TLB invalidation, which has
the relevant barrier instructions to ensure completion and visibility
of the maintenance, so this patch removes the instruction barrier from
[local_]flush_bp_all.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 91b2922..45e3aea 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -527,6 +527,10 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
}
}
+/*
+ * Branch predictor maintenance is paired with full TLB invalidation, so
+ * there is no need for any barriers here.
+ */
static inline void local_flush_bp_all(void)
{
const int zero = 0;
@@ -536,9 +540,6 @@ static inline void local_flush_bp_all(void)
asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero));
else if (tlb_flag(TLB_V6_BP))
asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero));
-
- if (tlb_flag(TLB_BARRIER))
- isb();
}
#ifdef CONFIG_ARM_ERRATA_798181
--
1.8.2.2
* [PATCH v2 04/12] ARM: tlb: don't perform inner-shareable invalidation for local BP ops
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (2 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 03/12] ARM: tlb: don't bother with barriers for branch predictor maintenance Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions Will Deacon
` (7 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
Now that the ASID allocator doesn't require inner-shareable maintenance,
we can convert the local_flush_bp_all function to perform only
non-shareable flushing, in a similar manner to the TLB invalidation
routines.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 22 ++++++++++++++++++++--
arch/arm/kernel/smp_tlb.c | 2 +-
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 45e3aea..b9c0811 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -531,17 +531,35 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
* Branch predictor maintenance is paired with full TLB invalidation, so
* there is no need for any barriers here.
*/
+static inline void __local_flush_bp_all(void)
+{
+ const int zero = 0;
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ if (tlb_flag(TLB_V6_BP))
+ asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero));
+}
+
static inline void local_flush_bp_all(void)
{
const int zero = 0;
const unsigned int __tlb_flag = __cpu_tlb_flags;
+ __local_flush_bp_all();
if (tlb_flag(TLB_V7_UIS_BP))
- asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero));
- else if (tlb_flag(TLB_V6_BP))
asm("mcr p15, 0, %0, c7, c5, 6" : : "r" (zero));
}
+static inline void __flush_bp_all(void)
+{
+ const int zero = 0;
+ const unsigned int __tlb_flag = __cpu_tlb_flags;
+
+ __local_flush_bp_all();
+ if (tlb_flag(TLB_V7_UIS_BP))
+ asm("mcr p15, 0, %0, c7, c1, 6" : : "r" (zero));
+}
+
#ifdef CONFIG_ARM_ERRATA_798181
static inline void dummy_flush_tlb_a15_erratum(void)
{
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index cc299b5..5cb5500 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -204,5 +204,5 @@ void flush_bp_all(void)
if (tlb_ops_need_broadcast())
on_each_cpu(ipi_flush_bp_all, NULL, 1);
else
- local_flush_bp_all();
+ __flush_bp_all();
}
--
1.8.2.2
* [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (3 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 04/12] ARM: tlb: don't perform inner-shareable invalidation for local BP ops Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-21 8:37 ` Ming Lei
2013-06-20 14:21 ` [PATCH v2 06/12] ARM: tlb: reduce scope of barrier domains for TLB invalidation Will Deacon
` (6 subsequent siblings)
11 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
On ARMv7, the memory barrier instructions take an optional `option'
field which can be used to constrain the effects of a memory barrier
based on shareability and access type.
This patch allows the caller to pass these options if required, and
updates the smp_*() barriers to request inner-shareable barriers,
affecting only stores for the _wmb variant. wmb() is also changed to
use the -st version of dsb.
Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/assembler.h | 4 ++--
arch/arm/include/asm/barrier.h | 32 ++++++++++++++++----------------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 05ee9ee..863b280 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -212,9 +212,9 @@
#ifdef CONFIG_SMP
#if __LINUX_ARM_ARCH__ >= 7
.ifeqs "\mode","arm"
- ALT_SMP(dmb)
+ ALT_SMP(dmb ish)
.else
- ALT_SMP(W(dmb))
+ ALT_SMP(W(dmb) ish)
.endif
#elif __LINUX_ARM_ARCH__ == 6
ALT_SMP(mcr p15, 0, r0, c7, c10, 5) @ dmb
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 8dcd9c7..60f15e2 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -14,27 +14,27 @@
#endif
#if __LINUX_ARM_ARCH__ >= 7
-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
-#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
+#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
+#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
+#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
+#define isb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
: : "r" (0) : "memory")
-#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
+#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
: : "r" (0) : "memory")
-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
+#define dmb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
: : "r" (0) : "memory")
#elif defined(CONFIG_CPU_FA526)
-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
+#define isb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
: : "r" (0) : "memory")
-#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
+#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
: : "r" (0) : "memory")
-#define dmb() __asm__ __volatile__ ("" : : : "memory")
+#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
#else
-#define isb() __asm__ __volatile__ ("" : : : "memory")
-#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
+#define isb(x) __asm__ __volatile__ ("" : : : "memory")
+#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
: : "r" (0) : "memory")
-#define dmb() __asm__ __volatile__ ("" : : : "memory")
+#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
#endif
#ifdef CONFIG_ARCH_HAS_BARRIERS
@@ -42,7 +42,7 @@
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb() do { dsb(); outer_sync(); } while (0)
#define rmb() dsb()
-#define wmb() mb()
+#define wmb() do { dsb(st); outer_sync(); } while (0)
#else
#define mb() barrier()
#define rmb() barrier()
@@ -54,9 +54,9 @@
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#else
-#define smp_mb() dmb()
-#define smp_rmb() dmb()
-#define smp_wmb() dmb()
+#define smp_mb() dmb(ish)
+#define smp_rmb() smp_mb()
+#define smp_wmb() dmb(ishst)
#endif
#define read_barrier_depends() do { } while(0)
--
1.8.2.2
* [PATCH v2 06/12] ARM: tlb: reduce scope of barrier domains for TLB invalidation
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (4 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 07/12] ARM: mm: use inner-shareable barriers for TLB and user cache operations Will Deacon
` (5 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
Our TLB invalidation routines may require a barrier before the
maintenance (in order to ensure pending page table writes are visible to
the hardware walker) and barriers afterwards (in order to ensure
completion of the maintenance and visibility in the instruction stream).
Whilst this is expensive, the cost can be reduced somewhat by reducing
the scope of the barrier instructions:
- The barrier before only needs to apply to stores (pte writes)
- Local ops are required only to affect the non-shareable domain
- Global ops are required only to affect the inner-shareable domain
This patch makes these changes for the TLB flushing code.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/tlbflush.h | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index b9c0811..39eddeb 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -335,13 +335,13 @@ static inline void local_flush_tlb_all(void)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
__local_flush_tlb_all();
tlb_op(TLB_V7_UIS_FULL, "c8, c7, 0", zero);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(nsh);
isb();
}
}
@@ -352,13 +352,13 @@ static inline void __flush_tlb_all(void)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
__local_flush_tlb_all();
tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(ish);
isb();
}
}
@@ -388,13 +388,13 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
__local_flush_tlb_mm(mm);
tlb_op(TLB_V7_UIS_ASID, "c8, c7, 2", asid);
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(nsh);
}
static inline void __flush_tlb_mm(struct mm_struct *mm)
@@ -402,7 +402,7 @@ static inline void __flush_tlb_mm(struct mm_struct *mm)
const unsigned int __tlb_flag = __cpu_tlb_flags;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
__local_flush_tlb_mm(mm);
#ifdef CONFIG_ARM_ERRATA_720789
@@ -412,7 +412,7 @@ static inline void __flush_tlb_mm(struct mm_struct *mm)
#endif
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(ish);
}
static inline void
@@ -445,13 +445,13 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
__local_flush_tlb_page(vma, uaddr);
tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", uaddr);
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(nsh);
}
static inline void
@@ -462,7 +462,7 @@ __flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
__local_flush_tlb_page(vma, uaddr);
#ifdef CONFIG_ARM_ERRATA_720789
@@ -472,7 +472,7 @@ __flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
#endif
if (tlb_flag(TLB_BARRIER))
- dsb();
+ dsb(ish);
}
static inline void __local_flush_tlb_kernel_page(unsigned long kaddr)
@@ -498,13 +498,13 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
kaddr &= PAGE_MASK;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(nshst);
__local_flush_tlb_kernel_page(kaddr);
tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", kaddr);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(nsh);
isb();
}
}
@@ -516,13 +516,13 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
kaddr &= PAGE_MASK;
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
__local_flush_tlb_kernel_page(kaddr);
tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
if (tlb_flag(TLB_BARRIER)) {
- dsb();
+ dsb(ish);
isb();
}
}
@@ -567,7 +567,7 @@ static inline void dummy_flush_tlb_a15_erratum(void)
* Dummy TLBIMVAIS. Using the unmapped address 0 and ASID 0.
*/
asm("mcr p15, 0, %0, c8, c3, 1" : : "r" (0));
- dsb();
+ dsb(ish);
}
#else
static inline void dummy_flush_tlb_a15_erratum(void)
@@ -596,7 +596,7 @@ static inline void flush_pmd_entry(void *pmd)
tlb_l2_op(TLB_L2CLEAN_FR, "c15, c9, 1 @ L2 flush_pmd", pmd);
if (tlb_flag(TLB_WB))
- dsb();
+ dsb(ishst);
}
static inline void clean_pmd_entry(void *pmd)
--
1.8.2.2
* [PATCH v2 07/12] ARM: mm: use inner-shareable barriers for TLB and user cache operations
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (5 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 06/12] ARM: tlb: reduce scope of barrier domains for TLB invalidation Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 08/12] ARM: spinlock: use inner-shareable dsb variant prior to sev instruction Will Deacon
` (4 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
System-wide barriers aren't required for situations where we only need
to make visibility and ordering guarantees in the inner-shareable domain
(i.e. we are not dealing with devices or potentially incoherent CPUs).
This patch changes the v7 TLB operations, coherent_user_range and
dcache_clean_area functions to use inner-shareable barriers. For cache
maintenance, only the store access type is required to ensure completion.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/mm/cache-v7.S | 4 ++--
arch/arm/mm/proc-v7.S | 2 +-
arch/arm/mm/tlb-v7.S | 8 ++++----
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 15451ee..44f5a6a 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -274,7 +274,7 @@ ENTRY(v7_coherent_user_range)
add r12, r12, r2
cmp r12, r1
blo 1b
- dsb
+ dsb ishst
icache_line_size r2, r3
sub r3, r2, #1
bic r12, r0, r3
@@ -286,7 +286,7 @@ ENTRY(v7_coherent_user_range)
mov r0, #0
ALT_SMP(mcr p15, 0, r0, c7, c1, 6) @ invalidate BTB Inner Shareable
ALT_UP(mcr p15, 0, r0, c7, c5, 6) @ invalidate BTB
- dsb
+ dsb ishst
isb
mov pc, lr
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 2c73a73..d19ddc0 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -82,7 +82,7 @@ ENTRY(cpu_v7_dcache_clean_area)
add r0, r0, r2
subs r1, r1, r2
bhi 1b
- dsb
+ dsb ishst
mov pc, lr
ENDPROC(cpu_v7_dcache_clean_area)
diff --git a/arch/arm/mm/tlb-v7.S b/arch/arm/mm/tlb-v7.S
index ea94765..3553087 100644
--- a/arch/arm/mm/tlb-v7.S
+++ b/arch/arm/mm/tlb-v7.S
@@ -35,7 +35,7 @@
ENTRY(v7wbi_flush_user_tlb_range)
vma_vm_mm r3, r2 @ get vma->vm_mm
mmid r3, r3 @ get vm_mm->context.id
- dsb
+ dsb ish
mov r0, r0, lsr #PAGE_SHIFT @ align address
mov r1, r1, lsr #PAGE_SHIFT
asid r3, r3 @ mask ASID
@@ -56,7 +56,7 @@ ENTRY(v7wbi_flush_user_tlb_range)
add r0, r0, #PAGE_SZ
cmp r0, r1
blo 1b
- dsb
+ dsb ish
mov pc, lr
ENDPROC(v7wbi_flush_user_tlb_range)
@@ -69,7 +69,7 @@ ENDPROC(v7wbi_flush_user_tlb_range)
* - end - end address (exclusive, may not be aligned)
*/
ENTRY(v7wbi_flush_kern_tlb_range)
- dsb
+ dsb ish
mov r0, r0, lsr #PAGE_SHIFT @ align address
mov r1, r1, lsr #PAGE_SHIFT
mov r0, r0, lsl #PAGE_SHIFT
@@ -84,7 +84,7 @@ ENTRY(v7wbi_flush_kern_tlb_range)
add r0, r0, #PAGE_SZ
cmp r0, r1
blo 1b
- dsb
+ dsb ish
isb
mov pc, lr
ENDPROC(v7wbi_flush_kern_tlb_range)
--
1.8.2.2
* [PATCH v2 08/12] ARM: spinlock: use inner-shareable dsb variant prior to sev instruction
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (6 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 07/12] ARM: mm: use inner-shareable barriers for TLB and user cache operations Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 09/12] ARM: kvm: use inner-shareable barriers after TLB flushing Will Deacon
` (3 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
When unlocking a spinlock, we use the sev instruction to signal other
CPUs waiting on the lock. Since sev is not a memory access instruction,
we require a dsb in order to ensure that the sev is not issued ahead
of the store placing the lock in an unlocked state.
However, as sev is only concerned with other processors in a
multiprocessor system, we can restrict the scope of the preceding dsb
to the inner-shareable domain. Furthermore, we can restrict the scope to
consider only stores, since there are no independent loads on the unlock
path.
A side-effect of this change is that a spin_unlock operation no longer
forces completion of pending TLB invalidation, something which we rely
on when unlocking runqueues to ensure that CPU migration during TLB
maintenance routines doesn't cause us to continue before the operation
has completed.
This patch adds the -ishst suffix to the ARMv7 definition of dsb_sev()
and adds an inner-shareable dsb to the context-switch path when running
a preemptible, SMP, v7 kernel.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/spinlock.h | 2 +-
arch/arm/include/asm/switch_to.h | 10 ++++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 6220e9f..5a0261e 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static inline void dsb_sev(void)
{
#if __LINUX_ARM_ARCH__ >= 7
__asm__ __volatile__ (
- "dsb\n"
+ "dsb ishst\n"
SEV
);
#else
diff --git a/arch/arm/include/asm/switch_to.h b/arch/arm/include/asm/switch_to.h
index fa09e6b..c99e259 100644
--- a/arch/arm/include/asm/switch_to.h
+++ b/arch/arm/include/asm/switch_to.h
@@ -4,6 +4,16 @@
#include <linux/thread_info.h>
/*
+ * For v7 SMP cores running a preemptible kernel we may be pre-empted
+ * during a TLB maintenance operation, so execute an inner-shareable dsb
+ * to ensure that the maintenance completes in case we migrate to another
+ * CPU.
+ */
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) && defined(CONFIG_CPU_V7)
+#define finish_arch_switch(prev) dsb(ish)
+#endif
+
+/*
* switch_to(prev, next) should switch from task `prev' to `next'
* `prev' will never be the same as `next'. schedule() itself
* contains the memory barrier to tell GCC not to cache `current'.
--
1.8.2.2
* [PATCH v2 09/12] ARM: kvm: use inner-shareable barriers after TLB flushing
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (7 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 08/12] ARM: spinlock: use inner-shareable dsb variant prior to sev instruction Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 10/12] ARM: mcpm: use -st dsb option prior to sev instructions Will Deacon
` (2 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
When flushing the TLB at PL2 in response to remapping at stage-2 or VMID
rollover, we have a dsb instruction to ensure completion of the command
before continuing.
Since we only care about other processors for TLB invalidation, use the
inner-shareable variant of the dsb instruction instead.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/kvm/init.S | 2 +-
arch/arm/kvm/interrupts.S | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index f048338..1b9844d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -142,7 +142,7 @@ target: @ We're now in the trampoline code, switch page tables
@ Invalidate the old TLBs
mcr p15, 4, r0, c8, c7, 0 @ TLBIALLH
- dsb
+ dsb ish
eret
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index f7793df..dfb5dcc 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -54,7 +54,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
mcrr p15, 6, r2, r3, c2 @ Write VTTBR
isb
mcr p15, 0, r0, c8, c3, 0 @ TLBIALLIS (rt ignored)
- dsb
+ dsb ish
isb
mov r2, #0
mov r3, #0
@@ -78,7 +78,7 @@ ENTRY(__kvm_flush_vm_context)
mcr p15, 4, r0, c8, c3, 4
/* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
mcr p15, 0, r0, c7, c1, 0
- dsb
+ dsb ish
isb @ Not necessary if followed by eret
bx lr
--
1.8.2.2
* [PATCH v2 10/12] ARM: mcpm: use -st dsb option prior to sev instructions
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (8 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 09/12] ARM: kvm: use inner-shareable barriers after TLB flushing Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 11/12] ARM: l2x0: use -st dsb option for ordering writel_relaxed with unlock Will Deacon
2013-06-20 14:21 ` [PATCH v2 12/12] ARM: cacheflush: use -ishst dsb variant for ensuring flush completion Will Deacon
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
In a similar manner to our spinlock implementation, mcpm uses sev to
wake up cores waiting on a lock when the lock is unlocked. In order to
ensure that the final write unlocking the lock is visible, a dsb
instruction is executed immediately prior to the sev.
This patch changes these dsbs to use the -st option, since we only
require that the store unlocking the lock is made visible.
Acked-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/common/mcpm_head.S | 2 +-
arch/arm/common/vlock.S | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S
index 8178705..5cdf619 100644
--- a/arch/arm/common/mcpm_head.S
+++ b/arch/arm/common/mcpm_head.S
@@ -151,7 +151,7 @@ mcpm_setup_leave:
mov r0, #INBOUND_NOT_COMING_UP
strb r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
- dsb
+ dsb st
sev
mov r0, r11
diff --git a/arch/arm/common/vlock.S b/arch/arm/common/vlock.S
index ff19858..8b7df28 100644
--- a/arch/arm/common/vlock.S
+++ b/arch/arm/common/vlock.S
@@ -42,7 +42,7 @@
dmb
mov \rscratch, #0
strb \rscratch, [\rbase, \rcpu]
- dsb
+ dsb st
sev
.endm
@@ -102,7 +102,7 @@ ENTRY(vlock_unlock)
dmb
mov r1, #VLOCK_OWNER_NONE
strb r1, [r0, #VLOCK_OWNER_OFFSET]
- dsb
+ dsb st
sev
bx lr
ENDPROC(vlock_unlock)
--
1.8.2.2
* [PATCH v2 11/12] ARM: l2x0: use -st dsb option for ordering writel_relaxed with unlock
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (9 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 10/12] ARM: mcpm: use -st dsb option prior to sev instructions Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
2013-06-20 14:21 ` [PATCH v2 12/12] ARM: cacheflush: use -ishst dsb variant for ensuring flush completion Will Deacon
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
writel_relaxed and spin_unlock are both store operations, so we only
need to enforce store ordering in the dsb.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/mm/cache-l2x0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index c465fac..7a3db10 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -290,7 +290,7 @@ static void l2x0_disable(void)
raw_spin_lock_irqsave(&l2x0_lock, flags);
__l2x0_flush_all();
writel_relaxed(0, l2x0_base + L2X0_CTRL);
- dsb();
+ dsb(st);
raw_spin_unlock_irqrestore(&l2x0_lock, flags);
}
--
1.8.2.2
* [PATCH v2 12/12] ARM: cacheflush: use -ishst dsb variant for ensuring flush completion
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
` (10 preceding siblings ...)
2013-06-20 14:21 ` [PATCH v2 11/12] ARM: l2x0: use -st dsb option for ordering writel_relaxed with unlock Will Deacon
@ 2013-06-20 14:21 ` Will Deacon
11 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2013-06-20 14:21 UTC (permalink / raw)
To: linux-arm-kernel
flush_cache_vmap contains a dsb to ensure that any cacheflushing
operations to flush out newly written ptes have completed.
This patch adds the -ishst option to the dsb, since that is all that is
required for completing cacheflushing in the inner-shareable domain.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/cacheflush.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index bff7138..38defc4 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -354,7 +354,7 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
* set_pte_at() called from vmap_pte_range() does not
* have a DSB after cleaning the cache line.
*/
- dsb();
+ dsb(ishst);
}
static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
--
1.8.2.2
* [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions
2013-06-20 14:21 ` [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions Will Deacon
@ 2013-06-21 8:37 ` Ming Lei
2013-06-21 8:51 ` Will Deacon
0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2013-06-21 8:37 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Jun 20, 2013 at 10:21 PM, Will Deacon <will.deacon@arm.com> wrote:
> On ARMv7, the memory barrier instructions take an optional `option'
> field which can be used to constrain the effects of a memory barrier
> based on shareability and access type.
>
> This patch allows the caller to pass these options if required, and
> updates the smp_*() barriers to request inner-shareable barriers,
> affecting only stores for the _wmb variant. wmb() is also changed to
> use the -st version of dsb.
>
> Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
> arch/arm/include/asm/assembler.h | 4 ++--
> arch/arm/include/asm/barrier.h | 32 ++++++++++++++++----------------
> 2 files changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
> index 05ee9ee..863b280 100644
> --- a/arch/arm/include/asm/assembler.h
> +++ b/arch/arm/include/asm/assembler.h
> @@ -212,9 +212,9 @@
> #ifdef CONFIG_SMP
> #if __LINUX_ARM_ARCH__ >= 7
> .ifeqs "\mode","arm"
> - ALT_SMP(dmb)
> + ALT_SMP(dmb ish)
> .else
> - ALT_SMP(W(dmb))
> + ALT_SMP(W(dmb) ish)
> .endif
> #elif __LINUX_ARM_ARCH__ == 6
> ALT_SMP(mcr p15, 0, r0, c7, c10, 5) @ dmb
> diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
> index 8dcd9c7..60f15e2 100644
> --- a/arch/arm/include/asm/barrier.h
> +++ b/arch/arm/include/asm/barrier.h
> @@ -14,27 +14,27 @@
> #endif
>
> #if __LINUX_ARM_ARCH__ >= 7
> -#define isb() __asm__ __volatile__ ("isb" : : : "memory")
> -#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
> -#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
> +#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
> +#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
> +#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
> #elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
> -#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
> +#define isb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
> : : "r" (0) : "memory")
> -#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
> +#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
> : : "r" (0) : "memory")
> -#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
> +#define dmb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
> : : "r" (0) : "memory")
> #elif defined(CONFIG_CPU_FA526)
> -#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
> +#define isb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \
> : : "r" (0) : "memory")
> -#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
> +#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
> : : "r" (0) : "memory")
> -#define dmb() __asm__ __volatile__ ("" : : : "memory")
> +#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
> #else
> -#define isb() __asm__ __volatile__ ("" : : : "memory")
> -#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
> +#define isb(x) __asm__ __volatile__ ("" : : : "memory")
> +#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
> : : "r" (0) : "memory")
> -#define dmb() __asm__ __volatile__ ("" : : : "memory")
> +#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
> #endif
>
> #ifdef CONFIG_ARCH_HAS_BARRIERS
> @@ -42,7 +42,7 @@
> #elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
> #define mb() do { dsb(); outer_sync(); } while (0)
> #define rmb() dsb()
The above two dsb() are missed?
> -#define wmb() mb()
> +#define wmb() do { dsb(st); outer_sync(); } while (0)
> #else
> #define mb() barrier()
> #define rmb() barrier()
> @@ -54,9 +54,9 @@
> #define smp_rmb() barrier()
> #define smp_wmb() barrier()
> #else
> -#define smp_mb() dmb()
> -#define smp_rmb() dmb()
> -#define smp_wmb() dmb()
> +#define smp_mb() dmb(ish)
> +#define smp_rmb() smp_mb()
> +#define smp_wmb() dmb(ishst)
> #endif
>
> #define read_barrier_depends() do { } while(0)
> --
> 1.8.2.2
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Thanks,
--
Ming Lei
* [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions
2013-06-21 8:37 ` Ming Lei
@ 2013-06-21 8:51 ` Will Deacon
2013-06-21 8:53 ` Ming Lei
0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2013-06-21 8:51 UTC (permalink / raw)
To: linux-arm-kernel
Hello,
On Fri, Jun 21, 2013 at 09:37:14AM +0100, Ming Lei wrote:
> On Thu, Jun 20, 2013 at 10:21 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On ARMv7, the memory barrier instructions take an optional `option'
> > field which can be used to constrain the effects of a memory barrier
> > based on shareability and access type.
> >
> > This patch allows the caller to pass these options if required, and
> > updates the smp_*() barriers to request inner-shareable barriers,
> > affecting only stores for the _wmb variant. wmb() is also changed to
> > use the -st version of dsb.
> >
> > Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> > arch/arm/include/asm/assembler.h | 4 ++--
> > arch/arm/include/asm/barrier.h | 32 ++++++++++++++++----------------
> > 2 files changed, 18 insertions(+), 18 deletions(-)
[...]
> > @@ -42,7 +42,7 @@
> > #elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
> > #define mb() do { dsb(); outer_sync(); } while (0)
> > #define rmb() dsb()
>
> The above two dsb() are missed?
I don't think so. What would you suggest changing them to? Remember: there's
no `load' barrier variant in ARMv7, and the non smp_ barriers may be used to
ensure ordering outside of the inner-shareable domain (for example, in I/O
accessors).
Will
* [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions
2013-06-21 8:51 ` Will Deacon
@ 2013-06-21 8:53 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2013-06-21 8:53 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jun 21, 2013 at 4:51 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hello,
>
> On Fri, Jun 21, 2013 at 09:37:14AM +0100, Ming Lei wrote:
>> On Thu, Jun 20, 2013 at 10:21 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On ARMv7, the memory barrier instructions take an optional `option'
>> > field which can be used to constrain the effects of a memory barrier
>> > based on shareability and access type.
>> >
>> > This patch allows the caller to pass these options if required, and
>> > updates the smp_*() barriers to request inner-shareable barriers,
>> > affecting only stores for the _wmb variant. wmb() is also changed to
>> > use the -st version of dsb.
>> >
>> > Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
>> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>> > Signed-off-by: Will Deacon <will.deacon@arm.com>
>> > ---
>> > arch/arm/include/asm/assembler.h | 4 ++--
>> > arch/arm/include/asm/barrier.h | 32 ++++++++++++++++----------------
>> > 2 files changed, 18 insertions(+), 18 deletions(-)
>
> [...]
>> > @@ -42,7 +42,7 @@
>> > #elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
>> > #define mb() do { dsb(); outer_sync(); } while (0)
>> > #define rmb() dsb()
>>
>> The above two dsb() are missed?
>
> I don't think so. What would you suggest changing them to? Remember: there's
> no `load' barrier variant in ARMv7, and the non smp_ barriers may be used to
> ensure ordering outside of the inner-shareable domain (for example, in I/O
> accessors).
SY can be omitted, sorry for the noise.
Thanks,
--
Ming Lei
end of thread, other threads: [~2013-06-21 8:53 UTC | newest]
Thread overview: 16+ messages
2013-06-20 14:21 [PATCH v2 00/12] Make use of v7 barrier variants in Linux Will Deacon
2013-06-20 14:21 ` [PATCH v2 01/12] ARM: mm: remove redundant dsb() prior to range TLB invalidation Will Deacon
2013-06-20 14:21 ` [PATCH v2 02/12] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops Will Deacon
2013-06-20 14:21 ` [PATCH v2 03/12] ARM: tlb: don't bother with barriers for branch predictor maintenance Will Deacon
2013-06-20 14:21 ` [PATCH v2 04/12] ARM: tlb: don't perform inner-shareable invalidation for local BP ops Will Deacon
2013-06-20 14:21 ` [PATCH v2 05/12] ARM: barrier: allow options to be passed to memory barrier instructions Will Deacon
2013-06-21 8:37 ` Ming Lei
2013-06-21 8:51 ` Will Deacon
2013-06-21 8:53 ` Ming Lei
2013-06-20 14:21 ` [PATCH v2 06/12] ARM: tlb: reduce scope of barrier domains for TLB invalidation Will Deacon
2013-06-20 14:21 ` [PATCH v2 07/12] ARM: mm: use inner-shareable barriers for TLB and user cache operations Will Deacon
2013-06-20 14:21 ` [PATCH v2 08/12] ARM: spinlock: use inner-shareable dsb variant prior to sev instruction Will Deacon
2013-06-20 14:21 ` [PATCH v2 09/12] ARM: kvm: use inner-shareable barriers after TLB flushing Will Deacon
2013-06-20 14:21 ` [PATCH v2 10/12] ARM: mcpm: use -st dsb option prior to sev instructions Will Deacon
2013-06-20 14:21 ` [PATCH v2 11/12] ARM: l2x0: use -st dsb option for ordering writel_relaxed with unlock Will Deacon
2013-06-20 14:21 ` [PATCH v2 12/12] ARM: cacheflush: use -ishst dsb variant for ensuring flush completion Will Deacon