From: jonathan.austin@arm.com (Jonathan Austin)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 02/10] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops
Date: Thu, 13 Jun 2013 18:50:03 +0100 [thread overview]
Message-ID: <51BA064B.7010204@arm.com> (raw)
In-Reply-To: <1370528914-17506-3-git-send-email-will.deacon@arm.com>
Hi Will,
On 06/06/13 15:28, Will Deacon wrote:
> Inner-shareable TLB invalidation is typically more expensive than local
> (non-shareable) invalidation, so performing the broadcasting for
> local_flush_tlb_* operations is a waste of cycles and needlessly
> clobbers entries in the TLBs of other CPUs.
>
> This patch introduces __flush_tlb_* versions for many of the TLB
> invalidation functions, which only respect inner-shareable variants of
> the invalidation instructions. This allows us to modify the v7 SMP TLB
> flags to include *both* inner-shareable and non-shareable operations and
> then check the relevant flags depending on whether the operation is
> local or not.
I think this approach leaves us in trouble for some SMP_ON_UP cores as
the IS versions of the instructions don't exist for them.
Is there something that should be ensuring your new __flush_tlb*
functions don't get called for SMP_ON_UP? If not it looks like we might
need to do some runtime patching with the ALT_SMP/ALT_UP macros...
I've commented on one of the example inline below...
>
> This gains us around 0.5% in hackbench scores for a dual-core A15, but I
> would expect this to improve as more cores (and clusters) are added to
> the equation.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
> arch/arm/include/asm/tlbflush.h | 67 ++++++++++++++++++++++++++++++++++++++---
> arch/arm/kernel/smp_tlb.c | 8 ++---
> arch/arm/mm/context.c | 6 +---
> 3 files changed, 68 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> index a3625d1..55b5e18 100644
> --- a/arch/arm/include/asm/tlbflush.h
> +++ b/arch/arm/include/asm/tlbflush.h
> @@ -167,6 +167,8 @@
> #endif
>
> #define v7wbi_tlb_flags_smp (TLB_WB | TLB_BARRIER | \
> + TLB_V6_U_FULL | TLB_V6_U_PAGE | \
> + TLB_V6_U_ASID | \
> TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | \
> TLB_V7_UIS_ASID | TLB_V7_UIS_BP)
> #define v7wbi_tlb_flags_up (TLB_WB | TLB_DCLEAN | TLB_BARRIER | \
> @@ -330,6 +332,21 @@ static inline void local_flush_tlb_all(void)
> tlb_op(TLB_V4_U_FULL | TLB_V6_U_FULL, "c8, c7, 0", zero);
> tlb_op(TLB_V4_D_FULL | TLB_V6_D_FULL, "c8, c6, 0", zero);
> tlb_op(TLB_V4_I_FULL | TLB_V6_I_FULL, "c8, c5, 0", zero);
> +
> + if (tlb_flag(TLB_BARRIER)) {
> + dsb();
> + isb();
> + }
> +}
> +
> +static inline void __flush_tlb_all(void)
> +{
> + const int zero = 0;
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero);
I think we can get away with something similar to what we do in the
cache maintenance case here, using ALT_SMP and ALT_UP to do runtime code
patching and use TLB_V6_U_* for the UP case...
A follow on question is whether we still need to keep the *non* unified
TLB maintenance operations (eg DTLBIALL, ITLBIALL). As far as I can see
looking in to old TRMs, the last ARM CPU that didn't automatically treat
those I/D ops to unified ones was ARM10, so not relevant here...
But - do some of the non-ARM cores exploit the (now deprecated) option
to maintain these separately? Also did I miss some more obscure ARM variant?
Jonny
>
> if (tlb_flag(TLB_BARRIER)) {
> @@ -348,21 +365,32 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
> dsb();
>
> if (possible_tlb_flags & (TLB_V4_U_FULL|TLB_V4_D_FULL|TLB_V4_I_FULL)) {
> - if (cpumask_test_cpu(get_cpu(), mm_cpumask(mm))) {
> + if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
> tlb_op(TLB_V4_U_FULL, "c8, c7, 0", zero);
> tlb_op(TLB_V4_D_FULL, "c8, c6, 0", zero);
> tlb_op(TLB_V4_I_FULL, "c8, c5, 0", zero);
> }
> - put_cpu();
> }
>
> tlb_op(TLB_V6_U_ASID, "c8, c7, 2", asid);
> tlb_op(TLB_V6_D_ASID, "c8, c6, 2", asid);
> tlb_op(TLB_V6_I_ASID, "c8, c5, 2", asid);
> +
> + if (tlb_flag(TLB_BARRIER))
> + dsb();
> +}
> +
> +static inline void __flush_tlb_mm(struct mm_struct *mm)
> +{
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> #ifdef CONFIG_ARM_ERRATA_720789
> - tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", zero);
> + tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", 0);
> #else
> - tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", asid);
> + tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", ASID(mm));
> #endif
>
> if (tlb_flag(TLB_BARRIER))
> @@ -392,6 +420,21 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
> tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", uaddr);
> tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", uaddr);
> tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr);
> +
> + if (tlb_flag(TLB_BARRIER))
> + dsb();
> +}
> +
> +static inline void
> +__flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
> +{
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> #ifdef CONFIG_ARM_ERRATA_720789
> tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 3", uaddr & PAGE_MASK);
> #else
> @@ -421,6 +464,22 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
> tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", kaddr);
> tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", kaddr);
> tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr);
> +
> + if (tlb_flag(TLB_BARRIER)) {
> + dsb();
> + isb();
> + }
> +}
> +
> +static inline void __flush_tlb_kernel_page(unsigned long kaddr)
> +{
> + const unsigned int __tlb_flag = __cpu_tlb_flags;
> +
> + kaddr &= PAGE_MASK;
> +
> + if (tlb_flag(TLB_WB))
> + dsb();
> +
> tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
>
> if (tlb_flag(TLB_BARRIER)) {
> diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
> index 9a52a07..cc299b5 100644
> --- a/arch/arm/kernel/smp_tlb.c
> +++ b/arch/arm/kernel/smp_tlb.c
> @@ -135,7 +135,7 @@ void flush_tlb_all(void)
> if (tlb_ops_need_broadcast())
> on_each_cpu(ipi_flush_tlb_all, NULL, 1);
> else
> - local_flush_tlb_all();
> + __flush_tlb_all();
> broadcast_tlb_a15_erratum();
> }
>
> @@ -144,7 +144,7 @@ void flush_tlb_mm(struct mm_struct *mm)
> if (tlb_ops_need_broadcast())
> on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
> else
> - local_flush_tlb_mm(mm);
> + __flush_tlb_mm(mm);
> broadcast_tlb_mm_a15_erratum(mm);
> }
>
> @@ -157,7 +157,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
> on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
> &ta, 1);
> } else
> - local_flush_tlb_page(vma, uaddr);
> + __flush_tlb_page(vma, uaddr);
> broadcast_tlb_mm_a15_erratum(vma->vm_mm);
> }
>
> @@ -168,7 +168,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
> ta.ta_start = kaddr;
> on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
> } else
> - local_flush_tlb_kernel_page(kaddr);
> + __flush_tlb_kernel_page(kaddr);
> broadcast_tlb_a15_erratum();
> }
>
> diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
> index 2ac3737..62c1ec5 100644
> --- a/arch/arm/mm/context.c
> +++ b/arch/arm/mm/context.c
> @@ -134,10 +134,7 @@ static void flush_context(unsigned int cpu)
> }
>
> /* Queue a TLB invalidate and flush the I-cache if necessary. */
> - if (!tlb_ops_need_broadcast())
> - cpumask_set_cpu(cpu, &tlb_flush_pending);
> - else
> - cpumask_setall(&tlb_flush_pending);
> + cpumask_setall(&tlb_flush_pending);
>
> if (icache_is_vivt_asid_tagged())
> __flush_icache_all();
> @@ -215,7 +212,6 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending)) {
> local_flush_bp_all();
> local_flush_tlb_all();
> - dummy_flush_tlb_a15_erratum();
> }
>
> atomic64_set(&per_cpu(active_asids, cpu), asid);
>
next prev parent reply other threads:[~2013-06-13 17:50 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-06 14:28 [PATCH 00/10] Make use of v7 barrier variants in Linux Will Deacon
2013-06-06 14:28 ` [PATCH 01/10] ARM: mm: remove redundant dsb() prior to range TLB invalidation Will Deacon
2013-06-06 14:28 ` [PATCH 02/10] ARM: tlb: don't perform inner-shareable invalidation for local TLB ops Will Deacon
2013-06-13 17:50 ` Jonathan Austin [this message]
2013-06-18 11:32 ` Will Deacon
2013-06-06 14:28 ` [PATCH 03/10] ARM: tlb: don't bother with barriers for branch predictor maintenance Will Deacon
2013-06-06 14:28 ` [PATCH 04/10] ARM: tlb: don't perform inner-shareable invalidation for local BP ops Will Deacon
2013-06-06 14:28 ` [PATCH 05/10] ARM: barrier: allow options to be passed to memory barrier instructions Will Deacon
2013-06-06 14:28 ` [PATCH 06/10] ARM: spinlock: use inner-shareable dsb variant prior to sev instruction Will Deacon
2013-06-06 14:28 ` [PATCH 07/10] ARM: mm: use inner-shareable barriers for TLB and user cache operations Will Deacon
2013-06-06 14:28 ` [PATCH 08/10] ARM: tlb: reduce scope of barrier domains for TLB invalidation Will Deacon
2013-06-06 14:28 ` [PATCH 09/10] ARM: kvm: use inner-shareable barriers after TLB flushing Will Deacon
2013-06-06 14:28 ` [PATCH 10/10] ARM: mcpm: use -st dsb option prior to sev instructions Will Deacon
2013-06-07 4:15 ` Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51BA064B.7010204@arm.com \
--to=jonathan.austin@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.