From: Mark Rutland <mark.rutland@arm.com>
To: Gang Li <ligang.bdlg@bytedance.com>
Cc: Will Deacon <will@kernel.org>,
	Tomasz Nowicki <tomasz.nowicki@linaro.org>,
	Laura Abbott <lauraa@codeaurora.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Feiyang Chen <chenfeiyang@loongson.cn>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush
Date: Thu, 27 Apr 2023 08:30:32 +0100
Message-ID: <ZEokfJSM9a4ZZvQv@FVFF77S0Q05N>
In-Reply-To: <2eb026b8-9e13-2b60-9e14-06417b142ac9@bytedance.com>

Hi,

On Thu, Apr 27, 2023 at 11:26:50AM +0800, Gang Li wrote:
> Hi all,
> 
> I have encountered a performance issue on our ARM64 machine, which seems
> to be caused by the flush_tlb_kernel_range.

Can you please provide a few more details on what you're seeing?

What does your performance issue look like?

Are you sure that the performance issue is caused by flush_tlb_kernel_range()
specifically?

> Here is the stack on the ARM64 machine:
> 
> # ARM64:
> ```
>     ghes_unmap
>     clear_fixmap
>     __set_fixmap
>     flush_tlb_kernel_range
> ```
> 
> As we can see, the ARM64 implementation eventually calls
> flush_tlb_kernel_range, which flushes the TLB on all cores. However, on
> AMD64, the implementation calls flush_tlb_one_kernel instead.
> 
> # AMD64:
> ```
>     ghes_unmap
>     clear_fixmap
>     __set_fixmap
>     mmu.set_fixmap
>     native_set_fixmap
>     __native_set_fixmap
>     set_pte_vaddr
>     set_pte_vaddr_p4d
>     __set_pte_vaddr
>     flush_tlb_one_kernel
> ```
> 
> On our ARM64 machine, flush_tlb_kernel_range is causing a noticeable
> performance degradation.

As above, could you please provide more details on this?

> This arm64 patch said:
> https://lore.kernel.org/all/20161201135112.15396-1-fu.wei@linaro.org/
> (commit 9f9a35a7b654e006250530425eb1fb527f0d32e9)
> 
> ```
> /*
>  * Despite its name, this function must still broadcast the TLB
>  * invalidation in order to ensure other CPUs don't end up with junk
>  * entries as a result of speculation. Unusually, its also called in
>  * IRQ context (ghes_iounmap_irq) so if we ever need to use IPIs for
>  * TLB broadcasting, then we're in trouble here.
>  */
> static inline void arch_apei_flush_tlb_one(unsigned long addr)
> {
>     flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> }
> ```
> 
> 1. I am curious to know the reason behind the design choice of flushing
> the TLB on all cores for ARM64's clear_fixmap, while AMD64 only flushes
> the TLB on a single core. Are there any TLB design details that make a
> difference here?

I don't know why x86 only invalidates this on a single CPU.

On arm64 we *must* invalidate the TLB on all CPUs, as the kernel page tables are
shared by all CPUs, and the architectural Break-Before-Make rules require the
TLB to be invalidated between two valid (but distinct) entries.
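
As a rough illustration of that ordering, here is a toy user-space model (not
kernel code). It assumes a single shared PTE and one cached translation per
CPU; all names (struct xlate, cpu_access, remap_bbm, bbm_demo) are made up for
this sketch, and invalidate_all_cpus() merely stands in for a broadcast TLB
invalidation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Toy model: one shared kernel PTE and one cached translation per CPU. */
struct xlate {
	bool valid;
	unsigned long pa;
};

static struct xlate pte;		/* shared page-table entry */
static struct xlate tlb[NR_CPUS];	/* per-CPU cached copy of it */

/* On a TLB miss, walk the shared table and cache the entry if it is valid. */
static struct xlate cpu_access(int cpu)
{
	if (!tlb[cpu].valid && pte.valid)
		tlb[cpu] = pte;
	return tlb[cpu];
}

/* Stand-in for a broadcast invalidation that reaches every CPU. */
static void invalidate_all_cpus(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		tlb[cpu].valid = false;
}

/* Break-Before-Make: invalid entry, broadcast invalidate, then new entry. */
static void remap_bbm(unsigned long new_pa)
{
	pte.valid = false;		/* break */
	invalidate_all_cpus();		/* no CPU may keep the old translation */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		assert(!tlb[cpu].valid);
	pte.pa = new_pa;		/* make */
	pte.valid = true;
}

/* Hypothetical driver: true if every CPU sees new_pa after the remap. */
static bool bbm_demo(unsigned long old_pa, unsigned long new_pa)
{
	pte = (struct xlate){ .valid = true, .pa = old_pa };
	invalidate_all_cpus();
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_access(cpu);	/* every CPU caches the old mapping */

	remap_bbm(new_pa);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		struct xlate x = cpu_access(cpu);

		if (!x.valid || x.pa != new_pa)
			return false;
	}
	return true;
}
```

Calling bbm_demo() from a main() with two distinct addresses should return
true in this model, since no CPU can hold the old translation once the new
entry becomes visible.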

> 2. Is it possible to let the ARM64 to flush the TLB on just one core,
> similar to the AMD64?

No. If we omitted the broadcast TLB invalidation, then a different CPU may
fetch the old value into a TLB, then fetch the new value. When this happens,
the architecture permits "amalgamation", with UNPREDICTABLE results, which
could result in memory corruption, taking SErrors, etc.
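
To make that scenario concrete, here is a toy user-space sketch (again, not
kernel code). It assumes each CPU's TLB has two slots for the same VA, so a
stale entry and a freshly walked entry can coexist; every name in it is
hypothetical. It only detects that two distinct valid entries are present on a
CPU. What real hardware does at that point is UNPREDICTABLE and is not
modelled.

```c
#include <stdbool.h>

#define NR_CPUS   2
#define TLB_SLOTS 2

struct xlate {
	bool valid;
	unsigned long pa;
};

static struct xlate pte;			/* shared page-table entry */
static struct xlate tlb[NR_CPUS][TLB_SLOTS];	/* per-CPU cached entries */

/* Model a (possibly speculative) walk: cache the PTE in any free slot. */
static void cpu_walk(int cpu)
{
	if (!pte.valid)
		return;
	for (int i = 0; i < TLB_SLOTS; i++)
		if (tlb[cpu][i].valid && tlb[cpu][i].pa == pte.pa)
			return;		/* this translation is already cached */
	for (int i = 0; i < TLB_SLOTS; i++) {
		if (!tlb[cpu][i].valid) {
			tlb[cpu][i] = pte;
			return;
		}
	}
}

/* Invalidate the entries held by one CPU only. */
static void invalidate_cpu(int cpu)
{
	for (int i = 0; i < TLB_SLOTS; i++)
		tlb[cpu][i].valid = false;
}

/* True if this CPU holds two valid, distinct translations for the VA. */
static bool cpu_has_conflict(int cpu)
{
	return tlb[cpu][0].valid && tlb[cpu][1].valid &&
	       tlb[cpu][0].pa != tlb[cpu][1].pa;
}

/*
 * CPU 0 remaps the page, invalidating either every CPU (broadcast) or
 * only itself. Returns whether CPU 1 ends up with conflicting entries.
 */
static bool remap_leaves_conflict(bool broadcast)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		invalidate_cpu(cpu);

	pte = (struct xlate){ .valid = true, .pa = 0x1000UL };
	cpu_walk(0);
	cpu_walk(1);			/* CPU 1 caches the old value */

	pte.valid = false;
	invalidate_cpu(0);
	if (broadcast)
		invalidate_cpu(1);

	pte = (struct xlate){ .valid = true, .pa = 0x2000UL };
	cpu_walk(1);			/* CPU 1 fetches the new value */

	return cpu_has_conflict(1);
}
```

In this model remap_leaves_conflict(false) reports a conflict on CPU 1, while
remap_leaves_conflict(true) does not.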

> 3. If so, would there be any potential drawbacks or limitations to
> making such a change?

As above, we must use broadcast TLB invalidation here.

Thanks,
Mark.
