From: Mark Rutland <mark.rutland@arm.com>
To: Gang Li <ligang.bdlg@bytedance.com>
Cc: Will Deacon <will@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Ard Biesheuvel <ardb@kernel.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Feiyang Chen <chenfeiyang@loongson.cn>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush
Date: Tue, 16 May 2023 12:51:46 +0100
Message-ID: <ZGNuUjG9cnzSkXEN@FVFF77S0Q05N>
In-Reply-To: <fd8bda4b-32ee-d06d-af77-12e30e70c0bf@bytedance.com>

On Tue, May 16, 2023 at 03:47:16PM +0800, Gang Li wrote:
> Hi,
>
> On 2023/5/9 22:30, Mark Rutland wrote:
> > For example, early in D8.13 we have the rule:
> >
> > | R_SQBCS
> > |
> > | When address translation is enabled, a translation table entry for an
> > | in-context translation regime that does not cause a Translation fault, an
> > | Address size fault, or an Access flag fault is permitted to be cached in a
> > | TLB or intermediate TLB caching structure as the result of an explicit or
> > | speculative access.
> >
>
> Thanks a lot!
>
> I looked up the x86 manual and found that the x86 TLB cache mechanism is
> similar to arm64 (but the x86 guys haven't replied to me yet):
>
> Intel® 64 and IA-32 Architectures Software Developer Manuals:
> > 4.10.2.3 Details of TLB Use
> > Subject to the limitations given in the previous paragraph, the
> > processor may cache a translation for any linear address, even if that
> > address is not used to access memory. For example, the processor may
> > cache translations required for prefetches and for accesses that result
> > from speculative execution that would never actually occur in the
> > executed code path.
>
> Both architectures have similar TLB caching policies, so why does arm64
> flush all and x86 flush local in ghes_map and ghes_unmap?
>
> I think flush all may be unnecessary.
>
> 1. Before accessing ghes data, each CPU needs to call ghes_map, which
> will create the mapping and flush its own TLB to make sure the current
> CPU is using the latest mapping.
>
> 2. And there is no need to flush all in ghes_unmap, because the ghes_map
> of other CPUs will flush their own TLBs before accessing the memory.

This is not sufficient. Regardless of whether CPUs *explicitly* access the VA
range, any CPU which can reach the live translation table entry is allowed to
fetch that entry and allocate it into a TLB at any time.

When a Break-Before-Make sequence isn't followed, the architecture permits a
number of resulting behaviours, including "amalgamation", where the TLB entries
are combined in some arbitrary IMPLEMENTATION DEFINED way. The architecture
isn't very clear here, but doesn't rule out two entries being combined such
that the result generates an arbitrary physical address and/or such that the
MMU thinks the entry is from an intermediate walk. In either of those cases,
the CPU might speculatively access device memory (which could change the state
of the system, or cause fatal SErrors), and/or allocate further junk into TLBs.

So per the architecture, broadcast maintenance is necessary on arm64. The only
way to avoid it would be to have a local set of translation tables which are
not shared with other CPUs.

I suspect x86 might not have the same issue with amalgamation.

Thanks,
Mark.