From: Marc Zyngier <maz@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: "Mikołaj Lenczewski" <miko.lenczewski@arm.com>,
catalin.marinas@arm.com, will@kernel.org, corbet@lwn.net,
oliver.upton@linux.dev, joey.gouly@arm.com,
suzuki.poulose@arm.com, yuzenghui@huawei.com,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev
Subject: Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
Date: Thu, 12 Dec 2024 15:48:47 +0000 [thread overview]
Message-ID: <86h678sy00.wl-maz@kernel.org> (raw)
In-Reply-To: <2b1cc228-a8d5-4383-ab25-abbbcccd2e2c@arm.com>
On Thu, 12 Dec 2024 15:05:24 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 12/12/2024 14:26, Marc Zyngier wrote:
> > On Thu, 12 Dec 2024 10:55:45 +0000,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 12/12/2024 08:25, Marc Zyngier wrote:
> >>>> +
> >>>> + local_flush_tlb_all();
> >>>
> >>> The elephant in the room: if TLBs are in such a sorry state, what
> >>> guarantees we can make it this far?
> >>
> >> I'll leave Miko to respond to your other comments, but I wanted to address this
> >> one, since it's pretty fundamental. We went around this loop internally and
> >> concluded that what we are doing is architecturally sound.
> >>
> >> The expectation is that a conflict abort can only be generated as a result of
> >> the change in patch 4 (and patch 5). That change makes it possible for the TLB
> >> to end up with a multihit. But crucially that can only happen for user space
> >> memory because that change only operates on user memory. And while the TLB may
> >> detect the conflict at any time, the conflict abort is only permitted to be
> >> reported when an architectural access is prevented by the conflict. So we never
> >> do anything that would allow a conflict for a kernel memory access and a user
> >> memory conflict abort can never be triggered as a result of accessing kernel memory.
> >>
> >> Copy/pasting comment from AlexC on the topic, which explains it better than I can:
> >>
> >> """
> >> The intent is certainly that in cases where the hardware detects a TLB conflict
> >> abort, it is only permitted to report it (by generating an exception) if it
> >> applies to an access that is being attempted architecturally. ... that property
> >> can be built from the following two properties:
> >>
> >> 1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort
> >>
> >> 2. Those two exception types must be reported synchronously and precisely.
> >> """
> >
> > I totally agree with this. The issue is that nothing says that the
> > abort is in any way related to userspace.
> >
> >>>
> >>> I honestly don't think you can reliably handle a TLB Conflict abort in
> >>> the same translation regime as the original fault, given that we don't
> >>> know the scope of that fault. You are probably making an educated
> >>> guess that it is good enough on the CPUs you know of, but I don't see
> >>> anything in the architecture that indicates the "blast radius" of a
> >>> TLB conflict.
> >>
> >> OK, so I'm claiming that the blast radius is limited to the region of memory
> >> that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
> >> re-read the ARM and come back with the specific statements that led us to that
> >> conclusion?
>
> From the ARM:
> """
> RFCPSG: If level 1 or level 2 is supported and the Contiguous bit in a set of
> Block descriptors or Page descriptors is changed, then a TLB conflict abort can
> be generated because multiple translation table entries might exist within a TLB
> that translates the same IA.
> """
>
> Although I guess it's not totally explicit, I've interpretted that as saying
> that conflicting TLB entries can only arise for the IA range for which the
> contiguous bits have been modified in the translation tables.
Right, that's reassuring, thanks for digging that one.
> Given we are only fiddling with the contiguous bits for user space mappings in
> this way, that's why I'm asserting we will only get a conflict abort for user
> space mappings... assuming the absence of kernel bugs, anyway...
For now. But if you dare scanning the list, you'll find a lot of
people willing to do far more than just that. Including changing the
shape of the linear map.
>
> >
> > But we don't know for sure what caused this conflict by the time we
> > arrive in the handler. It could equally be because we have a glaring
> > bug somewhere on the kernel side, even if you are *now* only concerned
> > with userspace.
>
> OK I see what you are saying; previously a conflict abort would have led to
> calling do_bad(), which returns 1, which causes do_mem_abort() to either kill
> the kernel or the process depending on the origin of the abort. (although if it
> came from kernel due to bug, we're just hoping that the conflict doesn't affect
> the path through the handler). With this change, we always assume we can fix it
> with the TLBI.
>
> How about this change to ensure we still die for issues originating from the kernel?
>
> if (!user_mode(regs) || !system_supports_bbml2())
> return do_bad(far, esr, regs);
That wouldn't catch a TLB conflict on get_user(), would it?
> > If anything, this should absolutely check for FAR_EL1 and assert that
> > this is indeed caused by such change.
>
> I'm not really sure how we would check this reliably? Without patch 5, the
> problem is somewhat constrained; we could have as many changes in flight as
> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> being modified. But if patch 5 is confirmed to be architecturally sound, then
> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> VA-range}'s that could legitimately cause a conflict abort.
I didn't mean to imply that we should identify the exact cause of the
abort. I was hoping to simply check that FAR_EL1 reports a userspace
VA. Why wouldn't that work?
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2024-12-12 15:48 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
2024-12-11 17:40 ` Marc Zyngier
2024-12-12 9:23 ` Ryan Roberts
2024-12-12 9:57 ` Marc Zyngier
2024-12-12 10:37 ` Ryan Roberts
2024-12-13 16:24 ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2024-12-12 8:25 ` Marc Zyngier
2024-12-12 10:55 ` Ryan Roberts
2024-12-12 14:26 ` Marc Zyngier
2024-12-12 15:05 ` Ryan Roberts
2024-12-12 15:48 ` Marc Zyngier [this message]
2024-12-12 16:03 ` Ryan Roberts
2024-12-19 16:45 ` Will Deacon
2025-01-02 12:07 ` Jonathan Cameron
2025-01-02 12:30 ` Marc Zyngier
2025-01-03 15:35 ` Will Deacon
2025-01-03 16:00 ` Ryan Roberts
2025-01-03 18:18 ` Jonathan Cameron
2024-12-13 16:53 ` Mikołaj Lenczewski
2024-12-13 16:49 ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 3/5] arm64: Add errata and workarounds for systems with broken BBML2 Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
2024-12-19 16:36 ` Will Deacon
2024-12-11 16:01 ` [RESEND RFC PATCH v1 5/5] arm64/mm: Elide " Mikołaj Lenczewski
2024-12-19 16:37 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86h678sy00.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=corbet@lwn.net \
--cc=joey.gouly@arm.com \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miko.lenczewski@arm.com \
--cc=oliver.upton@linux.dev \
--cc=ryan.roberts@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=will@kernel.org \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).