From: Marc Zyngier <maz@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: "Mikołaj Lenczewski" <miko.lenczewski@arm.com>,
catalin.marinas@arm.com, will@kernel.org, corbet@lwn.net,
oliver.upton@linux.dev, joey.gouly@arm.com,
suzuki.poulose@arm.com, yuzenghui@huawei.com,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev
Subject: Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
Date: Thu, 12 Dec 2024 15:48:47 +0000 [thread overview]
Message-ID: <86h678sy00.wl-maz@kernel.org> (raw)
In-Reply-To: <2b1cc228-a8d5-4383-ab25-abbbcccd2e2c@arm.com>
On Thu, 12 Dec 2024 15:05:24 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 12/12/2024 14:26, Marc Zyngier wrote:
> > On Thu, 12 Dec 2024 10:55:45 +0000,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 12/12/2024 08:25, Marc Zyngier wrote:
> >>>> +
> >>>> + local_flush_tlb_all();
> >>>
> >>> The elephant in the room: if TLBs are in such a sorry state, what
> >>> guarantees we can make it this far?
> >>
> >> I'll leave Miko to respond to your other comments, but I wanted to address this
> >> one, since it's pretty fundamental. We went around this loop internally and
> >> concluded that what we are doing is architecturally sound.
> >>
> >> The expectation is that a conflict abort can only be generated as a result of
> >> the change in patch 4 (and patch 5). That change makes it possible for the TLB
> >> to end up with a multihit. But crucially that can only happen for user space
> >> memory because that change only operates on user memory. And while the TLB may
> >> detect the conflict at any time, the conflict abort is only permitted to be
> >> reported when an architectural access is prevented by the conflict. So we never
> >> do anything that would allow a conflict for a kernel memory access and a user
> >> memory conflict abort can never be triggered as a result of accessing kernel memory.
> >>
> >> Copy/pasting comment from AlexC on the topic, which explains it better than I can:
> >>
> >> """
> >> The intent is certainly that in cases where the hardware detects a TLB conflict
> >> abort, it is only permitted to report it (by generating an exception) if it
> >> applies to an access that is being attempted architecturally. ... that property
> >> can be built from the following two properties:
> >>
> >> 1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort
> >>
> >> 2. Those two exception types must be reported synchronously and precisely.
> >> """
> >
> > I totally agree with this. The issue is that nothing says that the
> > abort is in any way related to userspace.
> >
> >>>
> >>> I honestly don't think you can reliably handle a TLB Conflict abort in
> >>> the same translation regime as the original fault, given that we don't
> >>> know the scope of that fault. You are probably making an educated
> >>> guess that it is good enough on the CPUs you know of, but I don't see
> >>> anything in the architecture that indicates the "blast radius" of a
> >>> TLB conflict.
> >>
> >> OK, so I'm claiming that the blast radius is limited to the region of memory
> >> that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
> >> re-read the ARM and come back with the specific statements that led us to that
> >> conclusion?
>
> From the ARM:
> """
> RFCPSG: If level 1 or level 2 is supported and the Contiguous bit in a set of
> Block descriptors or Page descriptors is changed, then a TLB conflict abort can
> be generated because multiple translation table entries might exist within a TLB
> that translates the same IA.
> """
>
> Although I guess it's not totally explicit, I've interpretted that as saying
> that conflicting TLB entries can only arise for the IA range for which the
> contiguous bits have been modified in the translation tables.
Right, that's reassuring, thanks for digging that one.
> Given we are only fiddling with the contiguous bits for user space mappings in
> this way, that's why I'm asserting we will only get a conflict abort for user
> space mappings... assuming the absence of kernel bugs, anyway...
For now. But if you dare scanning the list, you'll find a lot of
people willing to do far more than just that. Including changing the
shape of the linear map.
>
> >
> > But we don't know for sure what caused this conflict by the time we
> > arrive in the handler. It could equally be because we have a glaring
> > bug somewhere on the kernel side, even if you are *now* only concerned
> > with userspace.
>
> OK I see what you are saying; previously a conflict abort would have led to
> calling do_bad(), which returns 1, which causes do_mem_abort() to either kill
> the kernel or the process depending on the origin of the abort. (although if it
> came from kernel due to bug, we're just hoping that the conflict doesn't affect
> the path through the handler). With this change, we always assume we can fix it
> with the TLBI.
>
> How about this change to ensure we still die for issues originating from the kernel?
>
> if (!user_mode(regs) || !system_supports_bbml2())
> return do_bad(far, esr, regs);
That wouldn't catch a TLB conflict on get_user(), would it?
> > If anything, this should absolutely check for FAR_EL1 and assert that
> > this is indeed caused by such change.
>
> I'm not really sure how we would check this reliably? Without patch 5, the
> problem is somewhat constrained; we could have as many changes in flight as
> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> being modified. But if patch 5 is confirmed to be architecturally sound, then
> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> VA-range}'s that could legitimately cause a conflict abort.
I didn't mean to imply that we should identify the exact cause of the
abort. I was hoping to simply check that FAR_EL1 reports a userspace
VA. Why wouldn't that work?
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2024-12-12 15:48 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
2024-12-11 17:40 ` Marc Zyngier
2024-12-12 9:23 ` Ryan Roberts
2024-12-12 9:57 ` Marc Zyngier
2024-12-12 10:37 ` Ryan Roberts
2024-12-13 16:24 ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2024-12-12 8:25 ` Marc Zyngier
2024-12-12 10:55 ` Ryan Roberts
2024-12-12 14:26 ` Marc Zyngier
2024-12-12 15:05 ` Ryan Roberts
2024-12-12 15:48 ` Marc Zyngier [this message]
2024-12-12 16:03 ` Ryan Roberts
2024-12-19 16:45 ` Will Deacon
2025-01-02 12:07 ` Jonathan Cameron
2025-01-02 12:30 ` Marc Zyngier
2025-01-03 15:35 ` Will Deacon
2025-01-03 16:00 ` Ryan Roberts
2025-01-03 18:18 ` Jonathan Cameron
2024-12-13 16:53 ` Mikołaj Lenczewski
2024-12-13 16:49 ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 3/5] arm64: Add errata and workarounds for systems with broken BBML2 Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
2024-12-19 16:36 ` Will Deacon
2024-12-11 16:01 ` [RESEND RFC PATCH v1 5/5] arm64/mm: Elide " Mikołaj Lenczewski
2024-12-19 16:37 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86h678sy00.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=corbet@lwn.net \
--cc=joey.gouly@arm.com \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miko.lenczewski@arm.com \
--cc=oliver.upton@linux.dev \
--cc=ryan.roberts@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=will@kernel.org \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.