linux-arm-kernel.lists.infradead.org archive mirror
From: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
To: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	Oliver Upton <oliver.upton@linux.dev>,
	Joey Gouly <joey.gouly@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>, Will Deacon <will@kernel.org>,
	Suzuki K Poulose <Suzuki.Poulose@arm.com>,
	Steven Price <steven.price@arm.com>,
	Peter Collingbourne <pcc@google.com>
Subject: Re: [PATCH] KVM: arm64: Drop mte_allowed check during memslot creation
Date: Wed, 26 Feb 2025 15:28:26 +0530
Message-ID: <yq5a8qptauyl.fsf@kernel.org>
In-Reply-To: <86ikozqmsl.wl-maz@kernel.org>

Marc Zyngier <maz@kernel.org> writes:

> On Mon, 24 Feb 2025 16:44:06 +0000,
> Aneesh Kumar K.V <aneesh.kumar@kernel.org> wrote:
>> 
>> Marc Zyngier <maz@kernel.org> writes:
>> 
>> > On Mon, 24 Feb 2025 14:39:16 +0000,
>> > Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >>
>> >> On Mon, Feb 24, 2025 at 12:24:14PM +0000, Marc Zyngier wrote:
>> >> > On Mon, 24 Feb 2025 11:05:33 +0000,
>> >> > Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> > > On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> >> > > > This change is needed because, without it, users are not able to use MTE
>> >> > > > with VFIO passthrough (currently the mapping is either Device or
>> >> > > > NonCacheable, for which tag access checks are not applied), as shown
>> >> > > > below (kvmtool VMM).
>> >> > >
>> >> > > Another nit: "users are not able to use VFIO passthrough when MTE is
>> >> > > enabled". At a first read, the above sounded to me like one wants to
>> >> > > enable MTE for VFIO passthrough mappings.
>> >> >
>> >> > What the commit message doesn't spell out is how MTE and VFIO are
>> >> > interacting here. I also don't understand the reference to Device or
>> >> > NC memory here.
>> >>
>> >> I guess it's saying that the guest cannot turn MTE on (Normal Tagged)
>> >> for these ranges anyway since Stage 2 is Device or Normal NC. So we
>> >> don't break any use-case specific to VFIO.
>> >>
>> >> > Isn't the issue that DMA doesn't check/update tags, and therefore it
>> >> > makes little sense to prevent non-tagged memory being associated with
>> >> > a memslot?
>> >>
>> >> The issue is that some MMIO memory range that does not support MTE
>> >> (well, all MMIO) could be mapped by the guest as Normal Tagged and we
>> >> have no clue what the hardware does as tag accesses, hence we currently
>> >> prevent it altogether. It's not about DMA.
>> >>
>> >> This patch still prevents such MMIO+MTE mappings but moves the decision
>> >> to user_mem_abort() and it's slightly more relaxed - only rejecting it
>> >> if !VM_MTE_ALLOWED _and_ the Stage 2 is cacheable. The side-effect is
>> >> that it allows device assignment into the guest since Stage 2 is not
>> >> Normal Cacheable (at least for now; we have some patches from Ankit, but
>> >> they handle the MTE case).
>> >
>> > The other side effect is that it also allows non-tagged cacheable
>> > memory to be given to the MTE-enabled guest, and the guest has no way
>> > to distinguish between what is tagged and what's not.
>> >
>> >>
>> >> > My other concern is that this gives pretty poor consistency to the
>> >> > guest, which cannot know what can be tagged and what cannot, and
>> >> > breaks a guarantee that the guest should be able to rely on.
>> >>
>> >> The guest should not set Normal Tagged on anything other than what it
>> >> gets as standard RAM. We are not changing this here. KVM then needs to
>> >> prevent a broken/malicious guest from setting MTE on other (physical)
>> >> ranges that don't support MTE. Currently it can only do this by forcing
>> >> Device or Normal NC (or disable MTE altogether). Later we'll add
>> >> FEAT_MTE_PERM to permit Stage 2 Cacheable but trap on tag accesses.
>> >>
>> >> The ABI change is just for the VMM, the guest shouldn't be aware as
>> >> long as it sticks to the typical recommendations for MTE - only enable
>> >> on standard RAM.
>> >
>> > See above. You fall into the same trap with standard memory, since you
>> > now allow userspace to mix things at will, and only realise something
>> > has gone wrong on access (and -EFAULT is not very useful).
>> >
>> >>
>> >> Does any VMM rely on the memory slot being rejected on registration if
>> >> it does not support MTE? After this change, we'd get an exit to the VMM
>> >> on guest access with MTE turned on (even if it's not mapped as such at
>> >> Stage 1).
>> >
>> > I really don't know what userspace expects w.r.t. mixing tagged and
>> > non-tagged memory. But I don't expect anything good to come out of it,
>> > given that we provide zero information about the fault context.
>> >
>> > Honestly, if we are going to change this, then let's make sure we give
>> > enough information for userspace to go and fix the mess. Not just "it
>> > all went wrong".
>> >
>> 
>> What if we trigger a memory fault exit with the TAGACCESS flag, allowing
>> the VMM to use the GPA to retrieve additional details and print extra
>> information to aid in analysis? BTW, we will do this on the first fault
>> in cacheable, non-tagged memory even if there is no tagaccess in that
>> region. This can be further improved using the NoTagAccess series I
>> posted earlier, which ensures the memory fault exit occurs only on
>> actual tag access.
>> 
>> Something like below?
>
> Something like that, only with:
>
> - a capability informing userspace of this behaviour
>
> - a per-VM (or per-VMA) flag as a buy-in for that behaviour
>

If we’re looking for a capability-based control, could we tie it to
FEAT_MTE_PERM? That’s what I did here:

https://lore.kernel.org/all/20250110110023.2963795-1-aneesh.kumar@kernel.org

That patch set also addresses the issue mentioned here. Let me know if
you think this is a better approach.

> - the relaxation is made conditional on the memslot not being memory
> (i.e. really MMIO-only).
>
> and keep the current behaviour otherwise.
>
> Thanks,
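
To make the conditions above concrete, here is a minimal userspace sketch
of how they could combine with the memory fault exit idea. It is a model
under stated assumptions, not kernel code: struct slot_ctx, vm_opted_in,
memslot_accepted() and fault_decision() are hypothetical names, and only
VM_MTE_ALLOWED, the TAGACCESS flag and the Stage 2 attributes come from
this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only, not kernel code. All type, field and function
 * names are hypothetical; they stand in for state discussed in the thread.
 */
struct slot_ctx {
	bool vm_has_mte;        /* MTE is enabled for the VM */
	bool vma_mte_allowed;   /* backing VMA has VM_MTE_ALLOWED */
	bool slot_is_mmio_only; /* memslot is really MMIO-only, no RAM */
	bool vm_opted_in;       /* assumed per-VM buy-in flag */
	bool s2_cacheable;      /* Stage 2 would be Normal Cacheable */
};

enum fault_action {
	FAULT_MAP,            /* install the Stage 2 mapping */
	FAULT_EXIT_TAGACCESS, /* memory fault exit carrying a TAGACCESS flag */
};

/* Memslot creation: relax the check only for opted-in, MMIO-only slots. */
static bool memslot_accepted(const struct slot_ctx *c)
{
	assert(c != NULL);
	if (!c->vm_has_mte || c->vma_mte_allowed)
		return true;
	/* Otherwise keep the current behaviour: reject at registration. */
	return c->vm_opted_in && c->slot_is_mmio_only;
}

/* Fault time: untagged backing may only be mapped non-cacheable. */
static enum fault_action fault_decision(const struct slot_ctx *c)
{
	assert(c != NULL);
	if (!c->vm_has_mte || c->vma_mte_allowed)
		return FAULT_MAP;
	if (!c->s2_cacheable)
		return FAULT_MAP; /* Device or Normal NC at Stage 2 */
	return FAULT_EXIT_TAGACCESS;
}
```

With this model, a VFIO BAR in an opted-in MTE VM is accepted at memslot
creation and mapped on fault, while the same slot without the opt-in is
still rejected at registration, as it is today.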

-aneesh

