Date: Mon, 24 Feb 2025 17:23:38 +0000
Message-ID: <86ikozqmsl.wl-maz@kernel.org>
From: Marc Zyngier
To: Aneesh Kumar K.V
Cc: Catalin Marinas, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	Oliver Upton, Joey Gouly, Zenghui Yu, Will Deacon,
	Suzuki K Poulose, Steven Price, Peter Collingbourne
Subject: Re: [PATCH] KVM: arm64: Drop mte_allowed check during memslot creation
References: <20250224093938.3934386-1-aneesh.kumar@kernel.org>
	<86ldtvr0nl.wl-maz@kernel.org>
	<86jz9fqtbk.wl-maz@kernel.org>

On Mon, 24 Feb 2025 16:44:06 +0000,
Aneesh Kumar K.V wrote:
>
> Marc Zyngier writes:
>
> > On Mon, 24 Feb 2025 14:39:16 +0000,
> > Catalin Marinas wrote:
> >>
> >> On Mon, Feb 24, 2025 at 12:24:14PM +0000, Marc Zyngier wrote:
> >> > On Mon, 24 Feb 2025 11:05:33 +0000,
> >> > Catalin Marinas wrote:
> >> > > On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> > > > This change is needed because, without it, users are not able to use MTE
> >> > > > with VFIO passthrough (currently the mapping is either Device or
> >> > > > NonCacheable for which tag access check is not applied.), as shown
> >> > > > below (kvmtool VMM).
> >> > >
> >> > > Another nit: "users are not able to use VFIO passthrough when MTE is
> >> > > enabled". At a first read, the above sounded to me like one wants to
> >> > > enable MTE for VFIO passthrough mappings.
> >> >
> >> > What the commit message doesn't spell out is how MTE and VFIO are
> >> > interacting here. I also don't understand the reference to Device or
> >> > NC memory here.
> >>
> >> I guess it's saying that the guest cannot turn MTE on (Normal Tagged)
> >> for these ranges anyway since Stage 2 is Device or Normal NC. So we
> >> don't break any use-case specific to VFIO.
> >>
> >> > Isn't the issue that DMA doesn't check/update tags, and therefore it
> >> > makes little sense to prevent non-tagged memory being associated with
> >> > a memslot?
> >>
> >> The issue is that some MMIO memory range that does not support MTE
> >> (well, all MMIO) could be mapped by the guest as Normal Tagged and we
> >> have no clue what the hardware does as tag accesses, hence we currently
> >> prevent it altogether. It's not about DMA.
> >>
> >> This patch still prevents such MMIO+MTE mappings but moves the decision
> >> to user_mem_abort() and it's slightly more relaxed - only rejecting it
> >> if !VM_MTE_ALLOWED _and_ the Stage 2 is cacheable. The side-effect is
> >> that it allows device assignment into the guest since Stage 2 is not
> >> Normal Cacheable (at least for now, we have some patches Ankit but they
> >> handle the MTE case).
> >
> > The other side effect is that it also allows non-tagged cacheable
> > memory to be given to the MTE-enabled guest, and the guest has no way
> > to distinguish between what is tagged and what's not.
> >
> >>
> >> > My other concern is that this gives pretty poor consistency to the
> >> > guest, which cannot know what can be tagged and what cannot, and
> >> > breaks a guarantee that the guest should be able to rely on.
> >>
> >> The guest should not set Normal Tagged on anything other than what it
> >> gets as standard RAM. We are not changing this here. KVM then needs to
> >> prevent a broken/malicious guest from setting MTE on other (physical)
> >> ranges that don't support MTE. Currently it can only do this by forcing
> >> Device or Normal NC (or disable MTE altogether). Later we'll add
> >> FEAT_MTE_PERM to permit Stage 2 Cacheable but trap on tag accesses.
> >>
> >> The ABI change is just for the VMM, the guest shouldn't be aware as
> >> long as it sticks to the typical recommendations for MTE - only enable
> >> on standard RAM.
> >
> > See above. You fall into the same trap with standard memory, since you
> > now allow userspace to mix things at will, and only realise something
> > has gone wrong on access (and -EFAULT is not very useful).
> >
> >>
> >> Does any VMM rely on the memory slot being rejected on registration if
> >> it does not support MTE? After this change, we'd get an exit to the VMM
> >> on guest access with MTE turned on (even if it's not mapped as such at
> >> Stage 1).
> >
> > I really don't know what userspace expects w.r.t. mixing tagged and
> > non-tagged memory. But I don't expect anything good to come out of it,
> > given that we provide zero information about the fault context.
> >
> > Honestly, if we are going to change this, then let's make sure we give
> > enough information for userspace to go and fix the mess. Not just "it
> > all went wrong".
> >
>
> What if we trigger a memory fault exit with the TAGACCESS flag, allowing
> the VMM to use the GPA to retrieve additional details and print extra
> information to aid in analysis? BTW, we will do this on the first fault
> in cacheable, non-tagged memory even if there is no tagaccess in that
> region. This can be further improved using the NoTagAccess series I
> posted earlier, which ensures the memory fault exit occurs only on
> actual tag access
>
> Something like below?

Something like that, only with:

- a capability informing userspace of this behaviour

- a per-VM (or per-VMA) flag as a buy-in for that behaviour

- the relaxation is made conditional on the memslot not being memory
  (i.e. really MMIO-only).

and keep the current behaviour otherwise.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
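
For illustration only, the three conditions listed in the reply can be
modelled as a small decision function. This is a userspace sketch, not
kernel code and not taken from any posted patch: struct slot_model, its
fields and memslot_allowed() are hypothetical names, and the "opted in"
flag stands in for whatever capability/flag interface would actually be
defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the inputs discussed in the thread. */
struct slot_model {
	bool vm_mte_enabled;    /* MTE is enabled for the guest */
	bool vma_mte_allowed;   /* backing VMA has VM_MTE_ALLOWED set */
	bool slot_is_mmio_only; /* memslot backs MMIO only, no memory */
	bool vm_opted_in;       /* hypothetical per-VM buy-in flag */
};

/*
 * Returns true if the memslot would be accepted at creation time
 * under the proposed policy.
 */
static bool memslot_allowed(const struct slot_model *s)
{
	assert(s != NULL);

	/* Without MTE there is nothing to check. */
	if (!s->vm_mte_enabled)
		return true;

	/* Taggable backing memory is always acceptable. */
	if (s->vma_mte_allowed)
		return true;

	/*
	 * Non-taggable backing: keep the current behaviour (reject)
	 * unless userspace opted in and the slot is MMIO-only.
	 */
	return s->vm_opted_in && s->slot_is_mmio_only;
}
```

With this model, a VFIO-style MMIO slot is accepted only for a VM that
opted in, while non-tagged ordinary memory is still rejected at memslot
creation, which is the "keep the current behaviour otherwise" part.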