Date: Sun, 27 Apr 2025 17:37:08 +0100
Message-ID: <86ecxdk1cb.wl-maz@kernel.org>
From: Marc Zyngier
To: David Woodhouse
Cc: Ilias Stamatis, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, oliver.upton@linux.dev, joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, eric.auger@redhat.com, andre.przywara@arm.com, will@kernel.org, jgrall@amazon.co.uk, ugurus@amazon.co.uk, nh-open-source@amazon.com
Subject: Re: [RFC] ARM vGIC-ITS tables serialization when running protected VMs
In-Reply-To: <94fb9d4adb81f824912ee23a296776aa07873354.camel@infradead.org>
References: <20250414111244.153528-1-ilstam@amazon.com> <867c3llt43.wl-maz@kernel.org> <94fb9d4adb81f824912ee23a296776aa07873354.camel@infradead.org>

On Tue, 15 Apr 2025 10:44:39 +0100, David Woodhouse wrote:
>
> On Tue, 2025-04-15 at 09:35 +0100, Marc Zyngier wrote:
> > On Mon, 14 Apr 2025 12:12:43 +0100,
> > Ilias Stamatis wrote:
> > >
> > > # The problem
> > >
> > > KVM's ARM Virtual Interrupt Translation Service (ITS) interface supports the
> > > KVM_DEV_ARM_ITS_SAVE_TABLES and KVM_DEV_ARM_ITS_RESTORE_TABLES operations.
> > > These operations save and restore a set of tables (Device Tables, Interrupt
> > > Translation Tables, Collection Table) to and from guest memory.
> > >
> > > This can be a problem when running a protected VM on top of pKVM or another
> > > lowvisor, since the host kernel (running at EL1) cannot access guest memory.
> > >
> >
> > pKVM doesn't allow a guest to be saved/restored, full stop.
>
> Yet. Either it's going to need to learn to support live update, or
> it'll remain a toy solution.

Toy solution to what problem?

> > >
> > > # Page declassification and why ITTs are special
> > >
> > > The Collection and Device tables are page aligned and their sizes must be a
> > > multiple of the page size. If the lowvisor knows where these tables live, it
> > > is possible to "declassify" the corresponding pages and configure the MMU
> > > such that the EL1 host can write to guest memory directly.
> > >
> > > The ITTs (Interrupt Translation Tables) are different. They are NOT page
> > > aligned: they are 256-byte aligned and their size is variable. That means
> > > the lowvisor can't declassify pages containing ITTs and configure the MMU
> > > to give the host direct access as above, since those pages may contain
> > > unrelated data.
> >
> > And it is the responsibility of the guest to make these page aligned
> > if it intends to let the hypervisor use them. To sum it up, the ITT
> > isn't special at all.
>
> The ITT has nothing to do with virtualization, does it?
And despite > this being logically "DMA", I don't believe it's possible to advertise > it as being behind the SMMU, which would have allowed for access > control (and would indeed have meant that the guest would be expected > to grant access to full pages). > > What exactly are you suggesting? That the GIC specification should be > changed to require page alignment, or to document that in a > confidential compute setup, the remainder of any page which contains > ITTs will be implicitly made non-confidential and shared with the > hypervisor? The GIC architecture is of course absolutely perfect, and there is nothing to change there. I'm suggesting you do what any OS designed to run under a confidential infrastructure do. Which is to expose page-sized data to the non-trusted infrastructure. Linux has done that for a while (as part of both CCA and pKVM enablement), and I don't see why your toy guests can't do the same. It's not like using a page pool for ITT allocations is rocket science, is it? > And then the lowvisor would also have to snoop the ITS command queues > to even find out which pages to implicitly allow access to? Why should it? as long as you only expose pages that only contain GIC-related data, you should be safe. However, if your hypervisor doesn't fully validate the interaction of the *host* with the HW, then you're dead in the water. > > > If the lowvisor knows where the ITTs live in guest memory it could instead > > > perform the guest memory accesses on behalf of the host. I.e. the EL1 host > > > would attempt to save the ITTs to guest memory like it does today, that would > > > generate a data abort, and then the EL2 lowvisor could perform the copy after > > > validating that the faulty address belongs to an ITT in guest memory. > > > > > > One issue with the above is that the ITS save/restore happens at hypervisor > > > live update which is a time sensitive operation and the extra traps (one per > > > interrupt mapping?) 
> > > can introduce significant additional overhead there.
> >
> > I don't believe this for a second.
>
> You don't believe that every millisecond of live update downtime,
> perceived by the guest as unwanted steal time of a hypervisor that's
> generally trying to be as quiescent as possible, is an issue?

I absolutely don't. Certainly not for something that has no tangible
existence, with no performance numbers whatsoever, and based on shaky
premises.

> > > Another issue is that it's actually hard for the lowvisor to know where these
> > > tables live without trusting the EL1 host which virtualizes the ITS. It is
> > > especially hard to know the locations of the ITTs (compared to the
> > > Collection/Device tables) because that probably means having to parse the ITS
> > > command queue from EL2, which is complex and undesirable.
> > >
> > > # An alternative: Serializing ITTs into a userspace buffer
> >
> > NAK.
> >
> > Share the page-aligned memory with the rest of the hypervisor, and use
> > the existing API.
>
> That seems like a bad choice. All this is just using guest memory to
> store KVM's state.

The architecture *mandates* the memory allocation. KVM uses this
memory for the purpose described in the architecture. If you don't
like it, invent your own interrupt architecture. Trust me, it's real
fun!

> Yes, the guest provides a buffer which the virtual hardware *may* use
> if it wants, but with no IOMMU or access control defined in the
> specification.
>
> It seems like it would be much cleaner just to let KVM pass its state
> up to userspace for serialization like we do for all *other* KVM state,
> which is what Ilias is proposing.

Sure. You could also decide that SMMU page tables should be extracted
separately, because that's the exact same rationale. You could also
build your own hypervisor instead of inventing new ways to make the KVM
API even more of a terrible mess.

	M.

-- 
Without deviation from the norm, progress is not possible.
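The reply argues that a guest can keep its ITTs in a dedicated page pool, so that any page it shares with the hypervisor holds nothing but GIC data, even though each ITT is only 256-byte aligned. The following is a minimal, illustrative C sketch of that idea under stated assumptions; it is not Linux's actual allocator. The names `itt_pool`, `itt_pool_init`, `itt_pool_alloc` and `round_up_pow2`, the 4 KiB `GUEST_PAGE_SIZE`, the 8-byte entry size used in the example and the never-freeing bump strategy are all hypothetical choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumptions for this sketch (hypothetical values): 4 KiB guest pages and
 * a small pool of pages dedicated exclusively to ITTs. Each ITT base only
 * needs 256-byte alignment; the pool is what makes the pages "clean". */
#define GUEST_PAGE_SIZE 4096u
#define ITT_ALIGN       256u
#define POOL_PAGES      4u

/* Hypothetical ITT pool: every byte of these pages is ITT storage, so
 * sharing the whole pages with an untrusted host exposes only GIC data. */
struct itt_pool {
	uint8_t *base;  /* page-aligned backing memory */
	size_t   size;  /* total size in bytes, a multiple of GUEST_PAGE_SIZE */
	size_t   next;  /* bump offset, always a multiple of ITT_ALIGN */
};

/* Round sz up to a multiple of a, where a must be a power of two. */
static size_t round_up_pow2(size_t sz, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (sz + a - 1) & ~(a - 1);
}

/* Allocate and zero `pages` whole pages to back the pool. */
static int itt_pool_init(struct itt_pool *pool, size_t pages)
{
	pool->size = pages * GUEST_PAGE_SIZE;
	pool->next = 0;
	/* C11 aligned_alloc requires size to be a multiple of the alignment. */
	pool->base = aligned_alloc(GUEST_PAGE_SIZE, pool->size);
	if (!pool->base)
		return -1;
	memset(pool->base, 0, pool->size);
	return 0;
}

/* Carve one ITT of nr_events entries, ite_size bytes each, rounded up to
 * ITT_ALIGN. Several small ITTs may share a page, but only with other ITTs.
 * This bump allocator never frees; a real one would need a free list and
 * an overflow check on nr_events * ite_size. */
static void *itt_pool_alloc(struct itt_pool *pool, size_t nr_events,
			    size_t ite_size)
{
	size_t sz = round_up_pow2(nr_events * ite_size, ITT_ALIGN);
	void *itt;

	if (sz == 0 || sz > pool->size - pool->next)
		return NULL;
	itt = pool->base + pool->next;
	pool->next += sz;
	return itt;
}
```

For example, allocating ITTs for three devices with 32, 8 and 1 EventIDs (8-byte entries assumed) places them at offsets 0, 256 and 512 of the first pool page. That page contains only ITTs, so it can be shared as a whole without leaking unrelated guest data, which is the property the reply relies on.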