From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Aug 2023 14:45:11 -0700
From: Isaku Yamahata
To: Michael Roth
Cc: Xiaoyao Li, Daniel P. Berrangé, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov,
Tsirkin" , Marcel Apfelbaum , Richard Henderson , Marcelo Tosatti , Markus Armbruster , Eric Blake , Philippe =?utf-8?Q?Mathieu-Daud=C3=A9?= , Peter Xu , Chao Peng , isaku.yamahata@gmail.com, qemu-devel@nongnu.org, kvm@vger.kernel.org Subject: Re: [RFC PATCH 00/19] QEMU gmem implemention Message-ID: <20230814214511.GG1807130@ls.amr.corp.intel.com> References: <20230731162201.271114-1-xiaoyao.li@intel.com> <9b3a3e88-21f4-bfd2-a9c3-60a25832e698@intel.com> <20230810155808.eqne5zzb53mirgcw@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20230810155808.eqne5zzb53mirgcw@amd.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On Thu, Aug 10, 2023 at 10:58:09AM -0500, Michael Roth via wrote: > On Tue, Aug 01, 2023 at 09:45:41AM +0800, Xiaoyao Li wrote: > > On 8/1/2023 12:51 AM, Daniel P. Berrangé wrote: > > > On Mon, Jul 31, 2023 at 12:21:42PM -0400, Xiaoyao Li wrote: > > > > This is the first RFC version of enabling KVM gmem[1] as the backend for > > > > private memory of KVM_X86_PROTECTED_VM. > > > > > > > > It adds the support to create a specific KVM_X86_PROTECTED_VM type VM, > > > > and introduces 'private' property for memory backend. When the vm type > > > > is KVM_X86_PROTECTED_VM and memory backend has private enabled as below, > > > > it will call KVM gmem ioctl to allocate private memory for the backend. > > > > > > > > $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \ > > > > -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \ > > > > ... > > > > > > > > Unfortunately this patch series fails the boot of OVMF at very early > > > > stage due to triple fault because KVM doesn't support emulate string IO > > > > to private memory. We leave it as an open to be discussed. > > > > > > > > There are following design opens that need to be discussed: > > > > > > > > 1. how to determine the vm type? > > > > > > > > a. like this series, specify the vm type via machine property > > > > 'kvm-type' > > > > b. check the memory backend, if any backend has 'private' property > > > > set, the vm-type is set to KVM_X86_PROTECTED_VM. > > > > > > > > 2. whether 'private' property is needed if we choose 1.b as design > > > > > > > > with 1.b, QEMU can decide whether the memory region needs to be > > > > private (allocates gmem fd for it) or not, on its own. > > > > > > > > 3. What is KVM_X86_SW_PROTECTED_VM going to look like? What's the > > > > purose of it and what's the requirement on it. I think it's the > > > > questions for KVM folks than QEMU folks. > > > > > > > > Any other idea/open/question is welcomed. > > > > > > > > > > > > Beside, TDX QEMU implemetation is based on this series to provide > > > > private gmem for TD private memory, which can be found at [2]. > > > > And it can work corresponding KVM [3] to boot TDX guest. > > > > > > We already have a general purpose configuration mechanism for > > > confidential guests. The -machine argument has a property > > > confidential-guest-support=$OBJECT-ID, for pointing to an > > > object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT > > > interface in QEMU. This is implemented with SEV, PPC PEF > > > mode, and s390 protvirt. > > > > > > I would expect TDX to follow this same design ie > > > > > > qemu-system-x86_64 \ > > > -object tdx-guest,id=tdx0,..... \ > > > -machine q35,confidential-guest-support=tdx0 \ > > > ... > > > > > > and not require inventing the new 'kvm-type' attribute at least. > > > > yes. 
> > 
> > TDX is initialized exactly as the above.
> > 
> > This RFC series introduces the 'kvm-type' for KVM_X86_SW_PROTECTED_VM. It's
> > my fault that I forgot to list the option of introducing a sw_protected_vm
> > object with the CONFIDENTIAL_GUEST_SUPPORT interface.
> > Thanks to Isaku for raising it:
> > https://lore.kernel.org/qemu-devel/20230731171041.GB1807130@ls.amr.corp.intel.com/
> > 
> > we can specify KVM_X86_SW_PROTECTED_VM this way:
> > 
> >   qemu \
> >     -object sw-protected,id=swp0,... \
> >     -machine confidential-guest-support=swp0 \
> >     ...
> > 
> > > For the memory backend though, I'm not so sure - possibly that
> > > might be something that still wants an extra property to identify
> > > the type of memory to allocate, since we use memory-backend-ram
> > > for a variety of use cases. Or it could be an entirely new object
> > > type such as "memory-backend-gmem"
> > 
> > What I want to discuss is whether to provide an interface that allows
> > users to configure which memory is/can be private. For example, QEMU could
> > do it internally: if the user wants a confidential guest, QEMU allocates
> > private gmem for normal RAM automatically.
> 
> I think handling it automatically simplifies things a good deal on the
> QEMU side. I think it's still worthwhile to allow:
> 
>   -object memory-backend-memfd-private,...
> 
> because it provides a nice mechanism to set up a pair of shared/private
> memfds to enable hole-punching via fallocate() to avoid doubling memory
> allocations for shared/private. It's also a nice place to control
> potentially-configurable things like:
> 
>  - whether or not to enable discard/hole-punching
>  - if discard is enabled, whether or not to register the range via the
>    RamDiscardManager interface so that VFIO/IOMMU mappings get updated
>    when doing PCI passthrough. SNP relies on this for PCI passthrough
>    when discard is enabled, otherwise DMA occurs to stale mappings of
>    discarded bounce-buffer pages:
> 
>      https://github.com/AMDESE/qemu/blob/snp-latest/backends/hostmem-memfd-private.c#L449
> 
> But for other memory ranges, it doesn't do a lot of good to rely on
> users to control those via -object memory-backend-memfd-private, since
> QEMU will set up some regions internally, like the UEFI ROM.
> 
> It also isn't ideal for QEMU itself to internally control what
> should/shouldn't be set up with a backing guest_memfd, because some
> guest kernels do weird stuff, like scanning for ROM regions in areas that
> guest kernels might have mapped as encrypted in the guest page table. You
> can consider them to be guest bugs, but even current SNP-capable
> kernels exhibit this behavior, and if the guest wants to do dumb stuff
> QEMU should let it.
> 
> But for these latter 2 cases, it doesn't make sense to attempt to do
> any sort of discarding of backing pages, since it doesn't make sense to
> discard ROM pages.
> 
> So I think it makes sense to just set up the gmemfd automatically across
> the board internally, and keep memory-backend-memfd-private around
> purely as a way to control/configure discardable memory.

I'm looking at the repo and 31a7c7e36684 ("*hostmem-memfd-private: Initial
discard manager support").

Do we have to implement RAM_DISCARD_MANAGER at memory-backend-memfd-private?
Can't we implement it at host_mem instead? The interface callbacks can have
the check "if (!private) return". Then we can support any host-mem backend.
A rough sketch of that idea is below.
-- 
Isaku Yamahata
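
For illustration only (not from the original thread): a minimal sketch of the
"implement it at host_mem" idea, assuming HostMemoryBackend grows the RFC's
proposed "private" property. The "private" field and the "..." bookkeeping are
assumptions for the sketch; only the callback signatures follow QEMU's
RamDiscardManagerClass, and the actual patches may look quite different.

    /* backends/hostmem.c -- sketch only, not the actual RFC patches */
    #include "qemu/osdep.h"
    #include "sysemu/hostmem.h"
    #include "exec/memory.h"

    static bool
    host_memory_backend_rdm_is_populated(const RamDiscardManager *rdm,
                                         const MemoryRegionSection *section)
    {
        const HostMemoryBackend *backend = MEMORY_BACKEND(rdm);

        if (!backend->private) {   /* hypothetical property from the RFC */
            return true;           /* plain RAM: nothing is ever discarded */
        }

        /* ... consult the backend's shared/private tracking for the section ... */
        return false;
    }

    static void
    host_memory_backend_rdm_register_listener(RamDiscardManager *rdm,
                                              RamDiscardListener *rdl,
                                              MemoryRegionSection *section)
    {
        HostMemoryBackend *backend = MEMORY_BACKEND(rdm);

        if (!backend->private) {   /* no-op for backends without private memory */
            return;
        }

        /* ... record rdl and replay the currently-populated ranges to it ... */
    }

With a guard like this at the top of every callback, memory-backend-ram,
memory-backend-memfd, and memory-backend-file could all inherit the discard
manager behaviour once they set private=on, instead of needing a dedicated
memory-backend-memfd-private object just for that purpose.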