From: Fuad Tabba
Date: Tue, 8 Jul 2025 17:22:24 +0100
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: Sean Christopherson
Cc: Vishal Annapurve, Rick P Edgecombe, "pvorel@suse.cz",
 "kvm@vger.kernel.org", "catalin.marinas@arm.com", Jun Miao,
 Kirill Shutemov, "pdurrant@amazon.co.uk", "vbabka@suse.cz",
 "peterx@redhat.com", "x86@kernel.org", "amoorthy@google.com",
 "jack@suse.cz", "quic_svaddagi@quicinc.com", "keirf@google.com",
 "palmer@dabbelt.com", "vkuznets@redhat.com",
 "mail@maciej.szmigiero.name", "anthony.yznaga@oracle.com", Wei W Wang,
 "Wieczor-Retman, Maciej", Yan Y Zhao, "ajones@ventanamicro.com",
 "willy@infradead.org", "rppt@kernel.org", "quic_mnalajal@quicinc.com",
 "aik@amd.com", "usama.arif@bytedance.com", Dave Hansen,
 "fvdl@google.com", "paul.walmsley@sifive.com", "bfoster@redhat.com",
 "nsaenz@amazon.es", "anup@brainfault.org", "quic_eberman@quicinc.com",
 "linux-kernel@vger.kernel.org", "thomas.lendacky@amd.com",
 "mic@digikod.net", "oliver.upton@linux.dev",
 "akpm@linux-foundation.org", "quic_cvanscha@quicinc.com",
 "steven.price@arm.com", "binbin.wu@linux.intel.com",
 "hughd@google.com", Zhiquan1 Li, "rientjes@google.com",
 "mpe@ellerman.id.au", Erdem Aktas, "david@redhat.com", "jgg@ziepe.ca",
 "jhubbard@nvidia.com", Haibo1 Xu, Fan Du, "maz@kernel.org",
 "muchun.song@linux.dev", Isaku Yamahata, "jthoughton@google.com",
 "steven.sistare@oracle.com", "quic_pheragu@quicinc.com",
 "jarkko@kernel.org", "chenhuacai@kernel.org", Kai Huang,
 "shuah@kernel.org", "dwmw@amazon.co.uk", Chao P Peng,
 "pankaj.gupta@amd.com", Alexander Graf, "nikunj@amd.com",
 "viro@zeniv.linux.org.uk", "pbonzini@redhat.com",
 "yuzenghui@huawei.com", "jroedel@suse.de", "suzuki.poulose@arm.com",
 "jgowans@amazon.com", Yilun Xu, "liam.merwick@oracle.com",
 "michael.roth@amd.com", "quic_tsoni@quicinc.com", Xiaoyao Li,
 "aou@eecs.berkeley.edu", Ira Weiny, "richard.weiyang@gmail.com",
 "kent.overstreet@linux.dev", "qperret@google.com",
 "dmatlack@google.com", "james.morse@arm.com", "brauner@kernel.org",
 "linux-fsdevel@vger.kernel.org", "ackerleytng@google.com",
 "pgonda@google.com", "quic_pderrin@quicinc.com", "hch@infradead.org",
 "linux-mm@kvack.org", "will@kernel.org", "roypat@amazon.co.uk"
References: <006899ccedf93f45082390460620753090c01914.camel@intel.com>


Hi Sean,

On Tue, 8 Jul 2025 at 16:39, Sean Christopherson wrote:
>
> On Tue, Jul 08, 2025, Vishal Annapurve wrote:
> > On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P
> > wrote:
> > >
> > > On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > > > For TDX if we don't zero on conversion from private->shared we will be
> > > > > dependent
> > > > > on behavior of the CPU when reading memory with keyid 0, which was
> > > > > previously
> > > > > encrypted and has some protection bits set. I don't *think* the behavior is
> > > > > architectural. So it might be prudent to either make it so, or zero it in
> > > > > the
> > > > > kernel in order to not make non-architectual behavior into userspace ABI.
> > > >
> > > > Ya, by "vendor specific", I was also lumping in cases where the kernel would
> > > > need to zero memory in order to not end up with effectively undefined
> > > > behavior.
> > >
> > > Yea, more of an answer to Vishal's question about if CC VMs need zeroing. And
> > > the answer is sort of yes, even though TDX doesn't require it. But we actually
> > > don't want to zero memory when reclaiming memory. So TDX KVM code needs to know
> > > that the operation is a to-shared conversion and not another type of private
> > > zap. Like a callback from gmem, or maybe more simply a kernel internal flag to
> > > set in gmem such that it knows it should zero it.
> >
> > If the answer is that "always zero on private to shared conversions"
> > for all CC VMs,
>
> pKVM VMs *are* CoCo VMs. Just because pKVM doesn't rely on third party firmware
> to provide confidentiality and integrity doesn't make it any less of a CoCo VM.
>
> > > > : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> > > > : explicitly request that the page range is converted to private and the
> > > > : content needs to be retained. So that TDX can identify which case needs
> > > > : to call in-place TDH.PAGE.ADD.
> > > >
> > > > If so, I agree with that idea, e.g. add a PRESERVE flag or whatever. That way
> > > > userspace has explicit control over what happens to the data during
> > > > conversion,
> > > > and KVM can reject unsupported conversions, e.g. PRESERVE is only allowed for
> > > > shared => private and only for select VM types.
> > >
> > > Ok, we should POC how it works with TDX.
> >
> > I don't think we need a flag to preserve memory as I mentioned in [2]. IIUC,
> > 1) Conversions are always content-preserving for pKVM.
>
> No? Perserving contents on private => shared is a security vulnerability waiting
> to happen.

Actually, it is both a requirement for pKVM and its current behavior.
We would like to preserve contents both ways, private <=> shared,
since some of the potential use cases require it (e.g., a guest
handling video encoding/decoding).

To be clear, I'm talking about explicit sharing from the guest, not
relinquishing memory back to the host. In the case of relinquishing
(and guest teardown), the relinquished memory is poisoned (zeroed) in
pKVM.

Cheers,
/fuad

> > 2) Shared to private conversions are always content-preserving for all
> > VMs as far as guest_memfd is concerned.
>
> There is no "as far as guest_memfd is concerned". Userspace doesn't care whether
> code lives in guest_memfd.c versus arch/xxx/kvm, the only thing that matters is
> the behavior that userspace sees. I don't want to end up with userspace ABI that
> is vendor/VM specific.
>
> > 3) Private to shared conversions are not content-preserving for CC VMs
> > as far as guest_memfd is concerned, subject to more discussions.
> >
> > [2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@mail.gmail.com/