From: Vishal Annapurve
Date: Tue, 8 Jul 2025 10:16:08 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: "Edgecombe, Rick P"
Cc: "pvorel@suse.cz", "kvm@vger.kernel.org", "catalin.marinas@arm.com",
 "Miao, Jun", "palmer@dabbelt.com", "pdurrant@amazon.co.uk",
 "steven.price@arm.com", "peterx@redhat.com", "x86@kernel.org",
 "amoorthy@google.com", "tabba@google.com", "quic_svaddagi@quicinc.com",
 "jack@suse.cz", "vkuznets@redhat.com", "quic_eberman@quicinc.com",
 "keirf@google.com", "mail@maciej.szmigiero.name",
 "anthony.yznaga@oracle.com", "Wang, Wei W", "rppt@kernel.org",
 "Wieczor-Retman, Maciej", "Zhao, Yan Y", "ajones@ventanamicro.com",
 "Hansen, Dave", "paul.walmsley@sifive.com", "quic_mnalajal@quicinc.com",
 "aik@amd.com", "usama.arif@bytedance.com", "fvdl@google.com",
 "quic_cvanscha@quicinc.com", "Shutemov, Kirill", "vbabka@suse.cz",
 "anup@brainfault.org", "thomas.lendacky@amd.com",
 "linux-kernel@vger.kernel.org", "mic@digikod.net",
 "oliver.upton@linux.dev", "Du, Fan", "akpm@linux-foundation.org",
 "muchun.song@linux.dev", "binbin.wu@linux.intel.com", "Li, Zhiquan1",
 "rientjes@google.com", "mpe@ellerman.id.au", "Aktas, Erdem",
 "david@redhat.com", "jgg@ziepe.ca", "willy@infradead.org",
 "hughd@google.com", "Xu, Haibo1", "jhubbard@nvidia.com", "maz@kernel.org",
 "Yamahata, Isaku", "jthoughton@google.com", "will@kernel.org",
 "steven.sistare@oracle.com", "jarkko@kernel.org",
 "quic_pheragu@quicinc.com", "nsaenz@amazon.es", "chenhuacai@kernel.org",
 "Huang, Kai", "shuah@kernel.org", "bfoster@redhat.com",
 "dwmw@amazon.co.uk", "Peng, Chao P", "pankaj.gupta@amd.com",
 "Graf, Alexander", "nikunj@amd.com", "viro@zeniv.linux.org.uk",
 "pbonzini@redhat.com", "yuzenghui@huawei.com", "jroedel@suse.de",
 "suzuki.poulose@arm.com", "jgowans@amazon.com", "Xu, Yilun",
 "liam.merwick@oracle.com", "michael.roth@amd.com",
 "quic_tsoni@quicinc.com", "Li, Xiaoyao", "aou@eecs.berkeley.edu",
 "Weiny, Ira", "richard.weiyang@gmail.com", "kent.overstreet@linux.dev",
 "qperret@google.com", "dmatlack@google.com", "james.morse@arm.com",
 "brauner@kernel.org", "linux-fsdevel@vger.kernel.org",
 "ackerleytng@google.com", "pgonda@google.com", "quic_pderrin@quicinc.com",
 "hch@infradead.org", "linux-mm@kvack.org", "seanjc@google.com",
 "roypat@amazon.co.uk"
References: <006899ccedf93f45082390460620753090c01914.camel@intel.com>

On Tue, Jul 8, 2025 at 8:31 AM Edgecombe, Rick P wrote:
>
> On Tue, 2025-07-08 at 08:07 -0700, Vishal Annapurve wrote:
> > On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P wrote:
> > >
> > > On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > > > For TDX if we don't zero on conversion from private->shared we will be
> > > > > dependent on behavior of the CPU when reading
> > > > > memory with keyid 0, which was previously encrypted and has some
> > > > > protection bits set. I don't *think* the behavior is architectural. So it
> > > > > might be prudent to either make it so, or zero it in the kernel in order
> > > > > to not make non-architectural behavior into userspace ABI.
> > > >
> > > > Ya, by "vendor specific", I was also lumping in cases where the kernel would
> > > > need to zero memory in order to not end up with effectively undefined
> > > > behavior.
> > >
> > > Yea, more of an answer to Vishal's question about if CC VMs need zeroing. And
> > > the answer is sort of yes, even though TDX doesn't require it. But we actually
> > > don't want to zero memory when reclaiming memory. So TDX KVM code needs to know
> > > that the operation is a to-shared conversion and not another type of private
> > > zap. Like a callback from gmem, or maybe more simply a kernel internal flag to
> > > set in gmem such that it knows it should zero it.
> >
> > If the answer is that "always zero on private to shared conversions"
> > for all CC VMs, then does the scheme outlined in [1] make sense for
> > handling the private -> shared conversions? For pKVM, there can be a
> > VM type check to avoid the zeroing during conversions and instead just
> > zero on allocations. This allows delaying zeroing until the fault time
> > for CC VMs and can be done in guest_memfd centrally. We will need more
> > inputs from the SEV side for this discussion.
> >
> > [1] https://lore.kernel.org/lkml/CAGtprH-83EOz8rrUjE+O8m7nUDjt=THyXx=kfft1xQry65mtQg@mail.gmail.com/
>
> It's nice that we don't double zero (since TDX module will do it too) for
> private allocation/mapping. Seems ok to me.
>
> >
> > >
> > > >
> > > > > Up the thread Vishal says we need to support operations that use in-place
> > > > > conversion (overloaded term now I think, btw).
> > > > > Why exactly is pKVM using private/shared conversion for this private
> > > > > data provisioning?
> > > >
> > > > Because it's literally converting memory from shared to private? And IIUC,
> > > > it's not a one-time provisioning, e.g. memory can go:
> > > >
> > > >   shared => fill => private => consume => shared => fill => private => consume
> > > >
> > > > > Instead of a special provisioning operation like the others? (Xiaoyao's
> > > > > suggestion)
> > > >
> > > > Are you referring to this suggestion?
> > >
> > > Yea, in general to make it a specific operation preserving operation.
> > >
> > > >
> > > >  : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> > > >  : explicitly request that the page range is converted to private and the
> > > >  : content needs to be retained. So that TDX can identify which case needs
> > > >  : to call in-place TDH.PAGE.ADD.
> > > >
> > > > If so, I agree with that idea, e.g. add a PRESERVE flag or whatever. That way
> > > > userspace has explicit control over what happens to the data during
> > > > conversion, and KVM can reject unsupported conversions, e.g. PRESERVE is
> > > > only allowed for shared => private and only for select VM types.
> > >
> > > Ok, we should POC how it works with TDX.
> >
> > I don't think we need a flag to preserve memory as I mentioned in [2]. IIUC,
> > 1) Conversions are always content-preserving for pKVM.
> > 2) Shared to private conversions are always content-preserving for all
> > VMs as far as guest_memfd is concerned.
> > 3) Private to shared conversions are not content-preserving for CC VMs
> > as far as guest_memfd is concerned, subject to more discussions.
> >
> > [2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@mail.gmail.com/
>
> Right, I read that. I still don't see why pKVM needs to do normal private/shared
> conversion for data provisioning.
> Vs a dedicated operation/flag to make it a special case.

It's dictated by pKVM use cases: memory contents need to be preserved
for every conversion, not just for initial payload population.

>
> I'm trying to suggest there could be a benefit to making all gmem VM types
> behave the same. If conversions are always content preserving for pKVM, why
> can't userspace always use the operation that says preserve content? Vs
> changing the behavior of the common operations?

I don't see a benefit in userspace passing a flag that is effectively
the default for the VM type (assuming pKVM will use a special VM type).
Common operations in guest_memfd will need to check either the
userspace-passed flag or the VM type, so neither mechanism requires a
major change in the guest_memfd implementation.

>
> So for all VM types, the user ABI would be:
> private->shared          - Always zeroes page
> shared->private          - Always destructive
> shared->private (w/flag) - Always preserves data or return error if not possible
>
>
> Do you see a problem?
>
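
For illustration only, the two mechanisms compared in this thread can be
sketched in plain userspace C. This is a rough sketch under stated
assumptions, not guest_memfd code: every identifier here (gmem_vm_type,
gmem_dir, gmem_result, GMEM_CONVERT_PRESERVE, convert_by_vm_type,
convert_by_flag) is hypothetical, the choice of -EINVAL as the error is an
assumption, and the private->shared behavior for CC VMs is still subject
to discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* All identifiers are hypothetical; none are real KVM/guest_memfd names. */
enum gmem_vm_type { GMEM_VM_PKVM, GMEM_VM_CC };
enum gmem_dir     { GMEM_TO_SHARED, GMEM_TO_PRIVATE };
enum gmem_result  { GMEM_ZEROED, GMEM_DESTROYED, GMEM_PRESERVED };

#define GMEM_CONVERT_PRESERVE (1u << 0) /* hypothetical userspace flag */

/*
 * Mechanism A: behavior keyed off the VM type, following points 1)-3)
 * in the mail, as seen from guest_memfd.
 */
enum gmem_result convert_by_vm_type(enum gmem_vm_type type, enum gmem_dir dir)
{
	assert(dir == GMEM_TO_SHARED || dir == GMEM_TO_PRIVATE);

	if (type == GMEM_VM_PKVM)
		return GMEM_PRESERVED;	/* 1) every conversion preserves content */
	if (dir == GMEM_TO_PRIVATE)
		return GMEM_PRESERVED;	/* 2) shared->private preserves content */
	return GMEM_ZEROED;		/* 3) CC VMs: private->shared is zeroed */
}

/*
 * Mechanism B: behavior keyed off a userspace flag, following the quoted
 * ABI table. Returns a gmem_result value, or -EINVAL (an assumed error
 * code) when PRESERVE cannot be honored for this request.
 */
int convert_by_flag(enum gmem_dir dir, unsigned int flags,
		    bool preserve_supported)
{
	assert(dir == GMEM_TO_SHARED || dir == GMEM_TO_PRIVATE);

	if (flags & GMEM_CONVERT_PRESERVE) {
		/* PRESERVE only for shared->private on VM types that support it. */
		if (dir != GMEM_TO_PRIVATE || !preserve_supported)
			return -EINVAL;
		return GMEM_PRESERVED;
	}
	return dir == GMEM_TO_SHARED ? GMEM_ZEROED : GMEM_DESTROYED;
}
```

In both variants the common path needs a single branch, on either the VM
type or the flag. The visible difference is in the defaults: without the
flag, mechanism B treats shared->private as destructive, while mechanism A
preserves it for every VM type.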