From: Vishal Annapurve
Date: Tue, 1 Jul 2025 12:48:39 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: Yan Zhao
Cc: Xiaoyao Li, Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, x86@kernel.org,
 linux-fsdevel@vger.kernel.org, aik@amd.com, ajones@ventanamicro.com,
 akpm@linux-foundation.org, amoorthy@google.com,
 anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu,
 bfoster@redhat.com, binbin.wu@linux.intel.com, brauner@kernel.org,
 catalin.marinas@arm.com, chao.p.peng@intel.com, chenhuacai@kernel.org,
 dave.hansen@intel.com, david@redhat.com, dmatlack@google.com,
 dwmw@amazon.co.uk, erdemaktas@google.com, fan.du@intel.com,
 fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com,
 hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
 isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com,
 jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
 jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
 jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
 kent.overstreet@linux.dev, kirill.shutemov@intel.com,
 liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
 mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
 michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
 nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
 palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
 pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
 pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
 quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
 rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
 steven.price@arm.com, steven.sistare@oracle.com,
 suzuki.poulose@arm.com, tabba@google.com, thomas.lendacky@amd.com,
 usama.arif@bytedance.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
 vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
 willy@infradead.org, yilun.xu@intel.com, yuzenghui@huawei.com,
 zhiquan1.li@intel.com

On Mon, Jun 30, 2025 at 10:26 PM Yan Zhao wrote:
>
> On Mon, Jun 30, 2025 at 07:14:07AM -0700, Vishal Annapurve wrote:
> > On Sun, Jun 29, 2025 at 8:17 PM Yan Zhao wrote:
> > >
> > > On Sun, Jun 29, 2025 at 11:28:22AM -0700, Vishal Annapurve wrote:
> > > > On Thu, Jun 19, 2025 at 1:59 AM Xiaoyao Li wrote:
> > > > >
> > > > > On 6/19/2025 4:13 PM, Yan Zhao wrote:
> > > > > > On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
> > > > > >> Hello,
> > > > > >>
> > > > > >> This patchset builds upon discussion at LPC 2024 and many guest_memfd
> > > > > >> upstream calls to provide 1G page support for guest_memfd by taking
> > > > > >> pages from HugeTLB.
> > > > > >>
> > > > > >> This patchset is based on Linux v6.15-rc6, and requires the mmap support
> > > > > >> for guest_memfd patchset (Thanks Fuad!) [1].
> > > > > >>
> > > > > >> For ease of testing, this series is also available, stitched together,
> > > > > >> at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
> > > > > >
> > > > > > Just to record a found issue -- not one that must be fixed.
> > > > > >
> > > > > > In TDX, the initial memory region is added as private memory during TD's build
> > > > > > time, with its initial content copied from source pages in shared memory.
> > > > > > The copy operation requires simultaneous access to both shared source memory
> > > > > > and private target memory.
> > > > > >
> > > > > > Therefore, userspace cannot store the initial content in shared memory at the
> > > > > > mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
> > > > > > private memory. This is because the guest_memfd will first unmap a PFN in shared
> > > > > > page tables and then check for any extra refcount held for the shared PFN before
> > > > > > converting it to private.
> > > > >
> > > > > I have an idea.
> > > > >
> > > > > If I understand correctly, the KVM_GMEM_CONVERT_PRIVATE of in-place
> > > > > conversion unmap the PFN in shared page tables while keeping the content
> > > > > of the page unchanged, right?
> > > >
> > > > That's correct.
> > > >
> > > > >
> > > > > So KVM_GMEM_CONVERT_PRIVATE can be used to initialize the private memory
> > > > > actually for non-CoCo case actually, that userspace first mmap() it and
> > > > > ensure it's shared and writes the initial content to it, after it
> > > > > userspace convert it to private with KVM_GMEM_CONVERT_PRIVATE.
> > > >
> > > > I think you mean pKVM by non-coco VMs that care about private memory.
> > > > Yes, initial memory regions can start as shared which userspace can
> > > > populate and then convert the ranges to private.
> > > >
> > > > >
> > > > > For CoCo case, like TDX, it can hook to KVM_GMEM_CONVERT_PRIVATE if it
> > > > > wants the private memory to be initialized with initial content, and
> > > > > just do in-place TDH.PAGE.ADD in the hook.
> > > >
> > > > I think this scheme will be cleaner:
> > > > 1) Userspace marks the guest_memfd ranges corresponding to initial
> > > > payload as shared.
> > > > 2) Userspace mmaps and populates the ranges.
> > > > 3) Userspace converts those guest_memfd ranges to private.
> > > > 4) For both SNP and TDX, userspace continues to invoke corresponding
> > > > initial payload preparation operations via existing KVM ioctls e.g.
> > > > KVM_SEV_SNP_LAUNCH_UPDATE/KVM_TDX_INIT_MEM_REGION.
> > > >    - SNP/TDX KVM logic fetches the right pfns for the target gfns
> > > > using the normal paths supported by KVM and passes those pfns directly
> > > > to the right trusted module to initialize the "encrypted" memory
> > > > contents.
> > > >    - Avoiding any GUP or memcpy from source addresses.
> > > One caveat:
> > >
> > > when TDX populates the mirror root, kvm_gmem_get_pfn() is invoked.
> > > Then kvm_gmem_prepare_folio() is further invoked to zero the folio.
> >
> > Given that confidential VMs have their own way of initializing private
> > memory, I think zeroing makes sense for only shared memory ranges.
> > i.e. something like below:
> > 1) Don't zero at allocation time.
> > 2) If faulting in a shared page and its not uptodate, then zero the
> > page and set the page as uptodate.
> > 3) Clear uptodate flag on private to shared conversion.
> > 4) For faults on private ranges, don't zero the memory.
> >
> > There might be some other considerations here e.g. pKVM needs
> > non-destructive conversion operation, which might need a way to enable
> > zeroing at allocation time only.
> >
> > On a TDX specific note, IIUC, KVM TDX logic doesn't need to clear
> > pages on future platforms [1].
> Yes, TDX does not need to clear pages on private page allocation.
> But current kvm_gmem_prepare_folio() clears private pages in the common path
> for both TDX and SEV-SNP.
>
> I just wanted to point out that it's a kind of obstacle that need to be removed
> to implement the proposed approach.
>

The proposed approach will work with 4K pages without any additional
changes. For huge pages, it's easy to prototype this approach by
disabling the zeroing logic in guest_memfd at fault time and always
zeroing at allocation instead.

I would be curious to understand whether we need zeroing on conversion
for confidential VMs. If not, then the simple rule of zeroing only on
allocation will work for all use cases.
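
To make the "zero on allocation only" rule concrete, here is a toy
model in Python. It is only a sketch and not kernel code: ToyPage,
allocate(), fault(), convert() and populate_and_convert() are
hypothetical names made up for illustration, and the model ignores
refcounts, unmapping from the shared page tables, and the SNP/TDX
module calls entirely. It only shows that when zeroing happens once at
allocation, and neither the fault path nor the conversion touches the
contents, a payload written while the range is shared is still there
after converting to private.

```python
from dataclasses import dataclass, field
from enum import Enum

PAGE_SIZE = 4096  # toy page size; real folios may be 4K, 2M or 1G


class State(Enum):
    SHARED = "shared"
    PRIVATE = "private"


@dataclass
class ToyPage:
    """Hypothetical stand-in for one guest_memfd page (not a kernel struct)."""
    state: State
    # Non-zero filler models stale bytes handed back by the allocator.
    data: bytearray = field(default_factory=lambda: bytearray(b"\xaa" * PAGE_SIZE))


def allocate(state: State) -> ToyPage:
    """Zero exactly once, at allocation, for shared and private alike."""
    page = ToyPage(state)
    page.data[:] = bytes(PAGE_SIZE)
    return page


def fault(page: ToyPage) -> bytearray:
    """Fault path: no zeroing here under the 'zero on allocation only' rule."""
    return page.data


def convert(page: ToyPage, new_state: State) -> None:
    """In-place conversion: flips the state, leaves the contents untouched."""
    page.state = new_state


def populate_and_convert(payload: bytes) -> ToyPage:
    """Steps 1-3 of the scheme: start shared, populate, convert to private."""
    page = allocate(State.SHARED)
    fault(page)[: len(payload)] = payload
    convert(page, State.PRIVATE)
    return page
```

In this model step 4 (KVM_SEV_SNP_LAUNCH_UPDATE or
KVM_TDX_INIT_MEM_REGION) would then operate on the page returned by
populate_and_convert() in place. Whether a real conversion must also
zero the page is exactly the open question above; the model simply
assumes it does not.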