From: Vishal Annapurve
Date: Sun, 29 Jun 2025 11:28:22 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: Xiaoyao Li
Cc: Yan Zhao, Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, x86@kernel.org,
 linux-fsdevel@vger.kernel.org, aik@amd.com, ajones@ventanamicro.com,
 akpm@linux-foundation.org, amoorthy@google.com,
 anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu,
 bfoster@redhat.com, binbin.wu@linux.intel.com, brauner@kernel.org,
 catalin.marinas@arm.com, chao.p.peng@intel.com, chenhuacai@kernel.org,
 dave.hansen@intel.com, david@redhat.com, dmatlack@google.com,
 dwmw@amazon.co.uk, erdemaktas@google.com, fan.du@intel.com,
 fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com,
 hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
 isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com,
 jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
 jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
 jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
 kent.overstreet@linux.dev, kirill.shutemov@intel.com,
 liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
 mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
 michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
 nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
 palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
 pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
 pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
 quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
 rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
 steven.price@arm.com, steven.sistare@oracle.com,
 suzuki.poulose@arm.com, tabba@google.com, thomas.lendacky@amd.com,
 usama.arif@bytedance.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
 vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
 willy@infradead.org, yilun.xu@intel.com, yuzenghui@huawei.com,
 zhiquan1.li@intel.com

On Thu, Jun 19, 2025 at 1:59 AM Xiaoyao Li wrote:
>
> On 6/19/2025 4:13 PM, Yan Zhao wrote:
> > On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
> >> Hello,
> >>
> >> This patchset builds upon discussion at LPC 2024 and many guest_memfd
> >> upstream calls to provide 1G page support for guest_memfd by taking
> >> pages from HugeTLB.
> >>
> >> This patchset is based on Linux v6.15-rc6, and requires the mmap support
> >> for guest_memfd patchset (Thanks Fuad!)
> >> [1].
> >>
> >> For ease of testing, this series is also available, stitched together,
> >> at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
> >
> > Just to record a found issue -- not one that must be fixed.
> >
> > In TDX, the initial memory region is added as private memory during TD's build
> > time, with its initial content copied from source pages in shared memory.
> > The copy operation requires simultaneous access to both shared source memory
> > and private target memory.
> >
> > Therefore, userspace cannot store the initial content in shared memory at the
> > mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
> > private memory. This is because the guest_memfd will first unmap a PFN in shared
> > page tables and then check for any extra refcount held for the shared PFN before
> > converting it to private.
>
> I have an idea.
>
> If I understand correctly, the KVM_GMEM_CONVERT_PRIVATE of in-place
> conversion unmap the PFN in shared page tables while keeping the content
> of the page unchanged, right?

That's correct.

>
> So KVM_GMEM_CONVERT_PRIVATE can be used to initialize the private memory
> actually for non-CoCo case actually, that userspace first mmap() it and
> ensure it's shared and writes the initial content to it, after it
> userspace convert it to private with KVM_GMEM_CONVERT_PRIVATE.

I think by non-CoCo VMs that care about private memory you mean pKVM.
Yes, initial memory regions can start as shared; userspace can
populate them and then convert the ranges to private.

>
> For CoCo case, like TDX, it can hook to KVM_GMEM_CONVERT_PRIVATE if it
> wants the private memory to be initialized with initial content, and
> just do in-place TDH.PAGE.ADD in the hook.

I think this scheme will be cleaner:
1) Userspace marks the guest_memfd ranges corresponding to the initial
   payload as shared.
2) Userspace mmaps and populates the ranges.
3) Userspace converts those guest_memfd ranges to private.
4) For both SNP and TDX, userspace continues to invoke the corresponding
   initial payload preparation operations via the existing KVM ioctls,
   e.g. KVM_SEV_SNP_LAUNCH_UPDATE/KVM_TDX_INIT_MEM_REGION.
   - The SNP/TDX KVM logic fetches the right pfns for the target gfns
     using the normal paths supported by KVM and passes those pfns
     directly to the right trusted module to initialize the "encrypted"
     memory contents.
   - This avoids any GUP or memcpy from source addresses.

i.e. for TDX VMs, KVM_TDX_INIT_MEM_REGION still does the in-place
TDH.PAGE.ADD.

Since we need to support both VMs that use in-place conversion and VMs
that don't, I think operations like KVM_TDX_INIT_MEM_REGION can
introduce explicit flags that let userspace indicate whether to assume
in-place conversion. Maybe
kvm_tdx_init_mem_region.source_addr/kvm_sev_snp_launch_update.uaddr
can be NULL in the scenarios where in-place conversion is used.
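
To illustrate steps 1-4 and the NULL source address idea, here is a toy
userspace model. It is not kernel code: every toy_* name and PAGE_SZ is
made up for illustration, and "NULL source means in-place" is only the
suggestion above, not an existing ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 16 /* toy page size, chosen only to keep the model small */

/* Hypothetical model of one guest_memfd page. */
struct toy_page {
	uint8_t data[PAGE_SZ];
	bool is_private;  /* false: shared (userspace may write), true: private */
	bool initialized; /* set once the toy "trusted module" accepted the page */
};

/* Hypothetical stand-in for the source_addr/uaddr field discussed above. */
struct toy_init_region {
	const uint8_t *source_addr; /* NULL: assume in-place conversion */
};

/* Steps 1-2: userspace populates the page through its shared mapping. */
static int toy_populate_shared(struct toy_page *p, const uint8_t *payload)
{
	if (p->is_private)
		return -1; /* private memory is not writable from userspace */
	memcpy(p->data, payload, PAGE_SZ);
	return 0;
}

/* Step 3: conversion flips the state and leaves the contents untouched. */
static void toy_convert_private(struct toy_page *p)
{
	p->is_private = true;
}

/* Step 4: toy version of the initial payload preparation operation. */
static int toy_init_mem_region(struct toy_page *p,
			       const struct toy_init_region *r)
{
	if (!p->is_private)
		return -1; /* target must already be private */
	if (r->source_addr)
		/* Non-in-place case: copy from a separate source buffer. */
		memcpy(p->data, r->source_addr, PAGE_SZ);
	/* In-place case: contents are already in p->data, no copy needed. */
	p->initialized = true;
	return 0;
}
```

With source_addr set to NULL the page keeps the payload written in step 2;
with a non-NULL source_addr the model falls back to the copy-based flow.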