References: <006899ccedf93f45082390460620753090c01914.camel@intel.com>
From: Vishal Annapurve
Date: Tue, 8 Jul 2025 08:07:21 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: "Edgecombe, Rick P"
Cc: "seanjc@google.com", "pvorel@suse.cz", "kvm@vger.kernel.org",
 "catalin.marinas@arm.com", "Miao, Jun", "Shutemov, Kirill",
 "pdurrant@amazon.co.uk", "vbabka@suse.cz", "peterx@redhat.com",
 "x86@kernel.org", "amoorthy@google.com", "jack@suse.cz",
 "quic_svaddagi@quicinc.com", "keirf@google.com", "palmer@dabbelt.com",
 "vkuznets@redhat.com", "mail@maciej.szmigiero.name",
 "anthony.yznaga@oracle.com", "Wang, Wei W", "tabba@google.com",
 "Wieczor-Retman, Maciej", "Zhao, Yan Y", "ajones@ventanamicro.com",
 "willy@infradead.org", "rppt@kernel.org", "quic_mnalajal@quicinc.com",
 "aik@amd.com", "usama.arif@bytedance.com", "Hansen, Dave",
 "fvdl@google.com", "paul.walmsley@sifive.com", "bfoster@redhat.com",
 "nsaenz@amazon.es", "anup@brainfault.org", "quic_eberman@quicinc.com",
 "linux-kernel@vger.kernel.org", "thomas.lendacky@amd.com",
 "mic@digikod.net", "oliver.upton@linux.dev", "akpm@linux-foundation.org",
 "quic_cvanscha@quicinc.com", "steven.price@arm.com",
 "binbin.wu@linux.intel.com", "hughd@google.com", "Li, Zhiquan1",
 "rientjes@google.com", "mpe@ellerman.id.au", "Aktas, Erdem",
 "david@redhat.com", "jgg@ziepe.ca", "jhubbard@nvidia.com", "Xu, Haibo1",
 "Du, Fan", "maz@kernel.org", "muchun.song@linux.dev", "Yamahata, Isaku",
 "jthoughton@google.com", "steven.sistare@oracle.com",
 "quic_pheragu@quicinc.com", "jarkko@kernel.org", "chenhuacai@kernel.org",
 "Huang, Kai", "shuah@kernel.org", "dwmw@amazon.co.uk", "Peng, Chao P",
 "pankaj.gupta@amd.com", "Graf, Alexander", "nikunj@amd.com",
 "viro@zeniv.linux.org.uk", "pbonzini@redhat.com", "yuzenghui@huawei.com",
 "jroedel@suse.de", "suzuki.poulose@arm.com", "jgowans@amazon.com",
 "Xu, Yilun", "liam.merwick@oracle.com", "michael.roth@amd.com",
 "quic_tsoni@quicinc.com", "Li, Xiaoyao", "aou@eecs.berkeley.edu",
 "Weiny, Ira", "richard.weiyang@gmail.com", "kent.overstreet@linux.dev",
 "qperret@google.com", "dmatlack@google.com", "james.morse@arm.com",
 "brauner@kernel.org", "linux-fsdevel@vger.kernel.org",
 "ackerleytng@google.com", "pgonda@google.com", "quic_pderrin@quicinc.com",
 "hch@infradead.org", "linux-mm@kvack.org", "will@kernel.org",
 "roypat@amazon.co.uk"

On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P wrote:
>
> On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > For TDX if we don't zero on conversion from private->shared we will be
> > > dependent on behavior of the CPU when reading memory with keyid 0, which
> > > was previously encrypted and has some protection bits set. I don't *think*
> > > the behavior is architectural.
> > > So it might be prudent to either make it so, or zero it in the kernel
> > > in order to not make non-architectural behavior into userspace ABI.
> >
> > Ya, by "vendor specific", I was also lumping in cases where the kernel would
> > need to zero memory in order to not end up with effectively undefined
> > behavior.
>
> Yea, more of an answer to Vishal's question about if CC VMs need zeroing. And
> the answer is sort of yes, even though TDX doesn't require it. But we actually
> don't want to zero memory when reclaiming memory. So TDX KVM code needs to know
> that the operation is a to-shared conversion and not another type of private
> zap. Like a callback from gmem, or maybe more simply a kernel internal flag to
> set in gmem such that it knows it should zero it.

If the answer is "always zero on private to shared conversions" for all
CC VMs, then does the scheme outlined in [1] make sense for handling the
private -> shared conversions? For pKVM, there can be a VM type check to
avoid the zeroing during conversions and instead just zero on allocations.
This allows delaying zeroing until fault time for CC VMs, and it can be
done centrally in guest_memfd. We will need more input from the SEV side
for this discussion.

[1] https://lore.kernel.org/lkml/CAGtprH-83EOz8rrUjE+O8m7nUDjt=THyXx=kfft1xQry65mtQg@mail.gmail.com/

>
> >
> > > Up the thread Vishal says we need to support operations that use in-place
> > > conversion (overloaded term now I think, btw). Why exactly is pKVM using
> > > private/shared conversion for this private data provisioning?
> >
> > Because it's literally converting memory from shared to private?  And IIUC,
> > it's not a one-time provisioning, e.g. memory can go:
> >
> >   shared => fill => private => consume => shared => fill => private => consume
> >
> > > Instead of a special provisioning operation like the others? (Xiaoyao's
> > > suggestion)
> >
> > Are you referring to this suggestion?
>
> Yea, in general to make it a specific operation preserving operation.
>
> >
> >  : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> >  : explicitly request that the page range is converted to private and the
> >  : content needs to be retained. So that TDX can identify which case needs
> >  : to call in-place TDH.PAGE.ADD.
> >
> > If so, I agree with that idea, e.g. add a PRESERVE flag or whatever.  That way
> > userspace has explicit control over what happens to the data during
> > conversion, and KVM can reject unsupported conversions, e.g. PRESERVE is only
> > allowed for shared => private and only for select VM types.
>
> Ok, we should POC how it works with TDX.

I don't think we need a flag to preserve memory, as I mentioned in [2]. IIUC:
1) Conversions are always content-preserving for pKVM.
2) Shared to private conversions are always content-preserving for all
   VMs as far as guest_memfd is concerned.
3) Private to shared conversions are not content-preserving for CC VMs
   as far as guest_memfd is concerned, subject to more discussions.

[2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@mail.gmail.com/
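
To make the three rules above concrete, here is a rough userspace sketch
of how the decision could be expressed. This is only an illustration under
my own assumptions: the enum and function names are hypothetical and do
not exist in the series or in KVM, and rule 3 is still subject to the
discussion above. It treats "needs zeroing" as simply the inverse of
"content-preserving", and leaves open whether the zeroing happens at
conversion time or is delayed until fault time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these are kernel APIs. */
enum vm_kind {
	VM_PKVM,	/* pKVM protected VM */
	VM_CC,		/* confidential-computing VM */
};

enum conv_dir {
	CONV_SHARED_TO_PRIVATE,
	CONV_PRIVATE_TO_SHARED,
};

/* Does guest_memfd treat this conversion as content-preserving? */
bool conversion_preserves_content(enum vm_kind vm, enum conv_dir dir)
{
	if (vm == VM_PKVM)
		return true;	/* rule 1: always preserved for pKVM */
	if (dir == CONV_SHARED_TO_PRIVATE)
		return true;	/* rule 2: preserved for all VMs */
	return false;		/* rule 3: CC VM, private -> shared */
}

/*
 * Assumption: memory must be zeroed before userspace can observe it
 * exactly when the conversion is not content-preserving. When the
 * zeroing runs (at conversion or at fault time) is not modeled here.
 */
bool conversion_needs_zeroing(enum vm_kind vm, enum conv_dir dir)
{
	return !conversion_preserves_content(vm, dir);
}

/* Small self-check that walks all four combinations. */
void conversion_rules_selfcheck(void)
{
	assert(conversion_preserves_content(VM_PKVM, CONV_SHARED_TO_PRIVATE));
	assert(conversion_preserves_content(VM_PKVM, CONV_PRIVATE_TO_SHARED));
	assert(conversion_preserves_content(VM_CC, CONV_SHARED_TO_PRIVATE));
	assert(!conversion_preserves_content(VM_CC, CONV_PRIVATE_TO_SHARED));
}
```

With a table like this there is no per-call PRESERVE flag; the behavior
follows from the VM type and the direction alone, which is the point of
the three rules.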