From: Fuad Tabba
Date: Wed, 21 May 2025 16:21:46 +0100
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
To: Vishal Annapurve
Cc: Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org, aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com, hch@infradead.org, hughd@google.com, ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com, kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev, kirill.shutemov@intel.com, liam.merwick@oracle.com, maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com, quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com, richard.weiyang@gmail.com, rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com, thomas.lendacky@amd.com, usama.arif@bytedance.com, vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org, willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com

Hi Vishal,

On Wed, 21 May 2025 at 15:42, Vishal Annapurve wrote:
>
> On Wed, May 21, 2025 at 5:36 AM Fuad Tabba wrote:
> > ....
> > > When rebooting, the memslots may not yet be bound to the guest_memfd,
> > > but we want to reset the guest_memfds to private. If we use
> > > KVM_SET_MEMORY_ATTRIBUTES to convert, we'd be forced to first bind, then
> > > convert. If we had a direct ioctl, we wouldn't have this restriction.
> > >
> > > If we do the conversion via vcpu_run(), we would be forced to handle
> > > conversions only within vcpu_run(), and only the guest could initiate a
> > > conversion.
> > >
> > > On a guest boot for TDX, the memory is assumed to be private. If we
> > > gave it memory set as shared, we'd just have a bunch of
> > > KVM_EXIT_MEMORY_FAULTs that slow down boot. Hence, on a guest reboot, we
> > > will want to reset the guest memory to private.
> > >
> > > We could say the firmware should reset memory to private on guest
> > > reboot, but we can't force all guests to update their firmware.
> >
> > Here is where I disagree. I do think that this is the CoCo guest's
> > responsibility (and by guest I include its firmware) to fix its own
> > state after a reboot. How would the host even know that a guest is
> > rebooting if it's a CoCo guest?
>
> There are a bunch of complexities here: the reboot sequence on x86 can be
> triggered in multiple ways that I don't fully understand, but a few
> of them involve reading/writing a "reset register" in MMIO/PCI config
> space that is emulated directly by host userspace. The host has to
> know when the guest is shutting down to manage its lifecycle.

In that case, I think we need to fully understand these complexities
before adding new IOCTLs. It could be that once we understand these
issues, we find that we don't need these IOCTLs. It's hard to justify
adding an IOCTL for something we don't understand.

> x86 CoCo VM firmware doesn't support warm/soft reboot, and even if it
> does in the future, the guest kernel can choose a different reboot
> mechanism. So guest reboot needs to be emulated by always starting from
> scratch. This sequence needs the initial guest firmware payload to be
> installed into private ranges of guest_memfd.
>
> > Either the host doesn't (or cannot even) know that the guest is
> > rebooting, in which case I don't see how having an IOCTL would help.
>
> The host does know that the guest is rebooting.

In that case, that (i.e., the host finding out that the guest is
rebooting) could itself trigger the conversion back to private. No need
for an IOCTL.

> > Or somehow the host does know that, e.g., via a hypercall that
> > indicates it. In that case, for that type of VM, we could reconvert
> > its pages to private on a reboot.
>
> This could possibly be solved by resetting the ranges to private when
> binding to a memslot of a certain VM type. But Google also has a
> use case to support intra-host migration, where a live VM and its
> associated guest_memfd files are bound to a new KVM VM and memslots.
>
> Otherwise, we need an additional contract between userspace and KVM to
> intercept/handle guest_memfd range resets.

Then this becomes a migration issue, to be solved when migration is
addressed, not a huge page support issue. If such IOCTLs are needed for
migration, it's too early to add them now.

Cheers,
/fuad