Date: Thu, 22 May 2025 07:52:42 -0700
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
From: Sean Christopherson
To: Fuad Tabba
Cc: Vishal Annapurve, Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org,
 aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org,
 amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org,
 aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com,
 brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com,
 chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com,
 dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com,
 fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com,
 hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
 isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com,
 jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com,
 jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com,
 kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev,
 kirill.shutemov@intel.com, liam.merwick@oracle.com,
 maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org,
 mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
 muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es,
 oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
 paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk,
 peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
 richard.weiyang@gmail.com, rick.p.edgecombe@intel.com, rientjes@google.com,
 roypat@amazon.co.uk, rppt@kernel.org, shuah@kernel.org,
 steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
 thomas.lendacky@amd.com, usama.arif@bytedance.com, vbabka@suse.cz,
 viro@zeniv.linux.org.uk, vkuznets@redhat.com, wei.w.wang@intel.com,
 will@kernel.org, willy@infradead.org, xiaoyao.li@intel.com,
 yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com,
 zhiquan1.li@intel.com

On Wed, May 21, 2025, Fuad Tabba wrote:
> On Wed, 21 May 2025 at 16:51, Vishal Annapurve wrote:
> > On Wed, May 21, 2025 at 8:22 AM Fuad Tabba wrote:
> > > On Wed, 21 May 2025 at 15:42, Vishal Annapurve wrote:
> > > > On Wed, May 21, 2025 at 5:36 AM Fuad Tabba wrote:
> > > > There are a bunch of complexities here, reboot sequence on x86 can be
> > > > triggered using multiple ways that I don't fully understand, but few
> > > > of them include reading/writing to "reset register" in MMIO/PCI config
> > > > space that are emulated by the host userspace directly. Host has to
> > > > know when the guest is shutting down to manage it's lifecycle.
> > >
> > > In that case, I think we need to fully understand these complexities
> > > before adding new IOCTLs. It could be that once we understand these
> > > issues, we find that we don't need these IOCTLs. It's hard to justify
> > > adding an IOCTL for something we don't understand.
> > >
> >
> > I don't understand all the ways x86 guest can trigger reboot but I do
> > know that x86 CoCo linux guest kernel triggers reset using MMIO/PCI
> > config register write that is emulated by host userspace.
> >
> > > > x86 CoCo VM firmwares don't support warm/soft reboot and even if it
> > > > does in future, guest kernel can choose a different reboot mechanism.
> > > > So guest reboot needs to be emulated by always starting from scratch.
> > > > This sequence needs initial guest firmware payload to be installed
> > > > into private ranges of guest_memfd.
> > > >
> > > > >
> > > > > Either the host doesn't (or cannot even) know that the guest is
> > > > > rebooting, in which case I don't see how having an IOCTL would help.
> > > >
> > > > Host does know that the guest is rebooting.
> > >
> > > In that case, that (i.e., the host finding out that the guest is
> > > rebooting) could trigger the conversion back to private. No need for an
> > > IOCTL.
> >
> > In the reboot scenarios, it's the host userspace finding out that the guest
> > kernel wants to reboot.
>
> How does the host userspace find that out? If the host userspace is capable
> of finding that out, then surely KVM is also capable of finding out the same.

Nope, not on x86. Well, not without userspace invoking a new ioctl, which would
defeat the purpose of adding these ioctls.

KVM is only responsible for emulating/virtualizing the "CPU". The chipset, e.g.
the PCI config space, is fully owned by userspace. KVM doesn't even know whether
or not PCI exists for the VM. And reboot may be emulated by simply creating a
new KVM instance, i.e. even if KVM was somehow aware of the reboot request, the
change in state would happen in an entirely new struct kvm.

That said, Vishal and Ackerley, this patch is a bit lacking on the documentation
front. The changelog asserts that:

  A guest_memfd ioctl is used because shareability is a property of the memory,
  and this property should be modifiable independently of the attached struct kvm

but then follows with a very weak and IMO largely irrelevant justification of:

  This allows shareability to be modified even if the memory is not yet bound
  using memslots.

Allowing userspace to change shareability without memslots is one relatively
minor flow in one very specific use case.

The real justification for these ioctls is that fundamentally, shareability for
in-place conversions is a property of a guest_memfd instance and not a struct kvm
instance, and so needs to be owned by guest_memfd.

I.e. focus on justifying the change from a design and conceptual perspective,
not from a mechanical perspective of a flow that is likely somewhat unique to our
specific environment. Y'all are getting deep into the weeds on a random aspect
of x86 platform architecture, instead of focusing on the overall design.

The other issue that's likely making this more confusing than it needs to be is
that this series is actually two completely different series bundled into one,
with very little explanation. Moving shared vs. private ownership into
guest_memfd isn't a requirement for 1GiB support, it's a requirement for in-place
shared/private conversion in guest_memfd.

For the current guest_memfd implementation, shared vs. private is tracked in the
VM via memory attributes, because a guest_memfd instance is *only* private. I.e.
shared vs. private is a property of the VM, not of the guest_memfd instance. But
when in-place conversion support comes along, ownership of that particular
attribute needs to shift to the guest_memfd instance.

I know I gave feedback on an earlier posting about there being too many series
flying around, but shoving two distinct concepts into a single series is not the
answer. My complaint about too much noise wasn't that there were multiple series,
it was that there was very little coordination and lots of chaos.

If you split this series in two, which should be trivial since you've already
organized the patches as a split, then sans the selftests (thank you for those!),
in-place conversion support will be its own (much smaller!) series that can focus
on that specific aspect of the design, and can provide a cover letter that
expounds on the design goals and uAPI.

  KVM: guest_memfd: Add CAP KVM_CAP_GMEM_CONVERSION
  KVM: Query guest_memfd for private/shared status
  KVM: guest_memfd: Skip LRU for guest_memfd folios
  KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
  KVM: guest_memfd: Introduce and use shareability to guard faulting
  KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes

And then you can post the 1GiB series separately. So long as you provide pointers
to dependencies along with a link to a repo+branch with the kitchen sink, I won't
complain about things being too chaotic :-)
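
For anyone who wants the ownership argument in concrete form, here is a rough
toy model in Python. It is purely illustrative and not kernel code or the
proposed uAPI: GuestMemfd, Vm, convert() and is_private() are hypothetical
names, and the choice that every page starts out private is an assumption made
only for the sketch. It shows shareability living in the guest_memfd object,
the VM merely querying it, and the state staying reachable when a "reboot" is
emulated by replacing the VM.

```python
class GuestMemfd:
    """Toy stand-in for a guest_memfd instance (hypothetical, not the uAPI)."""

    def __init__(self, nr_pages):
        self.nr_pages = nr_pages
        # Assumption for this sketch only: all pages start out private.
        self.shared = [False] * nr_pages

    def convert(self, start, count, shared):
        """Model a conversion request issued on the guest_memfd itself."""
        if start < 0 or count <= 0 or start + count > self.nr_pages:
            raise ValueError("range is outside of this guest_memfd")
        for page in range(start, start + count):
            self.shared[page] = shared


class Vm:
    """Toy stand-in for a struct kvm; it only holds a reference to the memory."""

    def __init__(self, gmem):
        self.gmem = gmem

    def is_private(self, page):
        # The VM queries the guest_memfd instead of tracking the attribute itself.
        return not self.gmem.shared[page]


gmem = GuestMemfd(nr_pages=4)
vm = Vm(gmem)
gmem.convert(1, 2, shared=True)  # the guest shares pages 1 and 2 at runtime

# Userspace emulates reboot by discarding the VM and creating a new one.
del vm
# No VM exists at this point, yet the memory can still be reset to private,
# e.g. so that a fresh firmware payload can be installed.
gmem.convert(0, gmem.nr_pages, shared=False)
new_vm = Vm(gmem)
```

If the shared/private state lived in Vm instead, it would be lost with the old
instance, or would have to be replayed into the new one by userspace.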