From: Vishal Annapurve <vannapurve@google.com>
Date: Wed, 9 Jul 2025 20:39:36 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: "Edgecombe, Rick P"

On Wed, Jul 9, 2025 at 8:17 AM Edgecombe, Rick P wrote:
>
> On Wed, 2025-07-09 at 07:28 -0700, Vishal Annapurve wrote:
> > I think we can simplify the role of guest_memfd in line with discussion [1]:
> > 1) guest_memfd is a memory provider for userspace, KVM, IOMMU.
> >          - It allows fallocate to populate/deallocate memory
> > 2) guest_memfd supports the notion of private/shared faults.
> > 3) guest_memfd supports memory access control:
> >          - It allows shared faults from userspace, KVM, IOMMU
> >          - It allows private faults from KVM, IOMMU
> > 4) guest_memfd supports changing access control on its ranges between
> > shared/private.
> >          - It notifies the users to invalidate their mappings for the
> > ranges getting converted/truncated.
>
> KVM needs to know if a GFN is private/shared. I think it is also intended to now
> be a repository for this information, right? Besides invalidations, it needs to
> be queryable.

Yeah, that interface can be added as well. Though, if possible, KVM could
directly pass the fault type to guest_memfd, which would return an error if
the fault type doesn't match the current permission. Additionally, KVM
queries the mapping order for a given pfn/gfn, which will need to be
supported as well.

>
> >
> > Responsibilities that ideally should not be taken up by guest_memfd:
> > 1) guest_memfd cannot initiate pre-faulting on behalf of its users.
> > 2) guest_memfd should not be directly communicating with the
> > underlying architecture layers.
> >          - All communication should go via KVM/IOMMU.
>
> Maybe stronger, there should be generic gmem behaviors. Not any special
> if (vm_type == tdx) type logic.
>
> > 3) KVM should ideally associate the lifetime of backing
> > pagetables/protection tables/RMP tables with the lifetime of the
> > binding of memslots with guest_memfd.
> >          - Today KVM SNP logic ties RMP table entry lifetimes with how
> > long the folios are mapped in guest_memfd, which I think should be
> > revisited.
>
> I don't understand the problem. KVM needs to respond to user accessible
> invalidations, but how long it keeps other resources around could be useful for
> various optimizations. Like deferring work to a work queue or something.
I don't think it could be deferred to a work queue, as the RMP table
entries will need to be removed synchronously once the last reference on
the guest_memfd drops, unless the memory itself is kept around after
filemap eviction. I can see benefits of that approach for handling
scenarios like intra-host migration.

>
> I think it would help to just target the Ackerley series goals. We should get
> that code into shape and this kind of stuff will fall out of it.
>
> >
> > Some very early thoughts on how guest_memfd could be laid out for the long term:
> > 1) guest_memfd code ideally should be built into the kernel.
> > 2) guest_memfd instances should still be created using KVM IOCTLs that
> > carry specific capabilities/restrictions for its users based on the
> > backing VM/arch.
> > 3) Any outgoing communication from guest_memfd to its users like
> > userspace/KVM/IOMMU should be via invalidation notifiers, similar to
> > how MMU notifiers work.
> > 4) KVM and IOMMU can implement intermediate layers to handle
> > interaction with guest_memfd.
> >      - e.g. there could be a layer within KVM that handles:
> >          - creating guest_memfd files and associating a
> > kvm_gmem_context with those files.
> >          - memslot binding
> >                - kvm_gmem_context will be used to bind KVM
> > memslots with the context ranges.
> >          - invalidate notifier handling
> >                - kvm_gmem_context will be used to intercept
> > guest_memfd callbacks and translate them to the right GPA ranges.
> >          - linking
> >                - kvm_gmem_context can be linked to different
> > KVM instances.
>
> We can probably look at the code to decide these.
>

Agree.
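
For illustration only, here is a minimal userspace C model of the split discussed above. guest_memfd tracks per-page shared/private state, rejects a fault whose type does not match that state, and notifies registered users to invalidate a range before converting it. A KVM-side kvm_gmem_context translates file offsets into GFN ranges. Apart from the kvm_gmem_context name, everything here is hypothetical: gmem_file, gmem_register_notifier(), gmem_check_fault(), gmem_convert(), kvm_gmem_invalidate(), the notifier struct, and the context fields. None of it is the real guest_memfd or KVM API. The model also ignores locking, refcounting, mapping order, and any arch-specific (TDX/SNP) state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: none of these names are real kernel APIs. */
#define GMEM_NR_PAGES  16
#define GMEM_MAX_USERS 4

enum gmem_fault_type { GMEM_FAULT_SHARED, GMEM_FAULT_PRIVATE };

/* Callback a user (KVM, IOMMU, ...) registers to drop its mappings. */
typedef void (*gmem_invalidate_fn)(void *priv, size_t start, size_t end);

struct gmem_notifier {
	gmem_invalidate_fn invalidate;
	void *priv;
};

struct gmem_file {
	bool private_page[GMEM_NR_PAGES];   /* per-page access state */
	struct gmem_notifier users[GMEM_MAX_USERS];
	size_t nr_users;
};

static int gmem_register_notifier(struct gmem_file *f, gmem_invalidate_fn fn,
				  void *priv)
{
	assert(fn != NULL);                 /* a user must supply a callback */
	if (f->nr_users == GMEM_MAX_USERS)
		return -ENOSPC;
	f->users[f->nr_users].invalidate = fn;
	f->users[f->nr_users].priv = priv;
	f->nr_users++;
	return 0;
}

/* Fault path: the caller passes the fault type; a mismatch is an error. */
static int gmem_check_fault(const struct gmem_file *f, size_t index,
			    enum gmem_fault_type type)
{
	if (index >= GMEM_NR_PAGES)
		return -EINVAL;
	if (f->private_page[index] != (type == GMEM_FAULT_PRIVATE))
		return -EACCES;
	return 0;
}

/* Conversion: tell every user to invalidate [start, end), then flip state. */
static int gmem_convert(struct gmem_file *f, size_t start, size_t end,
			bool to_private)
{
	if (start >= end || end > GMEM_NR_PAGES)
		return -EINVAL;
	for (size_t i = 0; i < f->nr_users; i++)
		f->users[i].invalidate(f->users[i].priv, start, end);
	for (size_t i = start; i < end; i++)
		f->private_page[i] = to_private;
	return 0;
}

/* KVM-side layer: binds a file offset range to a GFN range (a memslot). */
struct kvm_gmem_context {
	size_t offset_start;                /* first file page in the binding */
	size_t nr_pages;
	uint64_t base_gfn;                  /* GFN that offset_start maps to */
	uint64_t last_zapped_start, last_zapped_end; /* recorded for the demo */
	unsigned int nr_zaps;
};

/* Translates an offset range into the GFN range this binding covers. */
static void kvm_gmem_invalidate(void *priv, size_t start, size_t end)
{
	struct kvm_gmem_context *ctx = priv;
	size_t bind_end = ctx->offset_start + ctx->nr_pages;

	/* Ignore ranges that do not overlap this binding. */
	if (end <= ctx->offset_start || start >= bind_end)
		return;
	if (start < ctx->offset_start)
		start = ctx->offset_start;
	if (end > bind_end)
		end = bind_end;
	/* A real implementation would zap the SPTEs for these GFNs here. */
	ctx->last_zapped_start = ctx->base_gfn + (start - ctx->offset_start);
	ctx->last_zapped_end = ctx->base_gfn + (end - ctx->offset_start);
	ctx->nr_zaps++;
}
```

With a binding of file pages 4..11 at GFN 0x1000, converting pages 2..5 to private zaps only GFNs [0x1000, 0x1002). Afterwards, private faults on page 5 succeed and shared faults on it fail with -EACCES. Converting pages 0..1 does not reach the binding at all.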