From: Vishal Annapurve
Date: Wed, 9 Jul 2025 20:39:36 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: "Edgecombe, Rick P"
Cc: "seanjc@google.com", "pvorel@suse.cz", "kvm@vger.kernel.org",
 "catalin.marinas@arm.com", "Miao, Jun", "palmer@dabbelt.com",
 "pdurrant@amazon.co.uk", "vbabka@suse.cz", "peterx@redhat.com",
 "x86@kernel.org", "amoorthy@google.com", "tabba@google.com",
 "maz@kernel.org", "quic_svaddagi@quicinc.com", "vkuznets@redhat.com",
 "anthony.yznaga@oracle.com", "jack@suse.cz", "mail@maciej.szmigiero.name",
 "quic_eberman@quicinc.com", "Wang, Wei W", "keirf@google.com",
 "Wieczor-Retman, Maciej", "Zhao, Yan Y", "ajones@ventanamicro.com",
 "willy@infradead.org", "paul.walmsley@sifive.com", "Hansen, Dave",
 "aik@amd.com", "usama.arif@bytedance.com", "quic_mnalajal@quicinc.com",
 "fvdl@google.com", "rppt@kernel.org", "quic_cvanscha@quicinc.com",
 "nsaenz@amazon.es", "anup@brainfault.org", "thomas.lendacky@amd.com",
 "linux-kernel@vger.kernel.org", "mic@digikod.net", "oliver.upton@linux.dev",
 "Du, Fan", "akpm@linux-foundation.org", "steven.price@arm.com",
 "binbin.wu@linux.intel.com", "muchun.song@linux.dev", "Li, Zhiquan1",
 "rientjes@google.com", "Aktas, Erdem", "mpe@ellerman.id.au",
 "david@redhat.com", "jgg@ziepe.ca", "hughd@google.com",
 "jhubbard@nvidia.com", "Xu, Haibo1", "Yamahata, Isaku",
 "jthoughton@google.com", "steven.sistare@oracle.com",
 "quic_pheragu@quicinc.com", "jarkko@kernel.org", "Shutemov, Kirill",
 "chenhuacai@kernel.org", "Huang, Kai", "shuah@kernel.org",
 "bfoster@redhat.com", "dwmw@amazon.co.uk", "Peng, Chao P",
 "pankaj.gupta@amd.com", "Graf, Alexander", "nikunj@amd.com",
 "viro@zeniv.linux.org.uk", "pbonzini@redhat.com", "yuzenghui@huawei.com",
 "jroedel@suse.de", "suzuki.poulose@arm.com", "jgowans@amazon.com",
 "Xu, Yilun", "liam.merwick@oracle.com", "michael.roth@amd.com",
 "quic_tsoni@quicinc.com", "Li, Xiaoyao", "aou@eecs.berkeley.edu",
 "Weiny, Ira", "richard.weiyang@gmail.com", "kent.overstreet@linux.dev",
 "qperret@google.com", "dmatlack@google.com", "james.morse@arm.com",
 "brauner@kernel.org", "linux-fsdevel@vger.kernel.org",
 "ackerleytng@google.com", "pgonda@google.com", "quic_pderrin@quicinc.com",
 "roypat@amazon.co.uk", "hch@infradead.org", "will@kernel.org",
 "linux-mm@kvack.org"
X-Mailing-List: linux-kernel@vger.kernel.org
References: <006899ccedf93f45082390460620753090c01914.camel@intel.com>
 <5decd42b3239d665d5e6c5c23e58c16c86488ca8.camel@intel.com>

On Wed, Jul 9, 2025 at 8:17 AM Edgecombe, Rick P wrote:
>
> On Wed, 2025-07-09 at 07:28 -0700, Vishal Annapurve wrote:
> > I think we can simplify the role of guest_memfd in line with discussion [1]:
> > 1) guest_memfd is a memory provider for userspace, KVM, IOMMU.
> >     - It allows fallocate to populate/deallocate memory
> > 2) guest_memfd supports the notion of private/shared faults.
> > 3) guest_memfd supports memory access control:
> >     - It allows shared faults from userspace, KVM, IOMMU
> >     - It allows private faults from KVM, IOMMU
> > 4) guest_memfd supports changing access control on its ranges between
> >    shared/private.
> >     - It notifies the users to invalidate their mappings for the
> >       ranges getting converted/truncated.
>
> KVM needs to know if a GFN is private/shared. I think it is also intended to now
> be a repository for this information, right? Besides invalidations, it needs to
> be queryable.

Yeah, that interface can be added as well.
Though, if possible, KVM can just directly pass the fault type to
guest_memfd, and it can return an error if the fault type doesn't match
the permission. Additionally, KVM does query the mapping order for a
certain pfn/gfn, which will need to be supported as well.

> >
> > Responsibilities that ideally should not be taken up by guest_memfd:
> > 1) guest_memfd cannot initiate pre-faulting on behalf of its users.
> > 2) guest_memfd should not be directly communicating with the
> >    underlying architecture layers.
> >     - All communication should go via KVM/IOMMU.
>
> Maybe stronger, there should be generic gmem behaviors. Not any special
> if (vm_type == tdx) type logic.
>
> > 3) KVM should ideally associate the lifetime of backing
> >    pagetables/protection tables/RMP tables with the lifetime of the
> >    binding of memslots with guest_memfd.
> >     - Today KVM SNP logic ties RMP table entry lifetimes with how
> >       long the folios are mapped in guest_memfd, which I think should be
> >       revisited.
>
> I don't understand the problem. KVM needs to respond to user accessible
> invalidations, but how long it keeps other resources around could be useful for
> various optimizations. Like deferring work to a work queue or something.

I don't think it could be deferred to a work queue, as the RMP table
entries will need to be removed synchronously once the last reference
on the guest_memfd drops, unless memory itself is kept around after
filemap eviction. I can see benefits of this approach for handling
scenarios like intrahost-migration.

>
> I think it would help to just target the Ackerley series goals. We should get
> that code into shape and this kind of stuff will fall out of it.
>
> >
> > Some very early thoughts on how guest_memfd could be laid out for the long term:
> > 1) guest_memfd code ideally should be built-in to the kernel.
> > 2) guest_memfd instances should still be created using KVM IOCTLs that
> >    carry specific capabilities/restrictions for its users based on the
> >    backing VM/arch.
> > 3) Any outgoing communication from guest_memfd to its users like
> >    userspace/KVM/IOMMU should be via notifiers to invalidate similar to
> >    how MMU notifiers work.
> > 4) KVM and IOMMU can implement intermediate layers to handle
> >    interaction with guest_memfd.
> >     - e.g. there could be a layer within kvm that handles:
> >         - creating guest_memfd files and associating a
> >           kvm_gmem_context with those files.
> >         - memslot binding
> >             - kvm_gmem_context will be used to bind kvm
> >               memslots with the context ranges.
> >         - invalidate notifier handling
> >             - kvm_gmem_context will be used to intercept
> >               guest_memfd callbacks and
> >               translate them to the right GPA ranges.
> >         - linking
> >             - kvm_gmem_context can be linked to different
> >               KVM instances.
>
> We can probably look at the code to decide these.
>

Agree.
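
To make two of the points above a bit more concrete (passing the fault
type to guest_memfd and getting an error back on a mismatch, and an
intermediate layer translating invalidations into guest ranges), here is
a rough userspace model in C. It is only a sketch under assumptions, not
kernel code and not what the series implements: every name in it
(gmem_range, gmem_binding, gmem_check_fault, gmem_invalidate_to_gfn) is
hypothetical, the shared/private state is modelled as a flat array of
ranges, and a binding is reduced to a single file-offset to GFN window.

```c
/* Userspace model only. All types and functions are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum gmem_fault_type { GMEM_FAULT_SHARED, GMEM_FAULT_PRIVATE };
enum gmem_accessor { GMEM_USERSPACE, GMEM_KVM, GMEM_IOMMU };

/* One contiguous run of file pages with a single shared/private state. */
struct gmem_range {
	uint64_t start_pgoff;	/* first page offset in the file */
	uint64_t nr_pages;
	bool is_private;
};

/* Return the range containing pgoff, or NULL if the offset is not covered. */
const struct gmem_range *gmem_find_range(const struct gmem_range *r, size_t n,
					 uint64_t pgoff)
{
	for (size_t i = 0; i < n; i++)
		if (pgoff >= r[i].start_pgoff &&
		    pgoff - r[i].start_pgoff < r[i].nr_pages)
			return &r[i];
	return NULL;
}

/*
 * The caller passes the fault type; the check returns 0 when the fault is
 * allowed and a negative errno when it does not match the range state.
 */
int gmem_check_fault(const struct gmem_range *r, size_t n, uint64_t pgoff,
		     enum gmem_accessor who, enum gmem_fault_type type)
{
	const struct gmem_range *range = gmem_find_range(r, n, pgoff);

	if (!range)
		return -EFAULT;
	/* Private faults are only allowed from KVM and IOMMU. */
	if (type == GMEM_FAULT_PRIVATE && who == GMEM_USERSPACE)
		return -EACCES;
	/* The requested fault type must match the current range state. */
	if ((type == GMEM_FAULT_PRIVATE) != range->is_private)
		return -EACCES;
	return 0;
}

/* A memslot-like binding: a window of the file mapped at a base GFN. */
struct gmem_binding {
	uint64_t base_pgoff;
	uint64_t base_gfn;
	uint64_t nr_pages;
};

/*
 * Translate an invalidated file range [pgoff, pgoff + nr) into the GFN
 * range [*gfn_start, *gfn_end) covered by the binding. Returns false when
 * the invalidation does not overlap the binding at all.
 */
bool gmem_invalidate_to_gfn(const struct gmem_binding *b, uint64_t pgoff,
			    uint64_t nr, uint64_t *gfn_start, uint64_t *gfn_end)
{
	assert(b && gfn_start && gfn_end);

	uint64_t bind_end = b->base_pgoff + b->nr_pages;
	uint64_t inv_end = pgoff + nr;
	uint64_t lo = pgoff > b->base_pgoff ? pgoff : b->base_pgoff;
	uint64_t hi = inv_end < bind_end ? inv_end : bind_end;

	if (lo >= hi)
		return false;
	*gfn_start = b->base_gfn + (lo - b->base_pgoff);
	*gfn_end = b->base_gfn + (hi - b->base_pgoff);
	return true;
}
```

In this model a private fault from userspace always fails, a KVM fault
fails when its type disagrees with the range state, and an invalidation
that only partially overlaps a binding is clipped to the bound window
before being handed on as a GFN range.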