From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 25 Sep 2025 07:22:30 -0700
In-Reply-To: <20250827175247.83322-9-shivankg@amd.com>
X-Mailing-List: linux-coco@lists.linux.dev
Mime-Version: 1.0
References: <20250827175247.83322-2-shivankg@amd.com>
 <20250827175247.83322-9-shivankg@amd.com>
Subject: Re: [PATCH kvm-next V11 6/7] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
From: Sean Christopherson
To: Shivank Garg
Cc: willy@infradead.org, akpm@linux-foundation.org, david@redhat.com,
 pbonzini@redhat.com, shuah@kernel.org, vbabka@suse.cz, brauner@kernel.org,
 viro@zeniv.linux.org.uk, dsterba@suse.com, xiang@kernel.org, chao@kernel.org,
 jaegeuk@kernel.org, clm@fb.com, josef@toxicpanda.com,
 kent.overstreet@linux.dev, zbestahu@gmail.com, jefflexu@linux.alibaba.com,
 dhavale@google.com, lihongbo22@huawei.com, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
 ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com,
 rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
 ying.huang@linux.alibaba.com, apopple@nvidia.com, tabba@google.com,
 ackerleytng@google.com, paul@paul-moore.com, jmorris@namei.org,
 serge@hallyn.com, pvorel@suse.cz, bfoster@redhat.com, vannapurve@google.com,
 chao.gao@intel.com, bharata@amd.com, nikunj@amd.com, michael.day@amd.com,
 shdhiman@amd.com, yan.y.zhao@intel.com, Neeraj.Upadhyay@amd.com,
 thomas.lendacky@amd.com, michael.roth@amd.com, aik@amd.com, jgg@nvidia.com,
 kalyazin@amazon.com, peterx@redhat.com, jack@suse.cz, hch@infradead.org,
 cgzones@googlemail.com, ira.weiny@intel.com, rientjes@google.com,
 roypat@amazon.co.uk, chao.p.peng@intel.com, amit@infradead.org,
 ddutile@redhat.com, dan.j.williams@intel.com, ashish.kalra@amd.com,
 gshan@redhat.com, jgowans@amazon.com, pankaj.gupta@amd.com, papaluri@amd.com,
 yuzhao@google.com, suzuki.poulose@arm.com, quic_eberman@quicinc.com,
 linux-bcachefs@vger.kernel.org, linux-btrfs@vger.kernel.org,
 linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org,
 kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-coco@lists.linux.dev
Content-Type: text/plain; charset="us-ascii"

On Wed, Aug 27, 2025, Shivank Garg wrote:
> @@ -26,6 +28,9 @@ static inline struct kvm_gmem_inode_info *KVM_GMEM_I(struct inode *inode)
>  	return container_of(inode, struct kvm_gmem_inode_info, vfs_inode);
>  }
>  
> +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem_inode_info *info,
> +						   pgoff_t index);
> +
>  /**
>   * folio_file_pfn - like folio_file_page, but return a pfn.
>   * @folio: The folio which contains this index.
> @@ -112,7 +117,25 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>  static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>  {
>  	/* TODO: Support huge pages. */
> -	return filemap_grab_folio(inode->i_mapping, index);
> +	struct mempolicy *policy;
> +	struct folio *folio;
> +
> +	/*
> +	 * Fast-path: See if folio is already present in mapping to avoid
> +	 * policy_lookup.
> +	 */
> +	folio = __filemap_get_folio(inode->i_mapping, index,
> +				    FGP_LOCK | FGP_ACCESSED, 0);
> +	if (!IS_ERR(folio))
> +		return folio;
> +
> +	policy = kvm_gmem_get_pgoff_policy(KVM_GMEM_I(inode), index);
> +	folio = __filemap_get_folio_mpol(inode->i_mapping, index,
> +					 FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
> +					 mapping_gfp_mask(inode->i_mapping), policy);
> +	mpol_cond_put(policy);
> +
> +	return folio;
>  }
>  
>  static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> @@ -372,8 +395,45 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>  	return ret;
>  }
>  
> +#ifdef CONFIG_NUMA
> +static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	return mpol_set_shared_policy(&KVM_GMEM_I(inode)->policy, vma, mpol);
> +}
> +
> +static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> +					     unsigned long addr, pgoff_t *pgoff)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	*pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> +	return mpol_shared_policy_lookup(&KVM_GMEM_I(inode)->policy, *pgoff);
> +}
> +
> +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem_inode_info *info,
> +						   pgoff_t index)

I keep reading this as "page offset policy", as opposed to "policy given a
page offset".

Another oddity that is confusing is that this helper explicitly does
get_task_policy(current), while kvm_gmem_get_policy() lets the caller do that.
The end result is the same, but I think it would be helpful for gmem to be
internally consistent.

If we have kvm_gmem_get_policy() use this helper, then we can kill two birds
with one stone:

static struct mempolicy *__kvm_gmem_get_policy(struct gmem_inode *gi, pgoff_t index)
{
	struct mempolicy *mpol;

	mpol = mpol_shared_policy_lookup(&gi->policy, index);
	return mpol ? mpol : get_task_policy(current);
}

static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
					     unsigned long addr, pgoff_t *pgoff)
{
	*pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);

	return __kvm_gmem_get_policy(GMEM_I(file_inode(vma->vm_file)), *pgoff);
}