From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-173.mta0.migadu.com (out-173.mta0.migadu.com [91.218.175.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A9C17199FBA; Fri, 26 Sep 2025 14:49:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.173 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758898173; cv=none; b=Fie7LPtj5zL+qCSKEwxHWzL8ZTtSph0Daro1RXJHKFeXqquSIcNNy57/KegBnixOjbNVsloxkbvaj2Ej8Vh843c20as9QCYbXo3NLG8gmcROvbF4dFR3snzWhUkHG+20rbc1ejWJRya/BF7Aazz0/E63TMLnI//TtJWMCnk0/HA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758898173; c=relaxed/simple; bh=X4MWoMvZpNFx+OVJEAY7k+/NlF9jfSSaTBDZkjwLetc=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=CmXvn2wVr5z7JzPj343KDVFF0HpDQrZtk4Dol4t2VTz2IpIoRZ3uZxSkiZTwHaN3ra4axQIbVqlP0ZZAk4mzUCciXaStm8XtWO9lZlsDI3ls+xsvbrY6hVoG4Xi+63Poc0WQ8+7ybIxl6lfB7NPD+clKfZKppmhRNgfN3ZxFRlg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=p0uimPCK; arc=none smtp.client-ip=91.218.175.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="p0uimPCK" Message-ID: <32608c1b-6da5-4a06-9790-58dfd4ba2011@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1758898159; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gXM7GNVXlwNM6qgC8RcEz4DIWXulx4en7MnJK64sYug=; b=p0uimPCKzx+0lRZWcpXNlJCkvBNo1Ljba0JQRRiJr1JwGeyRPb/ZvYNiwaZnqmaqBF7Hfc 3c7Khai3reWQ5g+jEsG9PV/jiNF/B/MaQknVMCzw1Eig28PrpuaUFhS2uvhY6e0GXpTSzo qpRv/xQvchLVI0CesjUiAFowJOpaF6Q= Date: Fri, 26 Sep 2025 15:49:13 +0100 Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH v7 05/12] KVM: guest_memfd: Add flag to remove from direct map To: "Roy, Patrick" Cc: "pbonzini@redhat.com" , "corbet@lwn.net" , "maz@kernel.org" , "oliver.upton@linux.dev" , "joey.gouly@arm.com" , "suzuki.poulose@arm.com" , "yuzenghui@huawei.com" , "catalin.marinas@arm.com" , "will@kernel.org" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "dave.hansen@linux.intel.com" , "x86@kernel.org" , "hpa@zytor.com" , "luto@kernel.org" , "peterz@infradead.org" , "willy@infradead.org" , "akpm@linux-foundation.org" , "david@redhat.com" , "lorenzo.stoakes@oracle.com" , "Liam.Howlett@oracle.com" , "vbabka@suse.cz" , "rppt@kernel.org" , "surenb@google.com" , "mhocko@suse.com" , "song@kernel.org" , "jolsa@kernel.org" , "ast@kernel.org" , "daniel@iogearbox.net" , "andrii@kernel.org" , "martin.lau@linux.dev" , "eddyz87@gmail.com" , "yonghong.song@linux.dev" , "john.fastabend@gmail.com" , "kpsingh@kernel.org" , "sdf@fomichev.me" , "haoluo@google.com" , "jgg@ziepe.ca" , "jhubbard@nvidia.com" , "peterx@redhat.com" , "jannh@google.com" , "pfalcato@suse.de" , "shuah@kernel.org" , "seanjc@google.com" , "kvm@vger.kernel.org" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "kvmarm@lists.linux.dev" , "linux-fsdevel@vger.kernel.org" , "linux-mm@kvack.org" , "bpf@vger.kernel.org" , "linux-kselftest@vger.kernel.org" , "Cali, Marco" , "Kalyazin, Nikita" , "Thomson, Jack" , "derekmn@amazon.co.uk" , "tabba@google.com" , "ackerleytng@google.com" References: <20250924151101.2225820-4-patrick.roy@campus.lmu.de> <20250924152214.7292-1-roypat@amazon.co.uk> <20250924152214.7292-2-roypat@amazon.co.uk> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Patrick Roy Content-Language: en-US In-Reply-To: <20250924152214.7292-2-roypat@amazon.co.uk> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_OUT On Wed, 2025-09-24 at 16:22 +0100, "Roy, Patrick" wrote: [...] > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > index 55b8d739779f..b7129c4868c5 100644 > --- a/virt/kvm/guest_memfd.c > +++ b/virt/kvm/guest_memfd.c > @@ -4,6 +4,9 @@ > #include > #include > #include > +#include > + > +#include > > #include "kvm_mm.h" > > @@ -42,6 +45,44 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo > return 0; > } > > +#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0) > + > +static bool kvm_gmem_folio_no_direct_map(struct folio *folio) > +{ > + return ((u64) folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP; > +} > + > +static int kvm_gmem_folio_zap_direct_map(struct folio *folio) > +{ > + if (kvm_gmem_folio_no_direct_map(folio)) > + return 0; > + > + int r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), > + false); > + > + if (!r) { > + unsigned long addr = (unsigned long) folio_address(folio); > + folio->private = (void *) ((u64) folio->private & KVM_GMEM_FOLIO_NO_DIRECT_MAP); > + flush_tlb_kernel_range(addr, addr + folio_size(folio)); > + } > + > + return r; > +} No idea how I managed to mess this function up so completely, but it should be more like static int kvm_gmem_folio_zap_direct_map(struct folio *folio) { int r = 0; unsigned long addr = (unsigned long) folio_address(folio); u64 gmem_flags = (u64) folio_inode(folio)->i_private; if (kvm_gmem_folio_no_direct_map(folio) || !(gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)) goto out; r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false); if (r) goto out; folio->private = (void *) KVM_GMEM_FOLIO_NO_DIRECT_MAP; flush_tlb_kernel_range(addr, addr + folio_size(folio)); out: return r; } the version I sent (a) does not respect the flags passed to guest_memfd on creation, and (b) does not correctly set the bit in folio->private. > +static void kvm_gmem_folio_restore_direct_map(struct folio *folio) > +{ > + /* > + * Direct map restoration cannot fail, as the only error condition > + * for direct map manipulation is failure to allocate page tables > + * when splitting huge pages, but this split would have already > + * happened in set_direct_map_invalid_noflush() in kvm_gmem_folio_zap_direct_map(). > + * Thus set_direct_map_valid_noflush() here only updates prot bits. > + */ > + if (kvm_gmem_folio_no_direct_map(folio)) > + set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), > + true); > +} > + > static inline void kvm_gmem_mark_prepared(struct folio *folio) > { > folio_mark_uptodate(folio); > @@ -324,13 +365,14 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf) > struct inode *inode = file_inode(vmf->vma->vm_file); > struct folio *folio; > vm_fault_t ret = VM_FAULT_LOCKED; > + int err; > > if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode)) > return VM_FAULT_SIGBUS; > > folio = kvm_gmem_get_folio(inode, vmf->pgoff); > if (IS_ERR(folio)) { > - int err = PTR_ERR(folio); > + err = PTR_ERR(folio); > > if (err == -EAGAIN) > return VM_FAULT_RETRY; > @@ -348,6 +390,13 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf) > kvm_gmem_mark_prepared(folio); > } > > + err = kvm_gmem_folio_zap_direct_map(folio); > + > + if (err) { > + ret = vmf_error(err); > + goto out_folio; > + } > + > vmf->page = folio_file_page(folio, vmf->pgoff); > > out_folio: > @@ -435,6 +484,8 @@ static void kvm_gmem_free_folio(struct folio *folio) > kvm_pfn_t pfn = page_to_pfn(page); > int order = folio_order(folio); > > + kvm_gmem_folio_restore_direct_map(folio); > + > kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order)); > } > > @@ -499,6 +550,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) > /* Unmovable mappings are supposed to be marked unevictable as well. */ > WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping)); > > + if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) > + mapping_set_no_direct_map(inode->i_mapping); > + > kvm_get_kvm(kvm); > gmem->kvm = kvm; > xa_init(&gmem->bindings); > @@ -523,6 +577,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) > if (kvm_arch_supports_gmem_mmap(kvm)) > valid_flags |= GUEST_MEMFD_FLAG_MMAP; > > + if (kvm_arch_gmem_supports_no_direct_map()) > + valid_flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP; > + > if (flags & ~valid_flags) > return -EINVAL; > > @@ -687,6 +744,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, > if (!is_prepared) > r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio); > > + kvm_gmem_folio_zap_direct_map(folio); > + > folio_unlock(folio); > > if (!r) [...]