linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/12] Direct Map Removal Support for guest_memfd
@ 2025-08-28  9:39 Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 01/12] filemap: Pass address_space mapping to ->free_folio() Roy, Patrick
                   ` (12 more replies)
  0 siblings, 13 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

[ based on kvm/next ]

Unmapping virtual machine guest memory from the host kernel's direct map is a
successful mitigation against Spectre-style transient execution issues: If the
kernel page tables do not contain entries pointing to guest memory, then any
attempted speculative read through the direct map will necessarily be blocked
by the MMU before any observable microarchitectural side-effects happen. This
means that Spectre-gadgets and similar cannot be used to target virtual machine
memory. Roughly 60% of speculative execution issues fall into this category [1,
Table 1].

This patch series extends guest_memfd with the ability to remove its memory
from the host kernel's direct map, to be able to attain the above protection
for KVM guests running inside guest_memfd.

=== Design ===

We build on top of guest_memfd's recent support for "non-confidential VMs", in
which all of guest_memfd is mappable to userspace (e.g. considered "shared").
For such VMs, all guest page faults are routed through guest_memfd's special
page fault handler, which due to consuming fd+offset directly, can map direct
map removed memory into the guest. KVM's internal accesses to guest memory are
handled by providing each memslot with a userspace mapping of that memslots
guest_memfd via userspace_addr. Since KVM's internal accesses are almost
exclusively handled via copy_from_user() and friends, this allows KVM to access
direct map removed guest memory for features such as MMIO instruction emulation
on x86 or pvtime support on ARM64.

=== Implementation ===

The KVM_CREATE_GUEST_MEMFD ioctl gains a new flag
GUEST_MEMFD_FLAG_NO_DIRECT_MAP.  If this flag is passed, then guest_memfd
removes direct map entries for its folios are preparation. Upon free-ing of the
memory, direct map entries are restored prior to gmem's arch specific
invalidation callback.

Support for the flag can be discovered via the KVM_CAP_GMEM_NO_DIRECT_MAP
capability, which is only available if direct map modifications at 4k
granularity is architecturally possible / when KVM can successfully map direct
map removed memory into the guest.

=== Testing ===

KVM selftests are extended to cover the above-described non-CoCo workflows,
where guest_memfd with direct map entries removed is used to back all of guest
memory, and exercising some simple MMIO paths.

Additionally, a Firecracker branch with support for these VMs can be found on
GitHub [2].

=== Changes since v4 ===

- Rebase on top of kvm/next
- Stop using PG_private to track direct map removal state
- fix build or KVM-as-a-module by using new EXPORT_SYMBOL_FOR_MODULES

=== FAQ ===

--- why not reuse memfd_secret() / a bespoke guest memory solution? ---

having guest memory be direct map removed means guest page faults cannot be
resolved by GUP-ing userspace mappings of guest memory, as GUP is disabled for
direct map removed memory (as currently GUP has no way to understand that a
specific GUP request will not subsequently dereference page_address()).
guest_memfd already has a special path inside KVM that instead consumed
fd+offset, so it makes sense to reuse this. Additionally, it means that
direct-map-removed VMs can benefit from active development on guest_memfd, such
as huge pages support.

--- why do KVM internal accesses through userspace page tables? ---

For traditional VMs, all KVM internal accesses are done through the
userspace_addr stored in a memslot, meaning no changes to most KVM code are
needed just to allow access to guest_memfd backed / direct map removed guest
memory of non-confidential VMs. Previous iterations of this series tried to
avoid userspace mappings, instead attempting to dynamically restore direct map
entries for internal accesses [RFCv2], but this turned out to have a
significant performance impact, as well as additional complexity due to needing
to refcount direct map reinsertion operations and making them play nicely with
gmem truncations.

--- what doesn't work with direct map removed VMs? ---

The only thing I'm aware of is kvm-clock, since it tries to GUP guest memory
via gfn_to_pfn_cache. Realistically, this is only a problem on AMD, as on Intel
guests can use TSC as a clocksource (Intel allows discovery of TSC frequency
via CPUID, while AMD doesn't).  AMD guests fall back onto some calibration
routine, which fails most of the time though.

[1]: https://download.vusec.net/papers/quarantine_raid23.pdf
[2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[RFCv1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/
[RFCv2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/
[RFCv3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/
[v4]: https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/


Elliot Berman (1):
  filemap: Pass address_space mapping to ->free_folio()

Patrick Roy (11):
  arch: export set_direct_map_valid_noflush to KVM module
  mm: introduce AS_NO_DIRECT_MAP
  KVM: guest_memfd: Add flag to remove from direct map
  KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP
  KVM: selftests: load elf via bounce buffer
  KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd
    != -1
  KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
  KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
  KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in mem conversion
    tests
  KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in
    guest_memfd_test.c
  KVM: selftests: Test guest execution from direct map removed gmem

 Documentation/filesystems/locking.rst         |  2 +-
 Documentation/virt/kvm/api.rst                |  5 ++
 arch/arm64/include/asm/kvm_host.h             | 12 ++++
 arch/arm64/mm/pageattr.c                      |  1 +
 arch/loongarch/mm/pageattr.c                  |  1 +
 arch/riscv/mm/pageattr.c                      |  1 +
 arch/s390/mm/pageattr.c                       |  1 +
 arch/x86/mm/pat/set_memory.c                  |  1 +
 fs/nfs/dir.c                                  | 11 ++--
 fs/orangefs/inode.c                           |  3 +-
 include/linux/fs.h                            |  2 +-
 include/linux/kvm_host.h                      |  7 +++
 include/linux/pagemap.h                       | 16 +++++
 include/linux/secretmem.h                     | 18 ------
 include/uapi/linux/kvm.h                      |  2 +
 lib/buildid.c                                 |  4 +-
 mm/filemap.c                                  |  9 +--
 mm/gup.c                                      | 14 +----
 mm/mlock.c                                    |  2 +-
 mm/secretmem.c                                |  9 +--
 mm/vmscan.c                                   |  4 +-
 .../testing/selftests/kvm/guest_memfd_test.c  |  2 +
 .../testing/selftests/kvm/include/kvm_util.h  | 37 ++++++++---
 .../testing/selftests/kvm/include/test_util.h |  8 +++
 tools/testing/selftests/kvm/lib/elf.c         |  8 +--
 tools/testing/selftests/kvm/lib/io.c          | 23 +++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 61 +++++++++++--------
 tools/testing/selftests/kvm/lib/test_util.c   |  8 +++
 tools/testing/selftests/kvm/lib/x86/sev.c     |  1 +
 .../selftests/kvm/pre_fault_memory_test.c     |  1 +
 .../selftests/kvm/set_memory_region_test.c    | 50 +++++++++++++--
 .../kvm/x86/private_mem_conversions_test.c    |  7 ++-
 virt/kvm/guest_memfd.c                        | 32 ++++++++--
 virt/kvm/kvm_main.c                           |  5 ++
 34 files changed, 264 insertions(+), 104 deletions(-)


base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.50.1



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v5 01/12] filemap: Pass address_space mapping to ->free_folio()
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 02/12] arch: export set_direct_map_valid_noflush to KVM module Roy, Patrick
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Elliot Berman, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek, Roy, Patrick

From: Elliot Berman <quic_eberman@quicinc.com>

When guest_memfd removes memory from the host kernel's direct map,
direct map entries must be restored before the memory is freed again. To
do so, ->free_folio() needs to know whether a gmem folio was direct map
removed in the first place though. While possible to keep track of this
information on each individual folio (e.g. via page flags), direct map
removal is an all-or-nothing property of the entire guest_memfd, so it
is less error prone to just check the flag stored in the gmem inode's
private data.  However, by the time ->free_folio() is called,
folio->mapping might be cleared. To still allow access to the address
space from which the folio was just removed, pass it in as an additional
argument to ->free_folio, as the mapping is well-known to all callers.

Link: https://lore.kernel.org/all/15f665b4-2d33-41ca-ac50-fafe24ade32f@redhat.com/
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
[patrick: rewrite shortlog for new usecase]
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 Documentation/filesystems/locking.rst |  2 +-
 fs/nfs/dir.c                          | 11 ++++++-----
 fs/orangefs/inode.c                   |  3 ++-
 include/linux/fs.h                    |  2 +-
 mm/filemap.c                          |  9 +++++----
 mm/secretmem.c                        |  3 ++-
 mm/vmscan.c                           |  4 ++--
 virt/kvm/guest_memfd.c                |  3 ++-
 8 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index aa287ccdac2f..74c97287ec40 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -262,7 +262,7 @@ prototypes::
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidate_folio) (struct folio *, size_t start, size_t len);
 	bool (*release_folio)(struct folio *, gfp_t);
-	void (*free_folio)(struct folio *);
+	void (*free_folio)(struct address_space *, struct folio *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
 	int (*migrate_folio)(struct address_space *, struct folio *dst,
 			struct folio *src, enum migrate_mode);
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d81217923936..644bd54e052c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -55,7 +55,7 @@ static int nfs_closedir(struct inode *, struct file *);
 static int nfs_readdir(struct file *, struct dir_context *);
 static int nfs_fsync_dir(struct file *, loff_t, loff_t, int);
 static loff_t nfs_llseek_dir(struct file *, loff_t, int);
-static void nfs_readdir_clear_array(struct folio *);
+static void nfs_readdir_clear_array(struct address_space *, struct folio *);
 static int nfs_do_create(struct inode *dir, struct dentry *dentry,
 			 umode_t mode, int open_flags);
 
@@ -218,7 +218,8 @@ static void nfs_readdir_folio_init_array(struct folio *folio, u64 last_cookie,
 /*
  * we are freeing strings created by nfs_add_to_readdir_array()
  */
-static void nfs_readdir_clear_array(struct folio *folio)
+static void nfs_readdir_clear_array(struct address_space *mapping,
+				    struct folio *folio)
 {
 	struct nfs_cache_array *array;
 	unsigned int i;
@@ -233,7 +234,7 @@ static void nfs_readdir_clear_array(struct folio *folio)
 static void nfs_readdir_folio_reinit_array(struct folio *folio, u64 last_cookie,
 					   u64 change_attr)
 {
-	nfs_readdir_clear_array(folio);
+	nfs_readdir_clear_array(folio->mapping, folio);
 	nfs_readdir_folio_init_array(folio, last_cookie, change_attr);
 }
 
@@ -249,7 +250,7 @@ nfs_readdir_folio_array_alloc(u64 last_cookie, gfp_t gfp_flags)
 static void nfs_readdir_folio_array_free(struct folio *folio)
 {
 	if (folio) {
-		nfs_readdir_clear_array(folio);
+		nfs_readdir_clear_array(folio->mapping, folio);
 		folio_put(folio);
 	}
 }
@@ -391,7 +392,7 @@ static void nfs_readdir_folio_init_and_validate(struct folio *folio, u64 cookie,
 	if (folio_test_uptodate(folio)) {
 		if (nfs_readdir_folio_validate(folio, cookie, change_attr))
 			return;
-		nfs_readdir_clear_array(folio);
+		nfs_readdir_clear_array(folio->mapping, folio);
 	}
 	nfs_readdir_folio_init_array(folio, cookie, change_attr);
 	folio_mark_uptodate(folio);
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index a01400cd41fd..37227ba71593 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -452,7 +452,8 @@ static bool orangefs_release_folio(struct folio *folio, gfp_t foo)
 	return !folio_test_private(folio);
 }
 
-static void orangefs_free_folio(struct folio *folio)
+static void orangefs_free_folio(struct address_space *mapping,
+				struct folio *folio)
 {
 	kfree(folio_detach_private(folio));
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..afb0748ffda6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -457,7 +457,7 @@ struct address_space_operations {
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidate_folio) (struct folio *, size_t offset, size_t len);
 	bool (*release_folio)(struct folio *, gfp_t);
-	void (*free_folio)(struct folio *folio);
+	void (*free_folio)(struct address_space *, struct folio *folio);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
 	/*
 	 * migrate the contents of a folio to the specified target. If
diff --git a/mm/filemap.c b/mm/filemap.c
index 751838ef05e5..3dd8ad922d80 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -226,11 +226,11 @@ void __filemap_remove_folio(struct folio *folio, void *shadow)
 
 void filemap_free_folio(struct address_space *mapping, struct folio *folio)
 {
-	void (*free_folio)(struct folio *);
+	void (*free_folio)(struct address_space *, struct folio *);
 
 	free_folio = mapping->a_ops->free_folio;
 	if (free_folio)
-		free_folio(folio);
+		free_folio(mapping, folio);
 
 	folio_put_refs(folio, folio_nr_pages(folio));
 }
@@ -820,7 +820,8 @@ EXPORT_SYMBOL(file_write_and_wait_range);
 void replace_page_cache_folio(struct folio *old, struct folio *new)
 {
 	struct address_space *mapping = old->mapping;
-	void (*free_folio)(struct folio *) = mapping->a_ops->free_folio;
+	void (*free_folio)(struct address_space *, struct folio *) =
+		mapping->a_ops->free_folio;
 	pgoff_t offset = old->index;
 	XA_STATE(xas, &mapping->i_pages, offset);
 
@@ -849,7 +850,7 @@ void replace_page_cache_folio(struct folio *old, struct folio *new)
 		__lruvec_stat_add_folio(new, NR_SHMEM);
 	xas_unlock_irq(&xas);
 	if (free_folio)
-		free_folio(old);
+		free_folio(mapping, old);
 	folio_put(old);
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_folio);
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 60137305bc20..422dcaa32506 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -150,7 +150,8 @@ static int secretmem_migrate_folio(struct address_space *mapping,
 	return -EBUSY;
 }
 
-static void secretmem_free_folio(struct folio *folio)
+static void secretmem_free_folio(struct address_space *mapping,
+				 struct folio *folio)
 {
 	set_direct_map_default_noflush(folio_page(folio, 0));
 	folio_zero_segment(folio, 0, folio_size(folio));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a48aec8bfd92..559bd6ac965c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -788,7 +788,7 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio,
 		xa_unlock_irq(&mapping->i_pages);
 		put_swap_folio(folio, swap);
 	} else {
-		void (*free_folio)(struct folio *);
+		void (*free_folio)(struct address_space *, struct folio *);
 
 		free_folio = mapping->a_ops->free_folio;
 		/*
@@ -817,7 +817,7 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio,
 		spin_unlock(&mapping->host->i_lock);
 
 		if (free_folio)
-			free_folio(folio);
+			free_folio(mapping, folio);
 	}
 
 	return 1;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 08a6bc7d25b6..9ec4c45e3cf2 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -430,7 +430,8 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 }
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
-static void kvm_gmem_free_folio(struct folio *folio)
+static void kvm_gmem_free_folio(struct address_space *mapping,
+				struct folio *folio)
 {
 	struct page *page = folio_page(folio, 0);
 	kvm_pfn_t pfn = page_to_pfn(page);
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 02/12] arch: export set_direct_map_valid_noflush to KVM module
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 01/12] filemap: Pass address_space mapping to ->free_folio() Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28 10:07   ` Fuad Tabba
  2025-08-28  9:39 ` [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP Roy, Patrick
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Use the new per-module export functionality to allow KVM (and only KVM)
access to set_direct_map_valid_noflush(). This allows guest_memfd to
remove its memory from the direct map, even if KVM is built as a module.

Direct map removal gives guest_memfd the same protection that
memfd_secret enjoys, such as hardening against Spectre-like attacks
through in-kernel gadgets.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 arch/arm64/mm/pageattr.c     | 1 +
 arch/loongarch/mm/pageattr.c | 1 +
 arch/riscv/mm/pageattr.c     | 1 +
 arch/s390/mm/pageattr.c      | 1 +
 arch/x86/mm/pat/set_memory.c | 1 +
 5 files changed, 5 insertions(+)

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 04d4a8f676db..4f3cddfab9b0 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -291,6 +291,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
 
 	return set_memory_valid(addr, nr, valid);
 }
+EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 /*
diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
index f5e910b68229..d076bfd3fcbf 100644
--- a/arch/loongarch/mm/pageattr.c
+++ b/arch/loongarch/mm/pageattr.c
@@ -217,6 +217,7 @@ int set_direct_map_invalid_noflush(struct page *page)
 
 	return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
 }
+EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
 
 int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
 {
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 3f76db3d2769..6db31040cd66 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -400,6 +400,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
 
 	return __set_memory((unsigned long)page_address(page), nr, set, clear);
 }
+EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 348e759840e7..8ffd9ef09bc6 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -413,6 +413,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
 
 	return __set_memory((unsigned long)page_to_virt(page), nr, flags);
 }
+EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
 
 bool kernel_page_present(struct page *page)
 {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 8834c76f91c9..87e9c7d2dcdc 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2661,6 +2661,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
 
 	return __set_pages_np(page, nr);
 }
+EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 01/12] filemap: Pass address_space mapping to ->free_folio() Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 02/12] arch: export set_direct_map_valid_noflush to KVM module Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28 10:21   ` Fuad Tabba
                     ` (3 more replies)
  2025-08-28  9:39 ` [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map Roy, Patrick
                   ` (9 subsequent siblings)
  12 siblings, 4 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are
set to not present . Currently, mappings that match this description are
secretmem mappings (memfd_secret()). Later, some guest_memfd
configurations will also fall into this category.

Reject this new type of mappings in all locations that currently reject
secretmem mappings, on the assumption that if secretmem mappings are
rejected somewhere, it is precisely because of an inability to deal with
folios without direct map entries, and then make memfd_secret() use
AS_NO_DIRECT_MAP on its address_space to drop its special
vma_is_secretmem()/secretmem_mapping() checks.

This drops a optimization in gup_fast_folio_allowed() where
secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
by default"), so the secretmem check did not actually end up elided in
most cases anymore anyway.

Use a new flag instead of overloading AS_INACCESSIBLE (which is already
set by guest_memfd) because not all guest_memfd mappings will end up
being direct map removed (e.g. in pKVM setups, parts of guest_memfd that
can be mapped to userspace should also be GUP-able, and generally not
have restrictions on who can access it).

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/linux/pagemap.h   | 16 ++++++++++++++++
 include/linux/secretmem.h | 18 ------------------
 lib/buildid.c             |  4 ++--
 mm/gup.c                  | 14 +++-----------
 mm/mlock.c                |  2 +-
 mm/secretmem.c            |  6 +-----
 6 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 12a12dae727d..b52b28ae4636 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -211,6 +211,7 @@ enum mapping_flags {
 				   folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
+	AS_NO_DIRECT_MAP = 10,	/* Folios in the mapping are not in the direct map */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
 	return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
 }
 
+static inline void mapping_set_no_direct_map(struct address_space *mapping)
+{
+	set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
+}
+
+static inline bool mapping_no_direct_map(struct address_space *mapping)
+{
+	return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
+}
+
+static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
+{
+	return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;
diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index e918f96881f5..0ae1fb057b3d 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -4,28 +4,10 @@
 
 #ifdef CONFIG_SECRETMEM
 
-extern const struct address_space_operations secretmem_aops;
-
-static inline bool secretmem_mapping(struct address_space *mapping)
-{
-	return mapping->a_ops == &secretmem_aops;
-}
-
-bool vma_is_secretmem(struct vm_area_struct *vma);
 bool secretmem_active(void);
 
 #else
 
-static inline bool vma_is_secretmem(struct vm_area_struct *vma)
-{
-	return false;
-}
-
-static inline bool secretmem_mapping(struct address_space *mapping)
-{
-	return false;
-}
-
 static inline bool secretmem_active(void)
 {
 	return false;
diff --git a/lib/buildid.c b/lib/buildid.c
index c4b0f376fb34..33f173a607ad 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -65,8 +65,8 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
 
 	freader_put_folio(r);
 
-	/* reject secretmem folios created with memfd_secret() */
-	if (secretmem_mapping(r->file->f_mapping))
+	/* reject secretmem folios created with memfd_secret() or guest_memfd() */
+	if (mapping_no_direct_map(r->file->f_mapping))
 		return -EFAULT;
 
 	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
diff --git a/mm/gup.c b/mm/gup.c
index adffe663594d..8c988e076e5d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1234,7 +1234,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
 		return -EOPNOTSUPP;
 
-	if (vma_is_secretmem(vma))
+	if (vma_is_no_direct_map(vma))
 		return -EFAULT;
 
 	if (write) {
@@ -2751,7 +2751,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 {
 	bool reject_file_backed = false;
 	struct address_space *mapping;
-	bool check_secretmem = false;
 	unsigned long mapping_flags;
 
 	/*
@@ -2763,14 +2762,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 		reject_file_backed = true;
 
 	/* We hold a folio reference, so we can safely access folio fields. */
-
-	/* secretmem folios are always order-0 folios. */
-	if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
-		check_secretmem = true;
-
-	if (!reject_file_backed && !check_secretmem)
-		return true;
-
 	if (WARN_ON_ONCE(folio_test_slab(folio)))
 		return false;
 
@@ -2812,8 +2803,9 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 	 * At this point, we know the mapping is non-null and points to an
 	 * address_space object.
 	 */
-	if (check_secretmem && secretmem_mapping(mapping))
+	if (mapping_no_direct_map(mapping))
 		return false;
+
 	/* The only remaining allowed file system is shmem. */
 	return !reject_file_backed || shmem_mapping(mapping);
 }
diff --git a/mm/mlock.c b/mm/mlock.c
index a1d93ad33c6d..0def453fe881 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -474,7 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
 	if (newflags == oldflags || (oldflags & VM_SPECIAL) ||
 	    is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
-	    vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE))
+	    vma_is_dax(vma) || vma_is_no_direct_map(vma) || (oldflags & VM_DROPPABLE))
 		/* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
 		goto out;
 
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 422dcaa32506..a2daee0e63a5 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -134,11 +134,6 @@ static int secretmem_mmap_prepare(struct vm_area_desc *desc)
 	return 0;
 }
 
-bool vma_is_secretmem(struct vm_area_struct *vma)
-{
-	return vma->vm_ops == &secretmem_vm_ops;
-}
-
 static const struct file_operations secretmem_fops = {
 	.release	= secretmem_release,
 	.mmap_prepare	= secretmem_mmap_prepare,
@@ -206,6 +201,7 @@ static struct file *secretmem_file_create(unsigned long flags)
 
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
 	mapping_set_unevictable(inode->i_mapping);
+	mapping_set_no_direct_map(inode->i_mapping);
 
 	inode->i_op = &secretmem_iops;
 	inode->i_mapping->a_ops = &secretmem_aops;
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (2 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28 14:54   ` Mike Rapoport
  2025-08-28  9:39 ` [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP Roy, Patrick
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
ioctl. When set, guest_memfd folios will be removed from the direct map
after preparation, with direct map entries only restored when the folios
are freed.

To ensure these folios do not end up in places where the kernel cannot
deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.

Add KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP to let userspace discover whether
guest_memfd supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP. Support depends on
guest_memfd itself being supported, but also on whether KVM can
manipulate the direct map at page granularity at all (possible most of
the time, just arm64 is a notable outlier where its impossible if the
direct map has been setup using hugepages, as arm64 cannot break these
apart due to break-before-make semantics).

Note that this flag causes removal of direct map entries for all
guest_memfd folios independent of whether they are "shared" or "private"
(although current guest_memfd only supports either all folios in the
"shared" state, or all folios in the "private" state if
GUEST_MEMFD_FLAG_MMAP is not set). The usecase for removing direct map
entries of also the shared parts of guest_memfd are a special type of
non-CoCo VM where, host userspace is trusted to have access to all of
guest memory, but where Spectre-style transient execution attacks
through the host kernel's direct map should still be mitigated.  In this
setup, KVM retains access to guest memory via userspace mappings of
guest_memfd, which are reflected back into KVM's memslots via
userspace_addr. This is needed for things like MMIO emulation on x86_64
to work.

Do not perform TLB flushes after direct map manipulations. This is
because TLB flushes resulted in a up to 40x elongation of page faults in
guest_memfd (scaling with the number of CPU cores), or a 5x elongation
of memory population. TLB flushes are not needed for functional
correctness (the virt->phys mapping technically stays "correct",  the
kernel should simply to not it for a while). On the other hand, it means
that the desired protection from Spectre-style attacks is not perfect,
as an attacker could try to prevent a stale TLB entry from getting
evicted, keeping it alive until the page it refers to is used by the
guest for some sensitive data, and then targeting it using a
spectre-gadget.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
 include/linux/kvm_host.h          |  7 +++++++
 include/uapi/linux/kvm.h          |  2 ++
 virt/kvm/guest_memfd.c            | 29 +++++++++++++++++++++++++----
 virt/kvm/kvm_main.c               |  5 +++++
 5 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2f2394cce24e..0bfd8e5fd9de 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -19,6 +19,7 @@
 #include <linux/maple_tree.h>
 #include <linux/percpu.h>
 #include <linux/psci.h>
+#include <linux/set_memory.h>
 #include <asm/arch_gicv3.h>
 #include <asm/barrier.h>
 #include <asm/cpufeature.h>
@@ -1706,5 +1707,16 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
 void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
 void check_feature_map(void);
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+static inline bool kvm_arch_gmem_supports_no_direct_map(void)
+{
+	/*
+	 * Without FWB, direct map access is needed in kvm_pgtable_stage2_map(),
+	 * as it calls dcache_clean_inval_poc().
+	 */
+	return can_set_direct_map() && cpus_have_final_cap(ARM64_HAS_STAGE2_FWB);
+}
+#define kvm_arch_gmem_supports_no_direct_map kvm_arch_gmem_supports_no_direct_map
+#endif /* CONFIG_KVM_GUEST_MEMFD */
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8b47891adca1..37553848e078 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -36,6 +36,7 @@
 #include <linux/rbtree.h>
 #include <linux/xarray.h>
 #include <asm/signal.h>
+#include <linux/set_memory.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -731,6 +732,12 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 bool kvm_arch_supports_gmem_mmap(struct kvm *kvm);
 #endif
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+#ifndef kvm_arch_gmem_supports_no_direct_map
+#define kvm_arch_gmem_supports_no_direct_map can_set_direct_map
+#endif
+#endif /* CONFIG_KVM_GUEST_MEMFD */
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6efa98a57ec1..33c8e8946019 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -963,6 +963,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
 #define KVM_CAP_GUEST_MEMFD_MMAP 244
+#define KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP 245
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1600,6 +1601,7 @@ struct kvm_memory_attributes {
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
 #define GUEST_MEMFD_FLAG_MMAP	(1ULL << 0)
+#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 1)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 9ec4c45e3cf2..e3696880405c 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#include <linux/set_memory.h>
 
 #include "kvm_mm.h"
 
@@ -42,8 +43,18 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+static bool kvm_gmem_test_no_direct_map(struct inode *inode)
+{
+	return ((unsigned long) inode->i_private) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+}
+
 static inline void kvm_gmem_mark_prepared(struct folio *folio)
 {
+	struct inode *inode = folio_inode(folio);
+
+	if (kvm_gmem_test_no_direct_map(inode))
+		set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);
+
 	folio_mark_uptodate(folio);
 }
 
@@ -429,25 +440,29 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 	return MF_DELAYED;
 }
 
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 static void kvm_gmem_free_folio(struct address_space *mapping,
 				struct folio *folio)
 {
 	struct page *page = folio_page(folio, 0);
+
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	kvm_pfn_t pfn = page_to_pfn(page);
 	int order = folio_order(folio);
+#endif
 
+	if (kvm_gmem_test_no_direct_map(mapping->host))
+		WARN_ON_ONCE(set_direct_map_valid_noflush(page, folio_nr_pages(folio), true));
+
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
-}
 #endif
+}
 
 static const struct address_space_operations kvm_gmem_aops = {
 	.dirty_folio = noop_dirty_folio,
 	.migrate_folio	= kvm_gmem_migrate_folio,
 	.error_remove_folio = kvm_gmem_error_folio,
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	.free_folio = kvm_gmem_free_folio,
-#endif
 };
 
 static int kvm_gmem_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
@@ -504,6 +519,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
+	if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
+		mapping_set_no_direct_map(inode->i_mapping);
+
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
@@ -528,6 +546,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	if (kvm_arch_supports_gmem_mmap(kvm))
 		valid_flags |= GUEST_MEMFD_FLAG_MMAP;
 
+	if (kvm_arch_gmem_supports_no_direct_map())
+		valid_flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 18f29ef93543..0dbfd17e1191 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -65,6 +65,7 @@
 #include <trace/events/kvm.h>
 
 #include <linux/kvm_dirty_ring.h>
+#include <linux/set_memory.h>
 
 
 /* Worst case buffer size needed for holding an integer. */
@@ -4916,6 +4917,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return kvm_supported_mem_attributes(kvm);
 #endif
 #ifdef CONFIG_KVM_GUEST_MEMFD
+	case KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP:
+		if (!can_set_direct_map())
+			return false;
+		fallthrough;
 	case KVM_CAP_GUEST_MEMFD:
 		return 1;
 	case KVM_CAP_GUEST_MEMFD_MMAP:
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (3 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28 10:27   ` David Hildenbrand
  2025-08-28  9:39 ` [PATCH v5 06/12] KVM: selftests: load elf via bounce buffer Roy, Patrick
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 Documentation/virt/kvm/api.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c17a87a0a5ac..b52c14d58798 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6418,6 +6418,11 @@ When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
 supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
 enables mmap() and faulting of guest_memfd memory to host userspace.
 
+When the capability KVM_CAP_GMEM_NO_DIRECT_MAP is supported, the 'flags' field
+supports GUEST_MEMFG_FLAG_NO_DIRECT_MAP. Setting this flag makes the guest_memfd
+instance behave similarly to memfd_secret, and unmaps the memory backing it from
+the kernel's address space after allocation.
+
 When the KVM MMU performs a PFN lookup to service a guest fault and the backing
 guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
 consumed from guest_memfd, regardless of whether it is a shared or a private
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 06/12] KVM: selftests: load elf via bounce buffer
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (4 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 07/12] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 Roy, Patrick
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

If guest memory is backed using a VMA that does not allow GUP (e.g. a
userspace mapping of guest_memfd when the fd was allocated using
KVM_GMEM_NO_DIRECT_MAP), then directly loading the test ELF binary into
it via read(2) potentially does not work. To nevertheless support
loading binaries in this cases, do the read(2) syscall using a bounce
buffer, and then memcpy from the bounce buffer into guest memory.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../testing/selftests/kvm/include/test_util.h |  1 +
 tools/testing/selftests/kvm/lib/elf.c         |  8 +++----
 tools/testing/selftests/kvm/lib/io.c          | 23 +++++++++++++++++++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index c6ef895fbd9a..0409b7b96c94 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -46,6 +46,7 @@ do {								\
 
 ssize_t test_write(int fd, const void *buf, size_t count);
 ssize_t test_read(int fd, void *buf, size_t count);
+ssize_t test_read_bounce(int fd, void *buf, size_t count);
 int test_seq_read(const char *path, char **bufp, size_t *sizep);
 
 void __printf(5, 6) test_assert(bool exp, const char *exp_str,
diff --git a/tools/testing/selftests/kvm/lib/elf.c b/tools/testing/selftests/kvm/lib/elf.c
index f34d926d9735..e829fbe0a11e 100644
--- a/tools/testing/selftests/kvm/lib/elf.c
+++ b/tools/testing/selftests/kvm/lib/elf.c
@@ -31,7 +31,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
 	 * the real size of the ELF header.
 	 */
 	unsigned char ident[EI_NIDENT];
-	test_read(fd, ident, sizeof(ident));
+	test_read_bounce(fd, ident, sizeof(ident));
 	TEST_ASSERT((ident[EI_MAG0] == ELFMAG0) && (ident[EI_MAG1] == ELFMAG1)
 		&& (ident[EI_MAG2] == ELFMAG2) && (ident[EI_MAG3] == ELFMAG3),
 		"ELF MAGIC Mismatch,\n"
@@ -79,7 +79,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
 	offset_rv = lseek(fd, 0, SEEK_SET);
 	TEST_ASSERT(offset_rv == 0, "Seek to ELF header failed,\n"
 		"  rv: %zi expected: %i", offset_rv, 0);
-	test_read(fd, hdrp, sizeof(*hdrp));
+	test_read_bounce(fd, hdrp, sizeof(*hdrp));
 	TEST_ASSERT(hdrp->e_phentsize == sizeof(Elf64_Phdr),
 		"Unexpected physical header size,\n"
 		"  hdrp->e_phentsize: %x\n"
@@ -146,7 +146,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
 
 		/* Read in the program header. */
 		Elf64_Phdr phdr;
-		test_read(fd, &phdr, sizeof(phdr));
+		test_read_bounce(fd, &phdr, sizeof(phdr));
 
 		/* Skip if this header doesn't describe a loadable segment. */
 		if (phdr.p_type != PT_LOAD)
@@ -187,7 +187,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
 				"  expected: 0x%jx",
 				n1, errno, (intmax_t) offset_rv,
 				(intmax_t) phdr.p_offset);
-			test_read(fd, addr_gva2hva(vm, phdr.p_vaddr),
+			test_read_bounce(fd, addr_gva2hva(vm, phdr.p_vaddr),
 				phdr.p_filesz);
 		}
 	}
diff --git a/tools/testing/selftests/kvm/lib/io.c b/tools/testing/selftests/kvm/lib/io.c
index fedb2a741f0b..74419becc8bc 100644
--- a/tools/testing/selftests/kvm/lib/io.c
+++ b/tools/testing/selftests/kvm/lib/io.c
@@ -155,3 +155,26 @@ ssize_t test_read(int fd, void *buf, size_t count)
 
 	return num_read;
 }
+
+/* Test read via intermediary buffer
+ *
+ * Same as test_read, except read(2)s happen into a bounce buffer that is memcpy'd
+ * to buf. For use with buffers that cannot be GUP'd (e.g. guest_memfd VMAs if
+ * guest_memfd was created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP).
+ */
+ssize_t test_read_bounce(int fd, void *buf, size_t count)
+{
+	void *bounce_buffer;
+	ssize_t num_read;
+
+	TEST_ASSERT(count >= 0, "Unexpected count, count: %li", count);
+
+	bounce_buffer = malloc(count);
+	TEST_ASSERT(bounce_buffer != NULL, "Failed to allocate bounce buffer");
+
+	num_read = test_read(fd, bounce_buffer, count);
+	memcpy(buf, bounce_buffer, num_read);
+	free(bounce_buffer);
+
+	return num_read;
+}
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 07/12] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (5 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 06/12] KVM: selftests: load elf via bounce buffer Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 08/12] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types Roy, Patrick
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Have vm_mem_add() always set KVM_MEM_GUEST_MEMFD in the memslot flags if
a guest_memfd is passed in as an argument. This eliminates the
possibility where a guest_memfd instance is passed to vm_mem_add(), but
it ends up being ignored because the flags argument does not specify
KVM_MEM_GUEST_MEMFD at the same time.

This makes it easy to support more scenarios in which no vm_mem_add() is
not passed a guest_memfd instance, but is expected to allocate one.
Currently, this only happens if guest_memfd == -1 but flags &
KVM_MEM_GUEST_MEMFD != 0, but later vm_mem_add() will gain support for
loading the test code itself into guest_memfd (via
GUEST_MEMFD_FLAG_MMAP) if requested via a special
vm_mem_backing_src_type, at which point having to make sure the src_type
and flags are in-sync becomes cumbersome.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c3f5142b0a54..cc67dfecbf65 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1107,22 +1107,26 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 
 	region->backing_src_type = src_type;
 
-	if (flags & KVM_MEM_GUEST_MEMFD) {
-		if (guest_memfd < 0) {
+	if (guest_memfd < 0) {
+		if (flags & KVM_MEM_GUEST_MEMFD) {
 			uint32_t guest_memfd_flags = 0;
 			TEST_ASSERT(!guest_memfd_offset,
 				    "Offset must be zero when creating new guest_memfd");
 			guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
-		} else {
-			/*
-			 * Install a unique fd for each memslot so that the fd
-			 * can be closed when the region is deleted without
-			 * needing to track if the fd is owned by the framework
-			 * or by the caller.
-			 */
-			guest_memfd = dup(guest_memfd);
-			TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
 		}
+	} else {
+		/*
+		 * Install a unique fd for each memslot so that the fd
+		 * can be closed when the region is deleted without
+		 * needing to track if the fd is owned by the framework
+		 * or by the caller.
+		 */
+		guest_memfd = dup(guest_memfd);
+		TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
+	}
+
+	if (guest_memfd > 0) {
+		flags |= KVM_MEM_GUEST_MEMFD;
 
 		region->region.guest_memfd = guest_memfd;
 		region->region.guest_memfd_offset = guest_memfd_offset;
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 08/12] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (6 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 07/12] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 09/12] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape Roy, Patrick
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Allow selftests to configure their memslots such that userspace_addr is
set to a MAP_SHARED mapping of the guest_memfd that's associated with
the memslot. This setup is the configuration for non-CoCo VMs, where all
guest memory is backed by a guest_memfd whose folios are all marked
shared, but KVM is still able to access guest memory to provide
functionality such as MMIO emulation on x86.

Add backing types for normal guest_memfd, as well as direct map removed
guest_memfd.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../testing/selftests/kvm/include/kvm_util.h  | 18 ++++++
 .../testing/selftests/kvm/include/test_util.h |  7 +++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 63 ++++++++++---------
 tools/testing/selftests/kvm/lib/test_util.c   |  8 +++
 4 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 23a506d7eca3..5204a0a18a7f 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -635,6 +635,24 @@ static inline bool is_smt_on(void)
 
 void vm_create_irqchip(struct kvm_vm *vm);
 
+static inline uint32_t backing_src_guest_memfd_flags(enum vm_mem_backing_src_type t)
+{
+	uint32_t flags = 0;
+
+	switch (t) {
+	case VM_MEM_SRC_GUEST_MEMFD:
+		flags |= GUEST_MEMFD_FLAG_MMAP;
+		fallthrough;
+	case VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP:
+		flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+		break;
+	default:
+		break;
+	}
+
+	return flags;
+}
+
 static inline int __vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
 					uint64_t flags)
 {
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 0409b7b96c94..a56e53fc7b39 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -133,6 +133,8 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_GUEST_MEMFD,
+	VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
 	NUM_SRC_TYPES,
 };
 
@@ -165,6 +167,11 @@ static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t)
 	return vm_mem_backing_src_alias(t)->flag & MAP_SHARED;
 }
 
+static inline bool backing_src_is_guest_memfd(enum vm_mem_backing_src_type t)
+{
+	return t == VM_MEM_SRC_GUEST_MEMFD || t == VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP;
+}
+
 static inline bool backing_src_can_be_huge(enum vm_mem_backing_src_type t)
 {
 	return t != VM_MEM_SRC_ANONYMOUS && t != VM_MEM_SRC_SHMEM;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index cc67dfecbf65..a81089f7c83f 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1060,6 +1060,34 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	alignment = 1;
 #endif
 
+	if (guest_memfd < 0) {
+		if ((flags & KVM_MEM_GUEST_MEMFD) || backing_src_is_guest_memfd(src_type)) {
+			uint32_t guest_memfd_flags = backing_src_guest_memfd_flags(src_type);
+
+			TEST_ASSERT(!guest_memfd_offset,
+				    "Offset must be zero when creating new guest_memfd");
+			guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
+		}
+	} else {
+		/*
+		 * Install a unique fd for each memslot so that the fd
+		 * can be closed when the region is deleted without
+		 * needing to track if the fd is owned by the framework
+		 * or by the caller.
+		 */
+		guest_memfd = dup(guest_memfd);
+		TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
+	}
+
+	if (guest_memfd > 0) {
+		flags |= KVM_MEM_GUEST_MEMFD;
+
+		region->region.guest_memfd = guest_memfd;
+		region->region.guest_memfd_offset = guest_memfd_offset;
+	} else {
+		region->region.guest_memfd = -1;
+	}
+
 	/*
 	 * When using THP mmap is not guaranteed to returned a hugepage aligned
 	 * address so we have to pad the mmap. Padding is not needed for HugeTLB
@@ -1075,10 +1103,13 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	if (alignment > 1)
 		region->mmap_size += alignment;
 
-	region->fd = -1;
-	if (backing_src_is_shared(src_type))
+	if (backing_src_is_guest_memfd(src_type))
+		region->fd = guest_memfd;
+	else if (backing_src_is_shared(src_type))
 		region->fd = kvm_memfd_alloc(region->mmap_size,
 					     src_type == VM_MEM_SRC_SHARED_HUGETLB);
+	else
+		region->fd = -1;
 
 	region->mmap_start = mmap(NULL, region->mmap_size,
 				  PROT_READ | PROT_WRITE,
@@ -1106,34 +1137,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
 	}
 
 	region->backing_src_type = src_type;
-
-	if (guest_memfd < 0) {
-		if (flags & KVM_MEM_GUEST_MEMFD) {
-			uint32_t guest_memfd_flags = 0;
-			TEST_ASSERT(!guest_memfd_offset,
-				    "Offset must be zero when creating new guest_memfd");
-			guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
-		}
-	} else {
-		/*
-		 * Install a unique fd for each memslot so that the fd
-		 * can be closed when the region is deleted without
-		 * needing to track if the fd is owned by the framework
-		 * or by the caller.
-		 */
-		guest_memfd = dup(guest_memfd);
-		TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
-	}
-
-	if (guest_memfd > 0) {
-		flags |= KVM_MEM_GUEST_MEMFD;
-
-		region->region.guest_memfd = guest_memfd;
-		region->region.guest_memfd_offset = guest_memfd_offset;
-	} else {
-		region->region.guest_memfd = -1;
-	}
-
 	region->unused_phy_pages = sparsebit_alloc();
 	if (vm_arch_has_protected_memory(vm))
 		region->protected_phy_pages = sparsebit_alloc();
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 03eb99af9b8d..b2baee680083 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -299,6 +299,14 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 			 */
 			.flag = MAP_SHARED,
 		},
+		[VM_MEM_SRC_GUEST_MEMFD] = {
+			.name = "guest_memfd",
+			.flag = MAP_SHARED,
+		},
+		[VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP] = {
+			.name = "guest_memfd_no_direct_map",
+			.flag = MAP_SHARED,
+		}
 	};
 	_Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
 		       "Missing new backing src types?");
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 09/12] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (7 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 08/12] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 10/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in mem conversion tests Roy, Patrick
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Use one of the padding fields in struct vm_shape to carry an enum
vm_mem_backing_src_type value, to give the option to overwrite the
default of VM_MEM_SRC_ANONYMOUS in __vm_create().

Overwriting this default will allow tests to create VMs where the test
code is backed by mmap'd guest_memfd instead of anonymous memory.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../testing/selftests/kvm/include/kvm_util.h  | 19 ++++++++++---------
 tools/testing/selftests/kvm/lib/kvm_util.c    |  2 +-
 tools/testing/selftests/kvm/lib/x86/sev.c     |  1 +
 .../selftests/kvm/pre_fault_memory_test.c     |  1 +
 4 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 5204a0a18a7f..8baa0bbacd09 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -188,7 +188,7 @@ enum vm_guest_mode {
 struct vm_shape {
 	uint32_t type;
 	uint8_t  mode;
-	uint8_t  pad0;
+	uint8_t  src_type;
 	uint16_t pad1;
 };
 
@@ -196,14 +196,15 @@ kvm_static_assert(sizeof(struct vm_shape) == sizeof(uint64_t));
 
 #define VM_TYPE_DEFAULT			0
 
-#define VM_SHAPE(__mode)			\
-({						\
-	struct vm_shape shape = {		\
-		.mode = (__mode),		\
-		.type = VM_TYPE_DEFAULT		\
-	};					\
-						\
-	shape;					\
+#define VM_SHAPE(__mode)				\
+({							\
+	struct vm_shape shape = {			\
+		.mode	  = (__mode),			\
+		.type	  = VM_TYPE_DEFAULT,		\
+		.src_type = VM_MEM_SRC_ANONYMOUS	\
+	};						\
+							\
+	shape;						\
 })
 
 #if defined(__aarch64__)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index a81089f7c83f..3a22794bd959 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -495,7 +495,7 @@ struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
 	if (is_guest_memfd_required(shape))
 		flags |= KVM_MEM_GUEST_MEMFD;
 
-	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, flags);
+	vm_userspace_mem_region_add(vm, shape.src_type, 0, 0, nr_pages, flags);
 	for (i = 0; i < NR_MEM_REGIONS; i++)
 		vm->memslots[i] = 0;
 
diff --git a/tools/testing/selftests/kvm/lib/x86/sev.c b/tools/testing/selftests/kvm/lib/x86/sev.c
index c3a9838f4806..d920880e4fc0 100644
--- a/tools/testing/selftests/kvm/lib/x86/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86/sev.c
@@ -164,6 +164,7 @@ struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t type, void *guest_code,
 	struct vm_shape shape = {
 		.mode = VM_MODE_DEFAULT,
 		.type = type,
+		.src_type = VM_MEM_SRC_ANONYMOUS,
 	};
 	struct kvm_vm *vm;
 	struct kvm_vcpu *cpus[1];
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 0350a8896a2f..d403f8d2f26f 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -68,6 +68,7 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	const struct vm_shape shape = {
 		.mode = VM_MODE_DEFAULT,
 		.type = vm_type,
+		.src_type = VM_MEM_SRC_ANONYMOUS,
 	};
 	struct kvm_vcpu *vcpu;
 	struct kvm_run *run;
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 10/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in mem conversion tests
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (8 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 09/12] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28  9:39 ` [PATCH v5 11/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in guest_memfd_test.c Roy, Patrick
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Cover the scenario that the guest can fault in and write gmem-backed
guest memory even if its direct map removed.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../selftests/kvm/x86/private_mem_conversions_test.c       | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 82a8d88b5338..8427d9fbdb23 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -367,7 +367,7 @@ static void *__test_mem_conversions(void *__vcpu)
 }
 
 static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus,
-				 uint32_t nr_memslots)
+				 uint32_t nr_memslots, uint64_t gmem_flags)
 {
 	/*
 	 * Allocate enough memory so that each vCPU's chunk of memory can be
@@ -394,7 +394,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
 
 	vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
 
-	memfd = vm_create_guest_memfd(vm, memfd_size, 0);
+	memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
 
 	for (i = 0; i < nr_memslots; i++)
 		vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
@@ -477,7 +477,8 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	test_mem_conversions(src_type, nr_vcpus, nr_memslots);
+	test_mem_conversions(src_type, nr_vcpus, nr_memslots, 0);
+	test_mem_conversions(src_type, nr_vcpus, nr_memslots, GUEST_MEMFD_FLAG_NO_DIRECT_MAP);
 
 	return 0;
 }
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 11/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in guest_memfd_test.c
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (9 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 10/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in mem conversion tests Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28 10:26   ` David Hildenbrand
  2025-08-28  9:39 ` [PATCH v5 12/12] KVM: selftests: Test guest execution from direct map removed gmem Roy, Patrick
  2025-08-28 12:50 ` [PATCH v5 00/12] Direct Map Removal Support for guest_memfd David Hildenbrand
  12 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 tools/testing/selftests/kvm/guest_memfd_test.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index b3ca6737f304..1187438b6831 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -275,6 +275,8 @@ static void test_guest_memfd(unsigned long vm_type)
 
 	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
 		flags |= GUEST_MEMFD_FLAG_MMAP;
+	if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP))
+		flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
 
 	test_create_guest_memfd_multiple(vm);
 	test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 12/12] KVM: selftests: Test guest execution from direct map removed gmem
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (10 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 11/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in guest_memfd_test.c Roy, Patrick
@ 2025-08-28  9:39 ` Roy, Patrick
  2025-08-28 12:50 ` [PATCH v5 00/12] Direct Map Removal Support for guest_memfd David Hildenbrand
  12 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-08-28  9:39 UTC (permalink / raw)
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Add a selftest that loads itself into guest_memfd (via
GUEST_MEMFD_FLAG_MMAP) and triggers an MMIO exit when executed. This
exercises x86 MMIO emulation code inside KVM for guest_memfd-backed
memslots where the guest_memfd folios are direct map removed.
Particularly, it validates that x86 MMIO emulation code (guest page
table walks + instruction fetch) correctly accesses gmem through the VMA
that's been reflected into the memslot's userspace_addr field (instead
of trying to do direct map accesses).

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 .../selftests/kvm/set_memory_region_test.c    | 50 +++++++++++++++++--
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index ce3ac0fd6dfb..cb3bc642d376 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -603,6 +603,41 @@ static void test_mmio_during_vectoring(void)
 
 	kvm_vm_free(vm);
 }
+
+static void guest_code_trigger_mmio(void)
+{
+	/*
+	 * Read some GPA that is not backed by a memslot. KVM consider this
+	 * as MMIO and tell userspace to emulate the read.
+	 */
+	READ_ONCE(*((uint64_t *)MEM_REGION_GPA));
+
+	GUEST_DONE();
+}
+
+static void test_guest_memfd_mmio(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct vm_shape shape = {
+		.mode = VM_MODE_DEFAULT,
+		.src_type = VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
+	};
+	pthread_t vcpu_thread;
+
+	pr_info("Testing MMIO emulation for instructions in gmem\n");
+
+	vm = __vm_create_shape_with_one_vcpu(shape, &vcpu, 0, guest_code_trigger_mmio);
+
+	virt_map(vm, MEM_REGION_GPA, MEM_REGION_GPA, 1);
+
+	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
+
+	/* If the MMIO read was successfully emulated, the vcpu thread will exit */
+	pthread_join(vcpu_thread, NULL);
+
+	kvm_vm_free(vm);
+}
 #endif
 
 int main(int argc, char *argv[])
@@ -626,10 +661,17 @@ int main(int argc, char *argv[])
 	test_add_max_memory_regions();
 
 #ifdef __x86_64__
-	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD) &&
-	    (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))) {
-		test_add_private_memory_region();
-		test_add_overlapping_private_memory_regions();
+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD)) {
+		if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM)) {
+			test_add_private_memory_region();
+			test_add_overlapping_private_memory_regions();
+		}
+
+		if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MMAP) &&
+			kvm_has_cap(KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP))
+			test_guest_memfd_mmio();
+		else
+			pr_info("Skipping tests requiring KVM_CAP_GUEST_MEMFD_MMAP | KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP");
 	} else {
 		pr_info("Skipping tests for KVM_MEM_GUEST_MEMFD memory regions\n");
 	}
-- 
2.50.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 02/12] arch: export set_direct_map_valid_noflush to KVM module
  2025-08-28  9:39 ` [PATCH v5 02/12] arch: export set_direct_map_valid_noflush to KVM module Roy, Patrick
@ 2025-08-28 10:07   ` Fuad Tabba
  2025-09-01 13:47     ` Roy, Patrick
  0 siblings, 1 reply; 38+ messages in thread
From: Fuad Tabba @ 2025-08-28 10:07 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: david@redhat.com, seanjc@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Hi Patrick,

On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
>
> Use the new per-module export functionality to allow KVM (and only KVM)
> access to set_direct_map_valid_noflush(). This allows guest_memfd to
> remove its memory from the direct map, even if KVM is built as a module.
>
> Direct map removal gives guest_memfd the same protection that
> memfd_secret enjoys, such as hardening against Spectre-like attacks
> through in-kernel gadgets.
>
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>  arch/arm64/mm/pageattr.c     | 1 +
>  arch/loongarch/mm/pageattr.c | 1 +
>  arch/riscv/mm/pageattr.c     | 1 +
>  arch/s390/mm/pageattr.c      | 1 +
>  arch/x86/mm/pat/set_memory.c | 1 +
>  5 files changed, 5 insertions(+)
>
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 04d4a8f676db..4f3cddfab9b0 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -291,6 +291,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>
>         return set_memory_valid(addr, nr, valid);
>  }
> +EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
>
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  /*
> diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
> index f5e910b68229..d076bfd3fcbf 100644
> --- a/arch/loongarch/mm/pageattr.c
> +++ b/arch/loongarch/mm/pageattr.c
> @@ -217,6 +217,7 @@ int set_direct_map_invalid_noflush(struct page *page)
>
>         return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
>  }
> +EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");

This should be after 'set_direct_map_valid_noflush', not 'invalid'.

With that fixed:

Reviewed-by: Fuad Tabba <tabba@google.com>

>  int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>  {
> diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> index 3f76db3d2769..6db31040cd66 100644
> --- a/arch/riscv/mm/pageattr.c
> +++ b/arch/riscv/mm/pageattr.c
> @@ -400,6 +400,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>
>         return __set_memory((unsigned long)page_address(page), nr, set, clear);
>  }
> +EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
>
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
> diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
> index 348e759840e7..8ffd9ef09bc6 100644
> --- a/arch/s390/mm/pageattr.c
> +++ b/arch/s390/mm/pageattr.c
> @@ -413,6 +413,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>
>         return __set_memory((unsigned long)page_to_virt(page), nr, flags);
>  }
> +EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
>
>  bool kernel_page_present(struct page *page)
>  {
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 8834c76f91c9..87e9c7d2dcdc 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2661,6 +2661,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>
>         return __set_pages_np(page, nr);
>  }
> +EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
>
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  void __kernel_map_pages(struct page *page, int numpages, int enable)
> --
> 2.50.1
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28  9:39 ` [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP Roy, Patrick
@ 2025-08-28 10:21   ` Fuad Tabba
  2025-09-01 13:54     ` Roy, Patrick
  2025-08-28 14:26   ` Mike Rapoport
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Fuad Tabba @ 2025-08-28 10:21 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: david@redhat.com, seanjc@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Hi Patrick,

On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
>
> Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are
> set to not present . Currently, mappings that match this description are
> secretmem mappings (memfd_secret()). Later, some guest_memfd
> configurations will also fall into this category.
>
> Reject this new type of mappings in all locations that currently reject
> secretmem mappings, on the assumption that if secretmem mappings are
> rejected somewhere, it is precisely because of an inability to deal with
> folios without direct map entries, and then make memfd_secret() use
> AS_NO_DIRECT_MAP on its address_space to drop its special
> vma_is_secretmem()/secretmem_mapping() checks.
>
> This drops a optimization in gup_fast_folio_allowed() where
> secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
> enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
> by default"), so the secretmem check did not actually end up elided in
> most cases anymore anyway.
>
> Use a new flag instead of overloading AS_INACCESSIBLE (which is already
> set by guest_memfd) because not all guest_memfd mappings will end up
> being direct map removed (e.g. in pKVM setups, parts of guest_memfd that
> can be mapped to userspace should also be GUP-able, and generally not
> have restrictions on who can access it).
>
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>  include/linux/pagemap.h   | 16 ++++++++++++++++
>  include/linux/secretmem.h | 18 ------------------
>  lib/buildid.c             |  4 ++--
>  mm/gup.c                  | 14 +++-----------
>  mm/mlock.c                |  2 +-
>  mm/secretmem.c            |  6 +-----
>  6 files changed, 23 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 12a12dae727d..b52b28ae4636 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -211,6 +211,7 @@ enum mapping_flags {
>                                    folio contents */
>         AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
>         AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
>         /* Bits 16-25 are used for FOLIO_ORDER */
>         AS_FOLIO_ORDER_BITS = 5,
>         AS_FOLIO_ORDER_MIN = 16,
> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
>         return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
>  }
>
> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
> +{
> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> +}
> +
> +static inline bool mapping_no_direct_map(struct address_space *mapping)
> +{
> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> +}
> +
> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
> +{
> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
> +}
> +

Any reason vma is const whereas mapping in the function that it calls
(defined above it) isn't?

Cheers,
/fuad

>  static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
>  {
>         return mapping->gfp_mask;
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index e918f96881f5..0ae1fb057b3d 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -4,28 +4,10 @@
>
>  #ifdef CONFIG_SECRETMEM
>
> -extern const struct address_space_operations secretmem_aops;
> -
> -static inline bool secretmem_mapping(struct address_space *mapping)
> -{
> -       return mapping->a_ops == &secretmem_aops;
> -}
> -
> -bool vma_is_secretmem(struct vm_area_struct *vma);
>  bool secretmem_active(void);
>
>  #else
>
> -static inline bool vma_is_secretmem(struct vm_area_struct *vma)
> -{
> -       return false;
> -}
> -
> -static inline bool secretmem_mapping(struct address_space *mapping)
> -{
> -       return false;
> -}
> -
>  static inline bool secretmem_active(void)
>  {
>         return false;
> diff --git a/lib/buildid.c b/lib/buildid.c
> index c4b0f376fb34..33f173a607ad 100644
> --- a/lib/buildid.c
> +++ b/lib/buildid.c
> @@ -65,8 +65,8 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
>
>         freader_put_folio(r);
>
> -       /* reject secretmem folios created with memfd_secret() */
> -       if (secretmem_mapping(r->file->f_mapping))
> +       /* reject secretmem folios created with memfd_secret() or guest_memfd() */
> +       if (mapping_no_direct_map(r->file->f_mapping))
>                 return -EFAULT;
>
>         r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
> diff --git a/mm/gup.c b/mm/gup.c
> index adffe663594d..8c988e076e5d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1234,7 +1234,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>         if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
>                 return -EOPNOTSUPP;
>
> -       if (vma_is_secretmem(vma))
> +       if (vma_is_no_direct_map(vma))
>                 return -EFAULT;
>
>         if (write) {
> @@ -2751,7 +2751,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>  {
>         bool reject_file_backed = false;
>         struct address_space *mapping;
> -       bool check_secretmem = false;
>         unsigned long mapping_flags;
>
>         /*
> @@ -2763,14 +2762,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>                 reject_file_backed = true;
>
>         /* We hold a folio reference, so we can safely access folio fields. */
> -
> -       /* secretmem folios are always order-0 folios. */
> -       if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
> -               check_secretmem = true;
> -
> -       if (!reject_file_backed && !check_secretmem)
> -               return true;
> -
>         if (WARN_ON_ONCE(folio_test_slab(folio)))
>                 return false;
>
> @@ -2812,8 +2803,9 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>          * At this point, we know the mapping is non-null and points to an
>          * address_space object.
>          */
> -       if (check_secretmem && secretmem_mapping(mapping))
> +       if (mapping_no_direct_map(mapping))
>                 return false;
> +
>         /* The only remaining allowed file system is shmem. */
>         return !reject_file_backed || shmem_mapping(mapping);
>  }
> diff --git a/mm/mlock.c b/mm/mlock.c
> index a1d93ad33c6d..0def453fe881 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -474,7 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
>
>         if (newflags == oldflags || (oldflags & VM_SPECIAL) ||
>             is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
> -           vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE))
> +           vma_is_dax(vma) || vma_is_no_direct_map(vma) || (oldflags & VM_DROPPABLE))
>                 /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
>                 goto out;
>
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 422dcaa32506..a2daee0e63a5 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -134,11 +134,6 @@ static int secretmem_mmap_prepare(struct vm_area_desc *desc)
>         return 0;
>  }
>
> -bool vma_is_secretmem(struct vm_area_struct *vma)
> -{
> -       return vma->vm_ops == &secretmem_vm_ops;
> -}
> -
>  static const struct file_operations secretmem_fops = {
>         .release        = secretmem_release,
>         .mmap_prepare   = secretmem_mmap_prepare,
> @@ -206,6 +201,7 @@ static struct file *secretmem_file_create(unsigned long flags)
>
>         mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
>         mapping_set_unevictable(inode->i_mapping);
> +       mapping_set_no_direct_map(inode->i_mapping);
>
>         inode->i_op = &secretmem_iops;
>         inode->i_mapping->a_ops = &secretmem_aops;
> --
> 2.50.1
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 11/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in guest_memfd_test.c
  2025-08-28  9:39 ` [PATCH v5 11/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in guest_memfd_test.c Roy, Patrick
@ 2025-08-28 10:26   ` David Hildenbrand
  0 siblings, 0 replies; 38+ messages in thread
From: David Hildenbrand @ 2025-08-28 10:26 UTC (permalink / raw)
  To: Roy, Patrick, seanjc@google.com
  Cc: tabba@google.com, ackerleytng@google.com, pbonzini@redhat.com,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rppt@kernel.org, will@kernel.org,
	vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita, Thomson, Jack,
	Manwaring, Derek

On 28.08.25 11:39, Roy, Patrick wrote:
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---

WARNING: Missing commit description - Add an appropriate one
WARNING: From:/Signed-off-by: email name mismatch: 'From: "Roy, Patrick" 
<roypat@amazon.co.uk>' != 'Signed-off-by: Patrick Roy <roypat@amazon.co.uk>'

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP
  2025-08-28  9:39 ` [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP Roy, Patrick
@ 2025-08-28 10:27   ` David Hildenbrand
  2025-09-01 14:30     ` Roy, Patrick
  0 siblings, 1 reply; 38+ messages in thread
From: David Hildenbrand @ 2025-08-28 10:27 UTC (permalink / raw)
  To: Roy, Patrick, seanjc@google.com
  Cc: tabba@google.com, ackerleytng@google.com, pbonzini@redhat.com,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rppt@kernel.org, will@kernel.org,
	vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita, Thomson, Jack,
	Manwaring, Derek

On 28.08.25 11:39, Roy, Patrick wrote:
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>   Documentation/virt/kvm/api.rst | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c17a87a0a5ac..b52c14d58798 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6418,6 +6418,11 @@ When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>   supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
>   enables mmap() and faulting of guest_memfd memory to host userspace.
>   
> +When the capability KVM_CAP_GMEM_NO_DIRECT_MAP is supported, the 'flags' field
> +supports GUEST_MEMFG_FLAG_NO_DIRECT_MAP. Setting this flag makes the guest_memfd
> +instance behave similarly to memfd_secret, and unmaps the memory backing it from
> +the kernel's address space after allocation.
> +
>   When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>   guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>   consumed from guest_memfd, regardless of whether it is a shared or a private

WARNING: Missing commit description - Add an appropriate one
WARNING: From:/Signed-off-by: email name mismatch: 'From: "Roy, Patrick" 
<roypat@amazon.co.uk>' != 'Signed-off-by: Patrick Roy <roypat@amazon.co.uk>'

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 00/12] Direct Map Removal Support for guest_memfd
  2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
                   ` (11 preceding siblings ...)
  2025-08-28  9:39 ` [PATCH v5 12/12] KVM: selftests: Test guest execution from direct map removed gmem Roy, Patrick
@ 2025-08-28 12:50 ` David Hildenbrand
  12 siblings, 0 replies; 38+ messages in thread
From: David Hildenbrand @ 2025-08-28 12:50 UTC (permalink / raw)
  To: Roy, Patrick, seanjc@google.com
  Cc: tabba@google.com, ackerleytng@google.com, pbonzini@redhat.com,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rppt@kernel.org, will@kernel.org,
	vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita, Thomson, Jack,
	Manwaring, Derek

On 28.08.25 11:39, Roy, Patrick wrote:
> [ based on kvm/next ]
> 
> Unmapping virtual machine guest memory from the host kernel's direct map is a
> successful mitigation against Spectre-style transient execution issues: If the
> kernel page tables do not contain entries pointing to guest memory, then any
> attempted speculative read through the direct map will necessarily be blocked
> by the MMU before any observable microarchitectural side-effects happen. This
> means that Spectre-gadgets and similar cannot be used to target virtual machine
> memory. Roughly 60% of speculative execution issues fall into this category [1,
> Table 1].
> 

As discussed, I'll be maintaining a guestmemfd-preview branch where I 
just pile patch sets to see how it will all look together.

It's currently based on kvm/next where "stage 1" resides, and has "Add 
NUMA mempolicy support for KVM guest-memfdAdd NUMA mempolicy support for 
KVM guest-memfd" [1] applied.

There are some minor conflicts with [1] in the "KVM: guest_memfd: Add 
flag to remove from direct map" patch, I tried to resolve them, let's 
see if I messed up.

https://git.kernel.org/pub/scm/linux/kernel/git/david/linux.git/log/?h=guestmemfd-preview

[1] https://lkml.kernel.org/r/20250827175247.83322-2-shivankg@amd.com

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28  9:39 ` [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP Roy, Patrick
  2025-08-28 10:21   ` Fuad Tabba
@ 2025-08-28 14:26   ` Mike Rapoport
  2025-09-01 13:56     ` Roy, Patrick
  2025-08-28 21:00   ` David Hildenbrand
  2025-08-31 10:26   ` kernel test robot
  3 siblings, 1 reply; 38+ messages in thread
From: Mike Rapoport @ 2025-08-28 14:26 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: david@redhat.com, seanjc@google.com, tabba@google.com,
	ackerleytng@google.com, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, will@kernel.org,
	vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita, Thomson, Jack,
	Manwaring, Derek

On Thu, Aug 28, 2025 at 09:39:19AM +0000, Roy, Patrick wrote:
> Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are
> set to not present . Currently, mappings that match this description are
> secretmem mappings (memfd_secret()). Later, some guest_memfd
> configurations will also fall into this category.
> 
> Reject this new type of mappings in all locations that currently reject
> secretmem mappings, on the assumption that if secretmem mappings are
> rejected somewhere, it is precisely because of an inability to deal with
> folios without direct map entries, and then make memfd_secret() use
> AS_NO_DIRECT_MAP on its address_space to drop its special
> vma_is_secretmem()/secretmem_mapping() checks.
> 
> This drops a optimization in gup_fast_folio_allowed() where
> secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
> enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
> by default"), so the secretmem check did not actually end up elided in
> most cases anymore anyway.
> 
> Use a new flag instead of overloading AS_INACCESSIBLE (which is already
> set by guest_memfd) because not all guest_memfd mappings will end up
> being direct map removed (e.g. in pKVM setups, parts of guest_memfd that
> can be mapped to userspace should also be GUP-able, and generally not
> have restrictions on who can access it).
> 
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>  include/linux/pagemap.h   | 16 ++++++++++++++++
>  include/linux/secretmem.h | 18 ------------------
>  lib/buildid.c             |  4 ++--
>  mm/gup.c                  | 14 +++-----------
>  mm/mlock.c                |  2 +-
>  mm/secretmem.c            |  6 +-----
>  6 files changed, 23 insertions(+), 37 deletions(-)
> 
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index e918f96881f5..0ae1fb057b3d 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -4,28 +4,10 @@
>  
>  #ifdef CONFIG_SECRETMEM
>  
> -extern const struct address_space_operations secretmem_aops;

Please also make secretmem_aops static in mm/secretmem.c

> -static inline bool secretmem_mapping(struct address_space *mapping)
> -{
> -	return mapping->a_ops == &secretmem_aops;
> -}
> -

...

> diff --git a/mm/gup.c b/mm/gup.c
> index adffe663594d..8c988e076e5d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1234,7 +1234,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>  	if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
>  		return -EOPNOTSUPP;
>  
> -	if (vma_is_secretmem(vma))
> +	if (vma_is_no_direct_map(vma))
>  		return -EFAULT;
>  
>  	if (write) {
> @@ -2751,7 +2751,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>  {
>  	bool reject_file_backed = false;
>  	struct address_space *mapping;
> -	bool check_secretmem = false;
>  	unsigned long mapping_flags;
>  
>  	/*
> @@ -2763,14 +2762,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>  		reject_file_backed = true;
>  
>  	/* We hold a folio reference, so we can safely access folio fields. */
> -
> -	/* secretmem folios are always order-0 folios. */
> -	if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
> -		check_secretmem = true;
> -
> -	if (!reject_file_backed && !check_secretmem)
> -		return true;
> -
>  	if (WARN_ON_ONCE(folio_test_slab(folio)))
>  		return false;

There's a check for hugetlb after this and a comment there mentions
secretmem, please update that to "mapping with no direct map" or something
like that.

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
  2025-08-28  9:39 ` [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map Roy, Patrick
@ 2025-08-28 14:54   ` Mike Rapoport
  2025-09-01 14:22     ` Roy, Patrick
  0 siblings, 1 reply; 38+ messages in thread
From: Mike Rapoport @ 2025-08-28 14:54 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: david@redhat.com, seanjc@google.com, tabba@google.com,
	ackerleytng@google.com, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, will@kernel.org,
	vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita, Thomson, Jack,
	Manwaring, Derek

On Thu, Aug 28, 2025 at 09:39:21AM +0000, Roy, Patrick wrote:
> Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
> ioctl. When set, guest_memfd folios will be removed from the direct map
> after preparation, with direct map entries only restored when the folios
> are freed.
> 
> To ensure these folios do not end up in places where the kernel cannot
> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
> address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
> 
> Add KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP to let userspace discover whether
> guest_memfd supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP. Support depends on
> guest_memfd itself being supported, but also on whether KVM can
> manipulate the direct map at page granularity at all (possible most of
> the time, just arm64 is a notable outlier where its impossible if the
> direct map has been setup using hugepages, as arm64 cannot break these
> apart due to break-before-make semantics).

There's also powerpc that does not select ARCH_HAS_SET_DIRECT_MAP
 
> Note that this flag causes removal of direct map entries for all
> guest_memfd folios independent of whether they are "shared" or "private"
> (although current guest_memfd only supports either all folios in the
> "shared" state, or all folios in the "private" state if
> GUEST_MEMFD_FLAG_MMAP is not set). The usecase for removing direct map
> entries of also the shared parts of guest_memfd are a special type of
> non-CoCo VM where, host userspace is trusted to have access to all of
> guest memory, but where Spectre-style transient execution attacks
> through the host kernel's direct map should still be mitigated.  In this
> setup, KVM retains access to guest memory via userspace mappings of
> guest_memfd, which are reflected back into KVM's memslots via
> userspace_addr. This is needed for things like MMIO emulation on x86_64
> to work.
> 
> Do not perform TLB flushes after direct map manipulations. This is
> because TLB flushes resulted in a up to 40x elongation of page faults in
> guest_memfd (scaling with the number of CPU cores), or a 5x elongation
> of memory population. TLB flushes are not needed for functional
> correctness (the virt->phys mapping technically stays "correct",  the
> kernel should simply to not it for a while). On the other hand, it means

                          ^ not use it?

> that the desired protection from Spectre-style attacks is not perfect,
> as an attacker could try to prevent a stale TLB entry from getting
> evicted, keeping it alive until the page it refers to is used by the
> guest for some sensitive data, and then targeting it using a
> spectre-gadget.
> 
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>  arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
>  include/linux/kvm_host.h          |  7 +++++++
>  include/uapi/linux/kvm.h          |  2 ++
>  virt/kvm/guest_memfd.c            | 29 +++++++++++++++++++++++++----
>  virt/kvm/kvm_main.c               |  5 +++++
>  5 files changed, 51 insertions(+), 4 deletions(-)

...
 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 9ec4c45e3cf2..e3696880405c 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
>  #include <linux/kvm_host.h>
>  #include <linux/pagemap.h>
>  #include <linux/anon_inodes.h>
> +#include <linux/set_memory.h>
>  
>  #include "kvm_mm.h"
>  
> @@ -42,8 +43,18 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
>  	return 0;
>  }
>  
> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
> +{
> +	return ((unsigned long) inode->i_private) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
> +}
> +
>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>  {
> +	struct inode *inode = folio_inode(folio);
> +
> +	if (kvm_gmem_test_no_direct_map(inode))
> +		set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);

This may fail to split large mapping in the direct map. Why not move this
to kvm_gmem_prepare_folio() where you can handle returned error?

I think that using set_direct_map_invalid_noflush() here and
set_direct_map_default_noflush() in kvm_gmem_free_folio() better is
clearer and makes it more obvious that here the folio is removed from the
direct map and when freed it's direct mapping is restored.

This requires to export two symbols in patch 2, but I think it's worth it.

>  	folio_mark_uptodate(folio);
>  }
>  
> @@ -429,25 +440,29 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>  	return MF_DELAYED;
>  }
>  
> -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  static void kvm_gmem_free_folio(struct address_space *mapping,
>  				struct folio *folio)
>  {
>  	struct page *page = folio_page(folio, 0);
> +
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	kvm_pfn_t pfn = page_to_pfn(page);
>  	int order = folio_order(folio);
> +#endif
>  
> +	if (kvm_gmem_test_no_direct_map(mapping->host))
> +		WARN_ON_ONCE(set_direct_map_valid_noflush(page, folio_nr_pages(folio), true));

I don't think it can fail here. The direct map was split when you removed
the folio so here it will merely update the prot bits.

> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
> -}
>  #endif
> +}

Instead of moving #ifdefs into kvm_gmem_free_folio() it's better to add, say,
kvm_gmem_invalidate() and move ifdefery there or even better have a static
inline stub for !CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE case.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 18f29ef93543..0dbfd17e1191 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -65,6 +65,7 @@
>  #include <trace/events/kvm.h>
>  
>  #include <linux/kvm_dirty_ring.h>
> +#include <linux/set_memory.h>
>  
>  
>  /* Worst case buffer size needed for holding an integer. */
> @@ -4916,6 +4917,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  		return kvm_supported_mem_attributes(kvm);
>  #endif
>  #ifdef CONFIG_KVM_GUEST_MEMFD
> +	case KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP:
> +		if (!can_set_direct_map())

Shouldn't this check with kvm_arch_gmem_supports_no_direct_map()?

> +			return false;
> +		fallthrough;
>  	case KVM_CAP_GUEST_MEMFD:
>  		return 1;
>  	case KVM_CAP_GUEST_MEMFD_MMAP:
> -- 
> 2.50.1
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28  9:39 ` [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP Roy, Patrick
  2025-08-28 10:21   ` Fuad Tabba
  2025-08-28 14:26   ` Mike Rapoport
@ 2025-08-28 21:00   ` David Hildenbrand
  2025-09-01 14:03     ` Roy, Patrick
  2025-08-31 10:26   ` kernel test robot
  3 siblings, 1 reply; 38+ messages in thread
From: David Hildenbrand @ 2025-08-28 21:00 UTC (permalink / raw)
  To: Roy, Patrick, seanjc@google.com
  Cc: tabba@google.com, ackerleytng@google.com, pbonzini@redhat.com,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rppt@kernel.org, will@kernel.org,
	vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita, Thomson, Jack,
	Manwaring, Derek

On 28.08.25 11:39, Roy, Patrick wrote:
> Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are
> set to not present . Currently, mappings that match this description are
> secretmem mappings (memfd_secret()). Later, some guest_memfd
> configurations will also fall into this category.
> 
> Reject this new type of mappings in all locations that currently reject
> secretmem mappings, on the assumption that if secretmem mappings are
> rejected somewhere, it is precisely because of an inability to deal with
> folios without direct map entries, and then make memfd_secret() use
> AS_NO_DIRECT_MAP on its address_space to drop its special
> vma_is_secretmem()/secretmem_mapping() checks.
> 
> This drops a optimization in gup_fast_folio_allowed() where
> secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
> enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
> by default"), so the secretmem check did not actually end up elided in
> most cases anymore anyway.
> 
> Use a new flag instead of overloading AS_INACCESSIBLE (which is already
> set by guest_memfd) because not all guest_memfd mappings will end up
> being direct map removed (e.g. in pKVM setups, parts of guest_memfd that
> can be mapped to userspace should also be GUP-able, and generally not
> have restrictions on who can access it).
> 
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
[...]

> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
> +{
> +	return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
> +}
> +

"vma_is_no_direct_map" reads a bit weird.

"vma_has_no_direct_map" or "vma_no_direct_mapping" might be better.

With the comment Mike and Fuad raised, this LGTM.


-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28  9:39 ` [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP Roy, Patrick
                     ` (2 preceding siblings ...)
  2025-08-28 21:00   ` David Hildenbrand
@ 2025-08-31 10:26   ` kernel test robot
  2025-09-01 14:05     ` Roy, Patrick
  3 siblings, 1 reply; 38+ messages in thread
From: kernel test robot @ 2025-08-31 10:26 UTC (permalink / raw)
  To: Roy, Patrick, david@redhat.com, seanjc@google.com
  Cc: oe-kbuild-all, Roy, Patrick, tabba@google.com,
	ackerleytng@google.com, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

Hi Patrick,

kernel test robot noticed the following build warnings:

[auto build test WARNING on a6ad54137af92535cfe32e19e5f3bc1bb7dbd383]

url:    https://github.com/intel-lab-lkp/linux/commits/Roy-Patrick/filemap-Pass-address_space-mapping-to-free_folio/20250828-174202
base:   a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
patch link:    https://lore.kernel.org/r/20250828093902.2719-4-roypat%40amazon.co.uk
patch subject: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
config: loongarch-randconfig-r133-20250831 (https://download.01.org/0day-ci/archive/20250831/202508311805.yfcdeaFC-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 14.3.0
reproduce: (https://download.01.org/0day-ci/archive/20250831/202508311805.yfcdeaFC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508311805.yfcdeaFC-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> mm/secretmem.c:155:39: sparse: sparse: symbol 'secretmem_aops' was not declared. Should it be static?

vim +/secretmem_aops +155 mm/secretmem.c

1507f51255c9ff Mike Rapoport           2021-07-07  154  
1507f51255c9ff Mike Rapoport           2021-07-07 @155  const struct address_space_operations secretmem_aops = {
46de8b979492e1 Matthew Wilcox (Oracle  2022-02-09  156) 	.dirty_folio	= noop_dirty_folio,
6612ed24a24273 Matthew Wilcox (Oracle  2022-05-02  157) 	.free_folio	= secretmem_free_folio,
5409548df3876a Matthew Wilcox (Oracle  2022-06-06  158) 	.migrate_folio	= secretmem_migrate_folio,
1507f51255c9ff Mike Rapoport           2021-07-07  159  };
1507f51255c9ff Mike Rapoport           2021-07-07  160  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 02/12] arch: export set_direct_map_valid_noflush to KVM module
  2025-08-28 10:07   ` Fuad Tabba
@ 2025-09-01 13:47     ` Roy, Patrick
  0 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 13:47 UTC (permalink / raw)
  To: tabba@google.com
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, Roy, Patrick, rppt@kernel.org,
	seanjc@google.com, vbabka@suse.cz, will@kernel.org, Cali, Marco

Hi Fuad!

On Thu, 2025-08-28 at 11:07 +0100, Fuad Tabba wrote:
>> diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
>> index f5e910b68229..d076bfd3fcbf 100644
>> --- a/arch/loongarch/mm/pageattr.c
>> +++ b/arch/loongarch/mm/pageattr.c
>> @@ -217,6 +217,7 @@ int set_direct_map_invalid_noflush(struct page *page)
>>
>>         return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
>>  }
>> +EXPORT_SYMBOL_FOR_MODULES(set_direct_map_valid_noflush, "kvm");
> 
> This should be after 'set_direct_map_valid_noflush', not 'invalid'.
> 
> With that fixed:
> 
> Reviewed-by: Fuad Tabba <tabba@google.com>

Ah, yes, good catch, thanks!

Best,
Patrick


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28 10:21   ` Fuad Tabba
@ 2025-09-01 13:54     ` Roy, Patrick
  2025-09-01 14:56       ` Roy, Patrick
  0 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 13:54 UTC (permalink / raw)
  To: tabba@google.com
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, Roy, Patrick, rppt@kernel.org,
	seanjc@google.com, vbabka@suse.cz, will@kernel.org, Cali, Marco


Hi Fuad!

On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
> Hi Patrick,
> 
> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 12a12dae727d..b52b28ae4636 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -211,6 +211,7 @@ enum mapping_flags {
>>                                    folio contents */
>>         AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
>>         AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
>>         /* Bits 16-25 are used for FOLIO_ORDER */
>>         AS_FOLIO_ORDER_BITS = 5,
>>         AS_FOLIO_ORDER_MIN = 16,
>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
>>         return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
>>  }
>>
>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
>> +{
>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>> +}
>> +
>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
>> +{
>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>> +}
>> +
>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
>> +{
>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
>> +}
>> +
> Any reason vma is const whereas mapping in the function that it calls
> (defined above it) isn't?

Ah, I cannot say that that was a conscious decision, but rather an artifact of
the code that I looked at for reference when writing these two simply did it
this way.  Are you saying both should be const, or neither (in my mind, both
could be const, but the mapping_*() family of functions further up in this file
dont take const arguments, so I'm a bit unsure now)?

> Cheers,
> /fuad

Best,
Patrick


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28 14:26   ` Mike Rapoport
@ 2025-09-01 13:56     ` Roy, Patrick
  0 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 13:56 UTC (permalink / raw)
  To: rppt@kernel.org
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, Roy, Patrick, seanjc@google.com,
	tabba@google.com, vbabka@suse.cz, will@kernel.org, Cali, Marco

Hi Mike,

On Thu, 2025-08-28 at 15:26 +0100, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 09:39:19AM +0000, Roy, Patrick wrote:
>> Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are
>> set to not present . Currently, mappings that match this description are
>> secretmem mappings (memfd_secret()). Later, some guest_memfd
>> configurations will also fall into this category.
>>
>> Reject this new type of mappings in all locations that currently reject
>> secretmem mappings, on the assumption that if secretmem mappings are
>> rejected somewhere, it is precisely because of an inability to deal with
>> folios without direct map entries, and then make memfd_secret() use
>> AS_NO_DIRECT_MAP on its address_space to drop its special
>> vma_is_secretmem()/secretmem_mapping() checks.
>>
>> This drops a optimization in gup_fast_folio_allowed() where
>> secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
>> enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
>> by default"), so the secretmem check did not actually end up elided in
>> most cases anymore anyway.
>>
>> Use a new flag instead of overloading AS_INACCESSIBLE (which is already
>> set by guest_memfd) because not all guest_memfd mappings will end up
>> being direct map removed (e.g. in pKVM setups, parts of guest_memfd that
>> can be mapped to userspace should also be GUP-able, and generally not
>> have restrictions on who can access it).
>>
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>> ---
>>  include/linux/pagemap.h   | 16 ++++++++++++++++
>>  include/linux/secretmem.h | 18 ------------------
>>  lib/buildid.c             |  4 ++--
>>  mm/gup.c                  | 14 +++-----------
>>  mm/mlock.c                |  2 +-
>>  mm/secretmem.c            |  6 +-----
>>  6 files changed, 23 insertions(+), 37 deletions(-)
>>
>> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
>> index e918f96881f5..0ae1fb057b3d 100644
>> --- a/include/linux/secretmem.h
>> +++ b/include/linux/secretmem.h
>> @@ -4,28 +4,10 @@
>>
>>  #ifdef CONFIG_SECRETMEM
>>
>> -extern const struct address_space_operations secretmem_aops;
> 
> Please also make secretmem_aops static in mm/secretmem.c

Ack.

>> -static inline bool secretmem_mapping(struct address_space *mapping)
>> -{
>> -     return mapping->a_ops == &secretmem_aops;
>> -}
>> -
> 
> ...
> 
>> diff --git a/mm/gup.c b/mm/gup.c
>> index adffe663594d..8c988e076e5d 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -1234,7 +1234,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>>       if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
>>               return -EOPNOTSUPP;
>>
>> -     if (vma_is_secretmem(vma))
>> +     if (vma_is_no_direct_map(vma))
>>               return -EFAULT;
>>
>>       if (write) {
>> @@ -2751,7 +2751,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>>  {
>>       bool reject_file_backed = false;
>>       struct address_space *mapping;
>> -     bool check_secretmem = false;
>>       unsigned long mapping_flags;
>>
>>       /*
>> @@ -2763,14 +2762,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
>>               reject_file_backed = true;
>>
>>       /* We hold a folio reference, so we can safely access folio fields. */
>> -
>> -     /* secretmem folios are always order-0 folios. */
>> -     if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
>> -             check_secretmem = true;
>> -
>> -     if (!reject_file_backed && !check_secretmem)
>> -             return true;
>> -
>>       if (WARN_ON_ONCE(folio_test_slab(folio)))
>>               return false;
> 
> There's a check for hugetlb after this and a comment there mentions
> secretmem, please update that to "mapping with no direct map" or something
> like that.

Ack.

> --
> Sincerely yours,
> Mike.

Thanks,
Patrick


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-28 21:00   ` David Hildenbrand
@ 2025-09-01 14:03     ` Roy, Patrick
  0 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 14:03 UTC (permalink / raw)
  To: david@redhat.com
  Cc: ackerleytng@google.com, Manwaring, Derek, Thomson, Jack,
	Kalyazin, Nikita, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, Roy, Patrick, rppt@kernel.org,
	seanjc@google.com, tabba@google.com, vbabka@suse.cz,
	will@kernel.org, Cali, Marco


On Thu, 2025-08-28 at 22:00 +0100, David Hildenbrand wrote:
> On 28.08.25 11:39, Roy, Patrick wrote:
>> Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are
>> set to not present . Currently, mappings that match this description are
>> secretmem mappings (memfd_secret()). Later, some guest_memfd
>> configurations will also fall into this category.
>>
>> Reject this new type of mappings in all locations that currently reject
>> secretmem mappings, on the assumption that if secretmem mappings are
>> rejected somewhere, it is precisely because of an inability to deal with
>> folios without direct map entries, and then make memfd_secret() use
>> AS_NO_DIRECT_MAP on its address_space to drop its special
>> vma_is_secretmem()/secretmem_mapping() checks.
>>
>> This drops a optimization in gup_fast_folio_allowed() where
>> secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
>> enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
>> by default"), so the secretmem check did not actually end up elided in
>> most cases anymore anyway.
>>
>> Use a new flag instead of overloading AS_INACCESSIBLE (which is already
>> set by guest_memfd) because not all guest_memfd mappings will end up
>> being direct map removed (e.g. in pKVM setups, parts of guest_memfd that
>> can be mapped to userspace should also be GUP-able, and generally not
>> have restrictions on who can access it).
>>
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>> ---
> [...]
> 
>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
>> +{
>> +     return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
>> +}
>> +
> 
> "vma_is_no_direct_map" reads a bit weird.
> 
> "vma_has_no_direct_map" or "vma_no_direct_mapping" might be better.

I went with "vma_has_no_direct_map" for now, because vma_no_direct_mapping
would imply (to me at least) changing "mapping_no_direct_map" to
"mapping_no_direct_mapping", which also reads a bit weird imo.

> With the comment Mike and Fuad raised, this LGTM.
> 
> 
> -- 
> Cheers
> 
> David / dhildenb

Best,
Patrick





^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-08-31 10:26   ` kernel test robot
@ 2025-09-01 14:05     ` Roy, Patrick
  0 siblings, 0 replies; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 14:05 UTC (permalink / raw)
  To: lkp@intel.com
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	oe-kbuild-all@lists.linux.dev, pbonzini@redhat.com, Roy, Patrick,
	rppt@kernel.org, seanjc@google.com, tabba@google.com,
	vbabka@suse.cz, will@kernel.org, Cali, Marco


On Sun, 2025-08-31 at 11:26 +0100, kernel test robot wrote:
> Hi Patrick,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on a6ad54137af92535cfe32e19e5f3bc1bb7dbd383]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Roy-Patrick/filemap-Pass-address_space-mapping-to-free_folio/20250828-174202
> base:   a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
> patch link:    https://lore.kernel.org/r/20250828093902.2719-4-roypat%40amazon.co.uk
> patch subject: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
> config: loongarch-randconfig-r133-20250831 (https://download.01.org/0day-ci/archive/20250831/202508311805.yfcdeaFC-lkp@intel.com/config)
> compiler: loongarch64-linux-gcc (GCC) 14.3.0
> reproduce: (https://download.01.org/0day-ci/archive/20250831/202508311805.yfcdeaFC-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202508311805.yfcdeaFC-lkp@intel.com/
> 
> sparse warnings: (new ones prefixed by >>)
>>> mm/secretmem.c:155:39: sparse: sparse: symbol 'secretmem_aops' was not declared. Should it be static?
> 
> vim +/secretmem_aops +155 mm/secretmem.c
> 
> 1507f51255c9ff Mike Rapoport           2021-07-07  154
> 1507f51255c9ff Mike Rapoport           2021-07-07 @155  const struct address_space_operations secretmem_aops = {
> 46de8b979492e1 Matthew Wilcox (Oracle  2022-02-09  156)         .dirty_folio    = noop_dirty_folio,
> 6612ed24a24273 Matthew Wilcox (Oracle  2022-05-02  157)         .free_folio     = secretmem_free_folio,
> 5409548df3876a Matthew Wilcox (Oracle  2022-06-06  158)         .migrate_folio  = secretmem_migrate_folio,
> 1507f51255c9ff Mike Rapoport           2021-07-07  159  };
> 1507f51255c9ff Mike Rapoport           2021-07-07  160
> 
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki

This is the same thing Mike already pointed out (making `secretmem_aops` static)


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
  2025-08-28 14:54   ` Mike Rapoport
@ 2025-09-01 14:22     ` Roy, Patrick
  2025-09-01 14:27       ` Mike Rapoport
  0 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 14:22 UTC (permalink / raw)
  To: rppt@kernel.org
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, Roy, Patrick, seanjc@google.com,
	tabba@google.com, vbabka@suse.cz, will@kernel.org, Cali, Marco

On Thu, 2025-08-28 at 15:54 +0100, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 09:39:21AM +0000, Roy, Patrick wrote:
>> Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
>> ioctl. When set, guest_memfd folios will be removed from the direct map
>> after preparation, with direct map entries only restored when the folios
>> are freed.
>>
>> To ensure these folios do not end up in places where the kernel cannot
>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>> address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
>>
>> Add KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP to let userspace discover whether
>> guest_memfd supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP. Support depends on
>> guest_memfd itself being supported, but also on whether KVM can
>> manipulate the direct map at page granularity at all (possible most of
>> the time, just arm64 is a notable outlier where its impossible if the
>> direct map has been setup using hugepages, as arm64 cannot break these
>> apart due to break-before-make semantics).
> 
> There's also powerpc that does not select ARCH_HAS_SET_DIRECT_MAP

Ah, thanks! Although powerpc also doesnt support guest_memfd in the first
place, but will mention.

>> Note that this flag causes removal of direct map entries for all
>> guest_memfd folios independent of whether they are "shared" or "private"
>> (although current guest_memfd only supports either all folios in the
>> "shared" state, or all folios in the "private" state if
>> GUEST_MEMFD_FLAG_MMAP is not set). The usecase for removing direct map
>> entries of also the shared parts of guest_memfd are a special type of
>> non-CoCo VM where, host userspace is trusted to have access to all of
>> guest memory, but where Spectre-style transient execution attacks
>> through the host kernel's direct map should still be mitigated.  In this
>> setup, KVM retains access to guest memory via userspace mappings of
>> guest_memfd, which are reflected back into KVM's memslots via
>> userspace_addr. This is needed for things like MMIO emulation on x86_64
>> to work.
>>
>> Do not perform TLB flushes after direct map manipulations. This is
>> because TLB flushes resulted in a up to 40x elongation of page faults in
>> guest_memfd (scaling with the number of CPU cores), or a 5x elongation
>> of memory population. TLB flushes are not needed for functional
>> correctness (the virt->phys mapping technically stays "correct",  the
>> kernel should simply to not it for a while). On the other hand, it means
> 
>                           ^ not use it?

Yup, thanks!

>> that the desired protection from Spectre-style attacks is not perfect,
>> as an attacker could try to prevent a stale TLB entry from getting
>> evicted, keeping it alive until the page it refers to is used by the
>> guest for some sensitive data, and then targeting it using a
>> spectre-gadget.
>>
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>> ---
>>  arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
>>  include/linux/kvm_host.h          |  7 +++++++
>>  include/uapi/linux/kvm.h          |  2 ++
>>  virt/kvm/guest_memfd.c            | 29 +++++++++++++++++++++++++----
>>  virt/kvm/kvm_main.c               |  5 +++++
>>  5 files changed, 51 insertions(+), 4 deletions(-)
> 
> ...
> 
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 9ec4c45e3cf2..e3696880405c 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -4,6 +4,7 @@
>>  #include <linux/kvm_host.h>
>>  #include <linux/pagemap.h>
>>  #include <linux/anon_inodes.h>
>> +#include <linux/set_memory.h>
>>
>>  #include "kvm_mm.h"
>>
>> @@ -42,8 +43,18 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
>>       return 0;
>>  }
>>
>> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
>> +{
>> +     return ((unsigned long) inode->i_private) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
>> +}
>> +
>>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>  {
>> +     struct inode *inode = folio_inode(folio);
>> +
>> +     if (kvm_gmem_test_no_direct_map(inode))
>> +             set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);
> 
> This may fail to split large mapping in the direct map. Why not move this
> to kvm_gmem_prepare_folio() where you can handle returned error?

Argh, yeah, got that the wrong way around. Will update the error handling.

> I think that using set_direct_map_invalid_noflush() here and
> set_direct_map_default_noflush() in kvm_gmem_free_folio() better is
> clearer and makes it more obvious that here the folio is removed from the
> direct map and when freed it's direct mapping is restored.
> 
> This requires to export two symbols in patch 2, but I think it's worth it.

Mh, but set_direct_map_[default|invalid]_noflush() only take a single struct
page * argument, so they'd either need to gain a npages argument, or we add yet
more functions to set_memory.h.  Do you still think that's worth it? 

>>       folio_mark_uptodate(folio);
>>  }
>>
>> @@ -429,25 +440,29 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>>       return MF_DELAYED;
>>  }
>>
>> -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>>  static void kvm_gmem_free_folio(struct address_space *mapping,
>>                               struct folio *folio)
>>  {
>>       struct page *page = folio_page(folio, 0);
>> +
>> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>>       kvm_pfn_t pfn = page_to_pfn(page);
>>       int order = folio_order(folio);
>> +#endif
>>
>> +     if (kvm_gmem_test_no_direct_map(mapping->host))
>> +             WARN_ON_ONCE(set_direct_map_valid_noflush(page, folio_nr_pages(folio), true));
> 
> I don't think it can fail here. The direct map was split when you removed
> the folio so here it will merely update the prot bits.

Yup, will drop this WARN_ON_ONCE.

>> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>>       kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
>> -}
>>  #endif
>> +}
> 
> Instead of moving #ifdefs into kvm_gmem_free_folio() it's better to add, say,
> kvm_gmem_invalidate() and move ifdefery there or even better have a static
> inline stub for !CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE case.

Ack, will do the latter

>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 18f29ef93543..0dbfd17e1191 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -65,6 +65,7 @@
>>  #include <trace/events/kvm.h>
>>
>>  #include <linux/kvm_dirty_ring.h>
>> +#include <linux/set_memory.h>
>>
>>
>>  /* Worst case buffer size needed for holding an integer. */
>> @@ -4916,6 +4917,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>>               return kvm_supported_mem_attributes(kvm);
>>  #endif
>>  #ifdef CONFIG_KVM_GUEST_MEMFD
>> +     case KVM_CAP_GUEST_MEMFD_NO_DIRECT_MAP:
>> +             if (!can_set_direct_map())
> 
> Shouldn't this check with kvm_arch_gmem_supports_no_direct_map()?

Absolutely, thanks for catching!

>> +                     return false;
>> +             fallthrough;
>>       case KVM_CAP_GUEST_MEMFD:
>>               return 1;
>>       case KVM_CAP_GUEST_MEMFD_MMAP:
>> --
>> 2.50.1
>>
> 
> --
> Sincerely yours,
> Mike.

Best,
Patrick


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map
  2025-09-01 14:22     ` Roy, Patrick
@ 2025-09-01 14:27       ` Mike Rapoport
  0 siblings, 0 replies; 38+ messages in thread
From: Mike Rapoport @ 2025-09-01 14:27 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, seanjc@google.com, tabba@google.com,
	vbabka@suse.cz, will@kernel.org, Cali, Marco

On Mon, Sep 01, 2025 at 02:22:22PM +0000, Roy, Patrick wrote:
> On Thu, 2025-08-28 at 15:54 +0100, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 09:39:21AM +0000, Roy, Patrick wrote:
> >
> >>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
> >>  {
> >> +     struct inode *inode = folio_inode(folio);
> >> +
> >> +     if (kvm_gmem_test_no_direct_map(inode))
> >> +             set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio), false);
> > 
> > This may fail to split large mapping in the direct map. Why not move this
> > to kvm_gmem_prepare_folio() where you can handle returned error?
> 
> Argh, yeah, got that the wrong way around. Will update the error handling.
> 
> > I think that using set_direct_map_invalid_noflush() here and
> > set_direct_map_default_noflush() in kvm_gmem_free_folio() better is
> > clearer and makes it more obvious that here the folio is removed from the
> > direct map and when freed it's direct mapping is restored.
> > 
> > This requires to export two symbols in patch 2, but I think it's worth it.
> 
> Mh, but set_direct_map_[default|invalid]_noflush() only take a single struct
> page * argument, so they'd either need to gain a npages argument, or we add yet
> more functions to set_memory.h.  Do you still think that's worth it? 

Ah, right, misremembered that. Let's keep set_direct_map_valid_noflush().
 
-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP
  2025-08-28 10:27   ` David Hildenbrand
@ 2025-09-01 14:30     ` Roy, Patrick
  2025-09-01 14:43       ` David Hildenbrand
  0 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 14:30 UTC (permalink / raw)
  To: david@redhat.com
  Cc: ackerleytng@google.com, Manwaring, Derek, Thomson, Jack,
	Kalyazin, Nikita, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, Roy, Patrick, rppt@kernel.org,
	seanjc@google.com, tabba@google.com, vbabka@suse.cz,
	will@kernel.org, Cali, Marco

On Thu, 2025-08-28 at 11:27 +0100, David Hildenbrand wrote:
> On 28.08.25 11:39, Roy, Patrick wrote:
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>> ---
>>   Documentation/virt/kvm/api.rst | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index c17a87a0a5ac..b52c14d58798 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6418,6 +6418,11 @@ When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>>   supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
>>   enables mmap() and faulting of guest_memfd memory to host userspace.
>>
>> +When the capability KVM_CAP_GMEM_NO_DIRECT_MAP is supported, the 'flags' field
>> +supports GUEST_MEMFG_FLAG_NO_DIRECT_MAP. Setting this flag makes the guest_memfd
>> +instance behave similarly to memfd_secret, and unmaps the memory backing it from
>> +the kernel's address space after allocation.
>> +
>>   When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>>   guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>>   consumed from guest_memfd, regardless of whether it is a shared or a private
> 
> WARNING: Missing commit description - Add an appropriate one

Admittedly wasn't sure what to say that wouldn't just repeat the commit title
or the contents. Maybe that just means this shouldn't be its own patch. Will
squash in the previous one (same for PATCH 11/12).

> WARNING: From:/Signed-off-by: email name mismatch: 'From: "Roy, Patrick"
> <roypat@amazon.co.uk>' != 'Signed-off-by: Patrick Roy <roypat@amazon.co.uk>'

Heh, my git config only ever uses "Patrick Roy <roypat@amazon.co.uk>". Not sure
where "Roy, Patrick" comes from, could it be the mail server mangling things?

> -- 
> Cheers
> 
> David / dhildenb
> 
Best,
Patrick


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP
  2025-09-01 14:30     ` Roy, Patrick
@ 2025-09-01 14:43       ` David Hildenbrand
  0 siblings, 0 replies; 38+ messages in thread
From: David Hildenbrand @ 2025-09-01 14:43 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: ackerleytng@google.com, Manwaring, Derek, Thomson, Jack,
	Kalyazin, Nikita, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, rppt@kernel.org, seanjc@google.com,
	tabba@google.com, vbabka@suse.cz, will@kernel.org, Cali, Marco

On 01.09.25 16:30, Roy, Patrick wrote:
> On Thu, 2025-08-28 at 11:27 +0100, David Hildenbrand wrote:
>> On 28.08.25 11:39, Roy, Patrick wrote:
>>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>>> ---
>>>    Documentation/virt/kvm/api.rst | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index c17a87a0a5ac..b52c14d58798 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -6418,6 +6418,11 @@ When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>>>    supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
>>>    enables mmap() and faulting of guest_memfd memory to host userspace.
>>>
>>> +When the capability KVM_CAP_GMEM_NO_DIRECT_MAP is supported, the 'flags' field
>>> +supports GUEST_MEMFG_FLAG_NO_DIRECT_MAP. Setting this flag makes the guest_memfd
>>> +instance behave similarly to memfd_secret, and unmaps the memory backing it from
>>> +the kernel's address space after allocation.
>>> +
>>>    When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>>>    guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>>>    consumed from guest_memfd, regardless of whether it is a shared or a private
>>
>> WARNING: Missing commit description - Add an appropriate one
> 
> Admittedly wasn't sure what to say that wouldn't just repeat the commit title
> or the contents. Maybe that just means this shouldn't be its own patch. Will
> squash in the previous one (same for PATCH 11/12).

Very right :)

If there is nothing to say then probably everything was already said 
(other patch), lol

> 
>> WARNING: From:/Signed-off-by: email name mismatch: 'From: "Roy, Patrick"
>> <roypat@amazon.co.uk>' != 'Signed-off-by: Patrick Roy <roypat@amazon.co.uk>'
> 
> Heh, my git config only ever uses "Patrick Roy <roypat@amazon.co.uk>". Not sure
> where "Roy, Patrick" comes from, could it be the mail server mangling things?

Good question. Does it only happen through git?

When Nikita sends through amazon.com[1] it seems to be fine.

But when he also sent through amazon.co.uk [2], it's also messed up.

I suggest contacting the amazon.co.uk admin or ... sending through 
amazon.com :)


[1] 
https://lore.kernel.org/all/cda7c46b-c474-48f4-b703-e2f988470f3b@amazon.com/T/#u
[1] 
https://lore.kernel.org/all/20250828153049.3922-1-kalyazin@amazon.com/T/#u

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-09-01 13:54     ` Roy, Patrick
@ 2025-09-01 14:56       ` Roy, Patrick
  2025-09-02  7:59         ` Fuad Tabba
  0 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-09-01 14:56 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, rppt@kernel.org, seanjc@google.com,
	tabba@google.com, vbabka@suse.cz, will@kernel.org, Cali, Marco

On Mon, 2025-09-01 at 14:54 +0100, "Roy, Patrick" wrote:
> 
> Hi Fuad!
> 
> On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
>> Hi Patrick,
>>
>> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>> index 12a12dae727d..b52b28ae4636 100644
>>> --- a/include/linux/pagemap.h
>>> +++ b/include/linux/pagemap.h
>>> @@ -211,6 +211,7 @@ enum mapping_flags {
>>>                                    folio contents */
>>>         AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
>>>         AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
>>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
>>>         /* Bits 16-25 are used for FOLIO_ORDER */
>>>         AS_FOLIO_ORDER_BITS = 5,
>>>         AS_FOLIO_ORDER_MIN = 16,
>>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
>>>         return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
>>>  }
>>>
>>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
>>> +{
>>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>> +}
>>> +
>>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
>>> +{
>>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>> +}
>>> +
>>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
>>> +{
>>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
>>> +}
>>> +
>> Any reason vma is const whereas mapping in the function that it calls
>> (defined above it) isn't?
> 
> Ah, I cannot say that that was a conscious decision, but rather an artifact of
> the code that I looked at for reference when writing these two simply did it
> this way.  Are you saying both should be const, or neither (in my mind, both
> could be const, but the mapping_*() family of functions further up in this file
> dont take const arguments, so I'm a bit unsure now)?

Hah, just saw
https://lore.kernel.org/linux-mm/20250901123028.3383461-3-max.kellermann@ionos.com/.
Guess that means "both should be const" then :D

>> Cheers,
>> /fuad
> 
> Best,
> Patrick



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-09-01 14:56       ` Roy, Patrick
@ 2025-09-02  7:59         ` Fuad Tabba
  2025-09-02  8:46           ` David Hildenbrand
  0 siblings, 1 reply; 38+ messages in thread
From: Fuad Tabba @ 2025-09-02  7:59 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, rppt@kernel.org, seanjc@google.com,
	vbabka@suse.cz, will@kernel.org, Cali, Marco

Hi Patrick,

On Mon, 1 Sept 2025 at 15:56, Roy, Patrick <roypat@amazon.co.uk> wrote:
>
> On Mon, 2025-09-01 at 14:54 +0100, "Roy, Patrick" wrote:
> >
> > Hi Fuad!
> >
> > On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
> >> Hi Patrick,
> >>
> >> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
> >>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >>> index 12a12dae727d..b52b28ae4636 100644
> >>> --- a/include/linux/pagemap.h
> >>> +++ b/include/linux/pagemap.h
> >>> @@ -211,6 +211,7 @@ enum mapping_flags {
> >>>                                    folio contents */
> >>>         AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
> >>>         AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
> >>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
> >>>         /* Bits 16-25 are used for FOLIO_ORDER */
> >>>         AS_FOLIO_ORDER_BITS = 5,
> >>>         AS_FOLIO_ORDER_MIN = 16,
> >>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
> >>>         return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
> >>>  }
> >>>
> >>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
> >>> +{
> >>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> >>> +}
> >>> +
> >>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
> >>> +{
> >>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> >>> +}
> >>> +
> >>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
> >>> +{
> >>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
> >>> +}
> >>> +
> >> Any reason vma is const whereas mapping in the function that it calls
> >> (defined above it) isn't?
> >
> > Ah, I cannot say that that was a conscious decision, but rather an artifact of
> > the code that I looked at for reference when writing these two simply did it
> > this way.  Are you saying both should be const, or neither (in my mind, both
> > could be const, but the mapping_*() family of functions further up in this file
> > dont take const arguments, so I'm a bit unsure now)?
>
> Hah, just saw
> https://lore.kernel.org/linux-mm/20250901123028.3383461-3-max.kellermann@ionos.com/.
> Guess that means "both should be const" then :D

I don't have any strong preference regarding which way, as long as
it's consistent. The thing that should be avoided is having one
function with a parameter marked as const, pass that parameter (or
something derived from it), to a non-const function. Instead of
helping, this could cause a lot of headaches when trying to debug
things in the future, and figuring out what something that's supposed
to be "const" is being "corrupted".

Cheers,
/fuad


>
> >> Cheers,
> >> /fuad
> >
> > Best,
> > Patrick
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-09-02  7:59         ` Fuad Tabba
@ 2025-09-02  8:46           ` David Hildenbrand
  2025-09-02  8:50             ` Fuad Tabba
  0 siblings, 1 reply; 38+ messages in thread
From: David Hildenbrand @ 2025-09-02  8:46 UTC (permalink / raw)
  To: Fuad Tabba, Roy, Patrick
  Cc: ackerleytng@google.com, Manwaring, Derek, Thomson, Jack,
	Kalyazin, Nikita, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, rppt@kernel.org, seanjc@google.com,
	vbabka@suse.cz, will@kernel.org, Cali, Marco

On 02.09.25 09:59, Fuad Tabba wrote:
> Hi Patrick,
> 
> On Mon, 1 Sept 2025 at 15:56, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>
>> On Mon, 2025-09-01 at 14:54 +0100, "Roy, Patrick" wrote:
>>>
>>> Hi Fuad!
>>>
>>> On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
>>>> Hi Patrick,
>>>>
>>>> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>>>> index 12a12dae727d..b52b28ae4636 100644
>>>>> --- a/include/linux/pagemap.h
>>>>> +++ b/include/linux/pagemap.h
>>>>> @@ -211,6 +211,7 @@ enum mapping_flags {
>>>>>                                     folio contents */
>>>>>          AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
>>>>>          AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
>>>>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
>>>>>          /* Bits 16-25 are used for FOLIO_ORDER */
>>>>>          AS_FOLIO_ORDER_BITS = 5,
>>>>>          AS_FOLIO_ORDER_MIN = 16,
>>>>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
>>>>>          return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
>>>>>   }
>>>>>
>>>>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
>>>>> +{
>>>>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>>>> +}
>>>>> +
>>>>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
>>>>> +{
>>>>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>>>> +}
>>>>> +
>>>>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
>>>>> +{
>>>>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
>>>>> +}
>>>>> +
>>>> Any reason vma is const whereas mapping in the function that it calls
>>>> (defined above it) isn't?
>>>
>>> Ah, I cannot say that that was a conscious decision, but rather an artifact of
>>> the code that I looked at for reference when writing these two simply did it
>>> this way.  Are you saying both should be const, or neither (in my mind, both
>>> could be const, but the mapping_*() family of functions further up in this file
>>> dont take const arguments, so I'm a bit unsure now)?
>>
>> Hah, just saw
>> https://lore.kernel.org/linux-mm/20250901123028.3383461-3-max.kellermann@ionos.com/.
>> Guess that means "both should be const" then :D
> 
> I don't have any strong preference regarding which way, as long as
> it's consistent. The thing that should be avoided is having one
> function with a parameter marked as const, pass that parameter (or
> something derived from it), to a non-const function. 

I think the compiler will tell you that that is not ok (and you'd have 
to force-cast the const it away).

Agreed that we should be using const * for these simple getter/test 
functions.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-09-02  8:46           ` David Hildenbrand
@ 2025-09-02  8:50             ` Fuad Tabba
  2025-09-02  9:18               ` Roy, Patrick
  0 siblings, 1 reply; 38+ messages in thread
From: Fuad Tabba @ 2025-09-02  8:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Roy, Patrick, ackerleytng@google.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, rppt@kernel.org, seanjc@google.com,
	vbabka@suse.cz, will@kernel.org, Cali, Marco

On Tue, 2 Sept 2025 at 09:46, David Hildenbrand <david@redhat.com> wrote:
>
> On 02.09.25 09:59, Fuad Tabba wrote:
> > Hi Patrick,
> >
> > On Mon, 1 Sept 2025 at 15:56, Roy, Patrick <roypat@amazon.co.uk> wrote:
> >>
> >> On Mon, 2025-09-01 at 14:54 +0100, "Roy, Patrick" wrote:
> >>>
> >>> Hi Fuad!
> >>>
> >>> On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
> >>>> Hi Patrick,
> >>>>
> >>>> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
> >>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >>>>> index 12a12dae727d..b52b28ae4636 100644
> >>>>> --- a/include/linux/pagemap.h
> >>>>> +++ b/include/linux/pagemap.h
> >>>>> @@ -211,6 +211,7 @@ enum mapping_flags {
> >>>>>                                     folio contents */
> >>>>>          AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
> >>>>>          AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
> >>>>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
> >>>>>          /* Bits 16-25 are used for FOLIO_ORDER */
> >>>>>          AS_FOLIO_ORDER_BITS = 5,
> >>>>>          AS_FOLIO_ORDER_MIN = 16,
> >>>>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
> >>>>>          return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
> >>>>>   }
> >>>>>
> >>>>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
> >>>>> +{
> >>>>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> >>>>> +}
> >>>>> +
> >>>>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
> >>>>> +{
> >>>>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> >>>>> +}
> >>>>> +
> >>>>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
> >>>>> +{
> >>>>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
> >>>>> +}
> >>>>> +
> >>>> Any reason vma is const whereas mapping in the function that it calls
> >>>> (defined above it) isn't?
> >>>
> >>> Ah, I cannot say that that was a conscious decision, but rather an artifact of
> >>> the code that I looked at for reference when writing these two simply did it
> >>> this way.  Are you saying both should be const, or neither (in my mind, both
> >>> could be const, but the mapping_*() family of functions further up in this file
> >>> dont take const arguments, so I'm a bit unsure now)?
> >>
> >> Hah, just saw
> >> https://lore.kernel.org/linux-mm/20250901123028.3383461-3-max.kellermann@ionos.com/.
> >> Guess that means "both should be const" then :D
> >
> > I don't have any strong preference regarding which way, as long as
> > it's consistent. The thing that should be avoided is having one
> > function with a parameter marked as const, pass that parameter (or
> > something derived from it), to a non-const function.
>
> I think the compiler will tell you that that is not ok (and you'd have
> to force-cast the const it away).

Not for the scenario I'm worried about. The compiler didn't complain
about this (from this patch):

+static inline bool mapping_no_direct_map(struct address_space *mapping)
+{
+       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
+}
+
+static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
+{
+       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
+}

vma_is_no_direct_map() takes a const, but mapping_no_direct_map()
doesn't. For now, mapping_no_direct_map() doesn't modify anything. But
it could, and the compiler wouldn't complain.

Cheers,
/fuad


> Agreed that we should be using const * for these simple getter/test
> functions.
>
> --
> Cheers
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-09-02  8:50             ` Fuad Tabba
@ 2025-09-02  9:18               ` Roy, Patrick
  2025-09-02  9:21                 ` Fuad Tabba
  0 siblings, 1 reply; 38+ messages in thread
From: Roy, Patrick @ 2025-09-02  9:18 UTC (permalink / raw)
  To: tabba@google.com
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, Roy, Patrick, rppt@kernel.org,
	seanjc@google.com, vbabka@suse.cz, will@kernel.org, Cali, Marco

On Tue, 2025-09-02 at 09:50 +0100, Fuad Tabba wrote:
> On Tue, 2 Sept 2025 at 09:46, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 02.09.25 09:59, Fuad Tabba wrote:
>>> Hi Patrick,
>>>
>>> On Mon, 1 Sept 2025 at 15:56, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>>>
>>>> On Mon, 2025-09-01 at 14:54 +0100, "Roy, Patrick" wrote:
>>>>>
>>>>> Hi Fuad!
>>>>>
>>>>> On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
>>>>>> Hi Patrick,
>>>>>>
>>>>>> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>>>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>>>>>> index 12a12dae727d..b52b28ae4636 100644
>>>>>>> --- a/include/linux/pagemap.h
>>>>>>> +++ b/include/linux/pagemap.h
>>>>>>> @@ -211,6 +211,7 @@ enum mapping_flags {
>>>>>>>                                     folio contents */
>>>>>>>          AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
>>>>>>>          AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
>>>>>>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
>>>>>>>          /* Bits 16-25 are used for FOLIO_ORDER */
>>>>>>>          AS_FOLIO_ORDER_BITS = 5,
>>>>>>>          AS_FOLIO_ORDER_MIN = 16,
>>>>>>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
>>>>>>>          return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
>>>>>>>   }
>>>>>>>
>>>>>>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
>>>>>>> +{
>>>>>>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
>>>>>>> +{
>>>>>>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
>>>>>>> +{
>>>>>>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
>>>>>>> +}
>>>>>>> +
>>>>>> Any reason vma is const whereas mapping in the function that it calls
>>>>>> (defined above it) isn't?
>>>>>
>>>>> Ah, I cannot say that that was a conscious decision, but rather an artifact of
>>>>> the code that I looked at for reference when writing these two simply did it
>>>>> this way.  Are you saying both should be const, or neither (in my mind, both
>>>>> could be const, but the mapping_*() family of functions further up in this file
>>>>> dont take const arguments, so I'm a bit unsure now)?
>>>>
>>>> Hah, just saw
>>>> https://lore.kernel.org/linux-mm/20250901123028.3383461-3-max.kellermann@ionos.com/.
>>>> Guess that means "both should be const" then :D
>>>
>>> I don't have any strong preference regarding which way, as long as
>>> it's consistent. The thing that should be avoided is having one
>>> function with a parameter marked as const, pass that parameter (or
>>> something derived from it), to a non-const function.
>>
>> I think the compiler will tell you that that is not ok (and you'd have
>> to force-cast the const it away).
> 
> Not for the scenario I'm worried about. The compiler didn't complain
> about this (from this patch):
> 
> +static inline bool mapping_no_direct_map(struct address_space *mapping)
> +{
> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> +}
> +
> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
> +{
> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
> +}
> 
> vma_is_no_direct_map() takes a const, but mapping_no_direct_map()
> doesn't. For now, mapping_no_direct_map() doesn't modify anything. But
> it could, and the compiler wouldn't complain.

Wouldn't this only be a problem if vma->vm_file->f_mapping was a 'const struct
address_space *const'? I thought const-ness doesn't leak into pointers (e.g.
even above, vma_is_no_direct_map isn't allowed to make vma point at something
else, but it could modify the pointed-to vm_area_struct).

> Cheers,
> /fuad
> 
> 
>> Agreed that we should be using const * for these simple getter/test
>> functions.
>>
>> --
>> Cheers
>>
>> David / dhildenb
>>



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-09-02  9:18               ` Roy, Patrick
@ 2025-09-02  9:21                 ` Fuad Tabba
  2025-09-02  9:54                   ` David Hildenbrand
  0 siblings, 1 reply; 38+ messages in thread
From: Fuad Tabba @ 2025-09-02  9:21 UTC (permalink / raw)
  To: Roy, Patrick
  Cc: ackerleytng@google.com, david@redhat.com, Manwaring, Derek,
	Thomson, Jack, Kalyazin, Nikita, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, rppt@kernel.org, seanjc@google.com,
	vbabka@suse.cz, will@kernel.org, Cali, Marco

On Tue, 2 Sept 2025 at 10:18, Roy, Patrick <roypat@amazon.co.uk> wrote:
>
> On Tue, 2025-09-02 at 09:50 +0100, Fuad Tabba wrote:
> > On Tue, 2 Sept 2025 at 09:46, David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 02.09.25 09:59, Fuad Tabba wrote:
> >>> Hi Patrick,
> >>>
> >>> On Mon, 1 Sept 2025 at 15:56, Roy, Patrick <roypat@amazon.co.uk> wrote:
> >>>>
> >>>> On Mon, 2025-09-01 at 14:54 +0100, "Roy, Patrick" wrote:
> >>>>>
> >>>>> Hi Fuad!
> >>>>>
> >>>>> On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
> >>>>>> Hi Patrick,
> >>>>>>
> >>>>>> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
> >>>>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >>>>>>> index 12a12dae727d..b52b28ae4636 100644
> >>>>>>> --- a/include/linux/pagemap.h
> >>>>>>> +++ b/include/linux/pagemap.h
> >>>>>>> @@ -211,6 +211,7 @@ enum mapping_flags {
> >>>>>>>                                     folio contents */
> >>>>>>>          AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
> >>>>>>>          AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
> >>>>>>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
> >>>>>>>          /* Bits 16-25 are used for FOLIO_ORDER */
> >>>>>>>          AS_FOLIO_ORDER_BITS = 5,
> >>>>>>>          AS_FOLIO_ORDER_MIN = 16,
> >>>>>>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
> >>>>>>>          return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
> >>>>>>>   }
> >>>>>>>
> >>>>>>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
> >>>>>>> +{
> >>>>>>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
> >>>>>>> +{
> >>>>>>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
> >>>>>>> +{
> >>>>>>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
> >>>>>>> +}
> >>>>>>> +
> >>>>>> Any reason vma is const whereas mapping in the function that it calls
> >>>>>> (defined above it) isn't?
> >>>>>
> >>>>> Ah, I cannot say that that was a conscious decision, but rather an artifact of
> >>>>> the code that I looked at for reference when writing these two simply did it
> >>>>> this way.  Are you saying both should be const, or neither (in my mind, both
> >>>>> could be const, but the mapping_*() family of functions further up in this file
> >>>>> dont take const arguments, so I'm a bit unsure now)?
> >>>>
> >>>> Hah, just saw
> >>>> https://lore.kernel.org/linux-mm/20250901123028.3383461-3-max.kellermann@ionos.com/.
> >>>> Guess that means "both should be const" then :D
> >>>
> >>> I don't have any strong preference regarding which way, as long as
> >>> it's consistent. The thing that should be avoided is having one
> >>> function with a parameter marked as const, pass that parameter (or
> >>> something derived from it), to a non-const function.
> >>
> >> I think the compiler will tell you that that is not ok (and you'd have
> >> to force-cast the const it away).
> >
> > Not for the scenario I'm worried about. The compiler didn't complain
> > about this (from this patch):
> >
> > +static inline bool mapping_no_direct_map(struct address_space *mapping)
> > +{
> > +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
> > +}
> > +
> > +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
> > +{
> > +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
> > +}
> >
> > vma_is_no_direct_map() takes a const, but mapping_no_direct_map()
> > doesn't. For now, mapping_no_direct_map() doesn't modify anything. But
> > it could, and the compiler wouldn't complain.
>
> Wouldn't this only be a problem if vma->vm_file->f_mapping was a 'const struct
> address_space *const'? I thought const-ness doesn't leak into pointers (e.g.
> even above, vma_is_no_direct_map isn't allowed to make vma point at something
> else, but it could modify the pointed-to vm_area_struct).

That's the thing, constness checks don't carry over to pointers within
a struct, but a person reading the code would assume that a function
with a parameter marked as const wouldn't modify anything related to
that parameter.

Cheers,
/fuad

> > Cheers,
> > /fuad
> >
> >
> >> Agreed that we should be using const * for these simple getter/test
> >> functions.
> >>
> >> --
> >> Cheers
> >>
> >> David / dhildenb
> >>
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP
  2025-09-02  9:21                 ` Fuad Tabba
@ 2025-09-02  9:54                   ` David Hildenbrand
  0 siblings, 0 replies; 38+ messages in thread
From: David Hildenbrand @ 2025-09-02  9:54 UTC (permalink / raw)
  To: Fuad Tabba, Roy, Patrick
  Cc: ackerleytng@google.com, Manwaring, Derek, Thomson, Jack,
	Kalyazin, Nikita, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, rppt@kernel.org, seanjc@google.com,
	vbabka@suse.cz, will@kernel.org, Cali, Marco

On 02.09.25 11:21, Fuad Tabba wrote:
> On Tue, 2 Sept 2025 at 10:18, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>
>> On Tue, 2025-09-02 at 09:50 +0100, Fuad Tabba wrote:
>>> On Tue, 2 Sept 2025 at 09:46, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 02.09.25 09:59, Fuad Tabba wrote:
>>>>> Hi Patrick,
>>>>>
>>>>> On Mon, 1 Sept 2025 at 15:56, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>>>>>
>>>>>> On Mon, 2025-09-01 at 14:54 +0100, "Roy, Patrick" wrote:
>>>>>>>
>>>>>>> Hi Fuad!
>>>>>>>
>>>>>>> On Thu, 2025-08-28 at 11:21 +0100, Fuad Tabba wrote:
>>>>>>>> Hi Patrick,
>>>>>>>>
>>>>>>>> On Thu, 28 Aug 2025 at 10:39, Roy, Patrick <roypat@amazon.co.uk> wrote:
>>>>>>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>>>>>>>> index 12a12dae727d..b52b28ae4636 100644
>>>>>>>>> --- a/include/linux/pagemap.h
>>>>>>>>> +++ b/include/linux/pagemap.h
>>>>>>>>> @@ -211,6 +211,7 @@ enum mapping_flags {
>>>>>>>>>                                      folio contents */
>>>>>>>>>           AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
>>>>>>>>>           AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
>>>>>>>>> +       AS_NO_DIRECT_MAP = 10,  /* Folios in the mapping are not in the direct map */
>>>>>>>>>           /* Bits 16-25 are used for FOLIO_ORDER */
>>>>>>>>>           AS_FOLIO_ORDER_BITS = 5,
>>>>>>>>>           AS_FOLIO_ORDER_MIN = 16,
>>>>>>>>> @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac
>>>>>>>>>           return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> +static inline void mapping_set_no_direct_map(struct address_space *mapping)
>>>>>>>>> +{
>>>>>>>>> +       set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
>>>>>>>>> +{
>>>>>>>>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
>>>>>>>>> +{
>>>>>>>>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>> Any reason vma is const whereas mapping in the function that it calls
>>>>>>>> (defined above it) isn't?
>>>>>>>
>>>>>>> Ah, I cannot say that that was a conscious decision, but rather an artifact of
>>>>>>> the code that I looked at for reference when writing these two simply did it
>>>>>>> this way.  Are you saying both should be const, or neither (in my mind, both
>>>>>>> could be const, but the mapping_*() family of functions further up in this file
>>>>>>> dont take const arguments, so I'm a bit unsure now)?
>>>>>>
>>>>>> Hah, just saw
>>>>>> https://lore.kernel.org/linux-mm/20250901123028.3383461-3-max.kellermann@ionos.com/.
>>>>>> Guess that means "both should be const" then :D
>>>>>
>>>>> I don't have any strong preference regarding which way, as long as
>>>>> it's consistent. The thing that should be avoided is having one
>>>>> function with a parameter marked as const, pass that parameter (or
>>>>> something derived from it), to a non-const function.
>>>>
>>>> I think the compiler will tell you that that is not ok (and you'd have
>>>> to force-cast the const it away).
>>>
>>> Not for the scenario I'm worried about. The compiler didn't complain
>>> about this (from this patch):
>>>
>>> +static inline bool mapping_no_direct_map(struct address_space *mapping)
>>> +{
>>> +       return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
>>> +}
>>> +
>>> +static inline bool vma_is_no_direct_map(const struct vm_area_struct *vma)
>>> +{
>>> +       return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
>>> +}
>>>
>>> vma_is_no_direct_map() takes a const, but mapping_no_direct_map()
>>> doesn't. For now, mapping_no_direct_map() doesn't modify anything. But
>>> it could, and the compiler wouldn't complain.
>>
>> Wouldn't this only be a problem if vma->vm_file->f_mapping was a 'const struct
>> address_space *const'? I thought const-ness doesn't leak into pointers (e.g.
>> even above, vma_is_no_direct_map isn't allowed to make vma point at something
>> else, but it could modify the pointed-to vm_area_struct).
> 
> That's the thing, constness checks don't carry over to pointers within
> a struct, but a person reading the code would assume that a function
> with a parameter marked as const wouldn't modify anything related to
> that parameter.

Ah, thanks, I forgot that detail, it's only for embedded structs but not 
pointers.

I wonder if something (sparse?) could detect such cases.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2025-09-02  9:55 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-28  9:39 [PATCH v5 00/12] Direct Map Removal Support for guest_memfd Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 01/12] filemap: Pass address_space mapping to ->free_folio() Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 02/12] arch: export set_direct_map_valid_noflush to KVM module Roy, Patrick
2025-08-28 10:07   ` Fuad Tabba
2025-09-01 13:47     ` Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 03/12] mm: introduce AS_NO_DIRECT_MAP Roy, Patrick
2025-08-28 10:21   ` Fuad Tabba
2025-09-01 13:54     ` Roy, Patrick
2025-09-01 14:56       ` Roy, Patrick
2025-09-02  7:59         ` Fuad Tabba
2025-09-02  8:46           ` David Hildenbrand
2025-09-02  8:50             ` Fuad Tabba
2025-09-02  9:18               ` Roy, Patrick
2025-09-02  9:21                 ` Fuad Tabba
2025-09-02  9:54                   ` David Hildenbrand
2025-08-28 14:26   ` Mike Rapoport
2025-09-01 13:56     ` Roy, Patrick
2025-08-28 21:00   ` David Hildenbrand
2025-09-01 14:03     ` Roy, Patrick
2025-08-31 10:26   ` kernel test robot
2025-09-01 14:05     ` Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 04/12] KVM: guest_memfd: Add flag to remove from direct map Roy, Patrick
2025-08-28 14:54   ` Mike Rapoport
2025-09-01 14:22     ` Roy, Patrick
2025-09-01 14:27       ` Mike Rapoport
2025-08-28  9:39 ` [PATCH v5 05/12] KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP Roy, Patrick
2025-08-28 10:27   ` David Hildenbrand
2025-09-01 14:30     ` Roy, Patrick
2025-09-01 14:43       ` David Hildenbrand
2025-08-28  9:39 ` [PATCH v5 06/12] KVM: selftests: load elf via bounce buffer Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 07/12] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 08/12] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 09/12] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 10/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in mem conversion tests Roy, Patrick
2025-08-28  9:39 ` [PATCH v5 11/12] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in guest_memfd_test.c Roy, Patrick
2025-08-28 10:26   ` David Hildenbrand
2025-08-28  9:39 ` [PATCH v5 12/12] KVM: selftests: Test guest execution from direct map removed gmem Roy, Patrick
2025-08-28 12:50 ` [PATCH v5 00/12] Direct Map Removal Support for guest_memfd David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).